Testers wanted: Testing DRAM reliability on BPi M2+ and NanoPi M1


Update: BPi M2+ is done but further results would still be interesting.

Now we need results for NanoPi M1. The simplest way is to use the Armbian image as outlined in post #4 below. The only thing you have to change is the following; afterwards you can run the lima-memtester binary as outlined below:

ln -sf /boot/bin/nanopim1.bin /boot/script.bin
echo nanopim1 > /etc/hostname
reboot

Dear BPi M2+ users: I just tested DRAM reliability on my BPi M2+, only to realize that this board doesn't run stable even at a DRAM clockspeed of just 624 MHz (currently testing 600 MHz for an additional hour or so).

If you have a BPi M2+ it would really help if you could do the same. Everything is outlined here: http://linux-sunxi.org/Xunlong_Orange_Pi_Plus_2E#DRAM_clock_speed_limit

Just grab the referenced fel-boot-lima-memtester-on-orange-pi-h3-v3.tar.bz2 archive, which now also contains the files for BPi M2+, and use the included fel-boot-lima-memtester-on-banana-pi-m2plus script (I would also start with 624 MHz and, if that succeeds, increase the DRAM clockspeed in 24 MHz steps). Please be aware that SinoVoip saved on a second LED on the BPi M2+: only the red LED will blink and there is no second LED that lights up solid as a notification, so you should let the test run for at least 1 hour.
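
The stepping suggested above (start at 624 MHz, go up in 24 MHz steps) could be sketched like this. The function name dram_candidates and the 744 MHz upper bound are only assumptions for illustration; the bound is simply the highest clockspeed discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print DRAM clock candidates (MHz) from a start value
# up to a maximum, in 24 MHz steps.
dram_candidates() {
    freq=${1:-624}   # starting clock in MHz (624 as suggested in the post)
    max=${2:-744}    # assumed upper bound in MHz
    while [ "$freq" -le "$max" ]; do
        echo "$freq"
        freq=$((freq + 24))
    done
}

# List the clockspeeds to try, one per line.
dram_candidates 624 744
```

Each value would then be tested for at least an hour before moving on to the next one.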

It's important to connect an HDMI display and make sure that a spinning cube can be seen on a gray background (if the background is glowing red then something is wrong). Some more information can be found here: https://linux-sunxi.org/Hardware_Reliability_Tests#Reliability

Please get back ASAP with results since Chen-Yu is currently preparing upstream u-boot support and DRAM timing is important!


Here we go. You'll find a freshly built Armbian 5.14 Xenial (16.04 LTS) desktop image here: Armbian_5.14_Bananapim2plus_Ubuntu_xenial_3.4.112_desktop.7z (438M download size)

This can be burned to any SD card larger than 2 GB and starts with a DRAM clockspeed of 648 MHz (we do not allow switching between different DRAM clockspeeds: "# CONFIG_DEVFREQ_DRAM_FREQ is not set"). A statically linked lima-memtester binary is also included. To get started, please let RPi-Monitor install and then start the test in the following way (as root; do a 'sudo su -' first if you're not already superuser):

armbianmonitor -r
/usr/local/bin/lima-memtester 100M >/dev/null 2>&1

Since we disabled CONFIG_DEVFREQ_DRAM_FREQ, RPi-Monitor won't be able to show the actual DRAM frequency any more, so we have to trust the settings.

IMPORTANT: The test is only useful when a connected HDMI display is on and shows a spinning cube on a gray background, and when it runs for at least 1 hour. If you see a glowing red background then something is already wrong and you have to switch DRAM frequency: a glowing red background means the test FAILED.

To change the DRAM clockspeed you need this archive: u-boot-bananapim2plus_5.14_memtester.tar.bz2

The contents are as follows:

linux-u-boot-bananapim2plus_5.14_armhf_600MHz.deb
linux-u-boot-bananapim2plus_5.14_armhf_624MHz.deb
linux-u-boot-bananapim2plus_5.14_armhf_648MHz.deb
linux-u-boot-bananapim2plus_5.14_armhf_672MHz.deb
linux-u-boot-bananapim2plus_5.14_armhf_696MHz.deb
linux-u-boot-bananapim2plus_5.14_armhf_720MHz.deb
linux-u-boot-bananapim2plus_5.14_armhf_744MHz.deb

So to switch to e.g. 624 MHz you would grab the archive, untar it using 'tar xf /path/to/u-boot-bananapim2plus_5.14_memtester.tar.bz2' and then do a 'dpkg -i linux-u-boot-bananapim2plus_5.14_armhf_624MHz.deb && sync && reboot'. After the reboot, start the test again using:

/usr/local/bin/lima-memtester 100M >/dev/null 2>&1
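
To avoid typos in the package name when switching clockspeeds, a small helper along these lines might be handy. It is only a sketch: the function name uboot_deb_for is made up, and the list of valid clockspeeds is copied from the archive contents listed above. It only prints the commands and does not run them.

```shell
#!/bin/sh
# Hypothetical helper: map a DRAM clock (MHz) to the matching package name
# from the archive listing; fails for clocks the archive does not contain.
uboot_deb_for() {
    case "$1" in
        600|624|648|672|696|720|744)
            echo "linux-u-boot-bananapim2plus_5.14_armhf_${1}MHz.deb" ;;
        *)
            echo "no u-boot package for ${1} MHz" >&2
            return 1 ;;
    esac
}

# Print (do not execute) the commands for switching to 624 MHz.
deb=$(uboot_deb_for 624) && echo "dpkg -i $deb && sync && reboot"
```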

Any volunteers? We urgently need further testers. The procedure outlined above should be simple enough, shouldn't it? Grab a 4 GB card, burn the image, boot it, create the usual normal user, install RPi-Monitor (please see below), then let the test run and report back here with feedback. :)

BTW: Installing RPi-Monitor would really help to get an idea whether the H3 on my BPi M2+ is broken or whether heat dissipation on this board is broken in general. When I run this image with just a heatsink on the H3 and without a fan, the H3 gets clocked down to 312 MHz and one CPU core is killed as well. The same image running on an OPi PC Plus (after relinking script.bin) with the same heatsink in the same location only clocks down to 1008/1200 MHz.

[Screenshot, 2016-06-04 17:55]

So it would really help if others could share their thermal measurements while running the test as outlined above.


Is this normal?

Huh, it really seems the BPi M2+ has a horrible 'thermal design': you already experienced 2 CPU cores being killed. When I adjusted the cooler_table entries to such low values after the first real tests with the BPi M2+, I would never have thought anyone would be able to reach them unless he really used an 'enclosure from hell' without any airflow. But it seems we both manage to get CPU cores killed at 240 MHz when running outside an enclosure and with a heatsink applied (while H3 Oranges happily run the same workload at 1000+ MHz with 4 cores).

Anyway: I tested on the basis of boot0 using a SinoVoip OS image and was able to check DRAM at 720 MHz clockspeed successfully. Then I did the same with our Armbian test image (using u-boot 2016.05) and could confirm: 720 MHz works for at least an hour while 744 MHz already gave a glowing red background. Now I have replaced u-boot+spl on the Armbian image with the ones ssvb used when he created his FEL boot based lima-memtester archive (full bootlog) and am currently testing 720 MHz. Spinning cube after 15 minutes; I will let this run for an hour and then start the FEL boot test (using 'our' u-boot 2016.05 then).

@Igor: Did you already try out higher DRAM clockspeeds or just the default 648 MHz I used when creating the image?

Maybe the different power scheme on the BPi M2+ is responsible for the worse results I got. The BPi M2+ powers up when a USB cable is connected to the Micro USB port. I will have a look later when testing FEL mode again. Maybe it's just unstable DC-IN when both a PSU and another host on the OTG port 'provide' power?


Now testing FEL boot again (the 'usual' lima-memtester approach) but using the most recent u-boot version, which the Armbian test image also uses. Since it already failed at 624 MHz the last time, I tried it with 672 MHz now:

U-Boot SPL 2016.05-armbian (Jun 03 2016 - 16:46:24)
DRAM: 1024 MiB
Trying to boot from 


U-Boot 2016.05-armbian (Jun 03 2016 - 16:46:24 +0200) Allwinner Technology

CPU:   Allwinner H3 (SUN8I 1680)
Model: Xunlong Orange Pi PC
DRAM:  1 GiB
MMC:   ** First descriptor is NOT a primary desc on 0:1 **
SUNXI SD/MMC: 1 (SD), SUNXI SD/MMC: 0 (eMMC)
*** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Net:   No ethernet found.
starting USB...
USB0:   USB EHCI 1.00
USB1:   USB OHCI 1.0
USB2:   USB EHCI 1.00
USB3:   USB OHCI 1.0
USB4:   USB EHCI 1.00
USB5:   USB OHCI 1.0
scanning bus 0 for devices... 1 USB Device(s) found
scanning bus 2 for devices... 1 USB Device(s) found
scanning bus 4 for devices... 1 USB Device(s) found
Hit any key to stop autoboot:  0 
(FEL boot)
## Executing script at 43100000
## Booting kernel from Legacy Image at 42000000 ...
   Image Name:   Linux-3.4.39+
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    7045824 Bytes = 6.7 MiB
   Load Address: 40008000
   Entry Point:  40008000
   Verifying Checksum ... OK
   Loading Kernel Image ... OK
Using machid 0x1029 from environment

Starting kernel ...

[sun8i_fixup]: From boot, get meminfo:
        Start:  0x40000000
        Size:   1024MB
ion_carveout reserve: 160m@0 256m@0 130m@1 200m@1
ion_reserve_common: ion reserve: [0x70000000, 0x80000000]!
[    3.283597] failed to get normal led pin assign
[    3.283612] failed to get standby led pin assign
[    3.741186] sunxikbd_init failed. 
[    3.744969] ls_fetch_sysconfig_para: type err  device_used = 0. 
[    3.752828] tscdev_init: tsc driver is disabled
[    3.759757] [cpu_freq] ERR:get cpu extremity frequency from sysconfig failed, use max_freq
[    3.781832] no green_led, ignore it!
[    3.785795] no blue_led, ignore it!
[    3.792458] request gpio failed!
[    3.840830] ths_fetch_sysconfig_para: type err  device_used = 1. 
Starting logging: OK
Initializing random number generator... done.
Starting network...
This is a simple textured cube demo from the lima driver and
a memtester. Both combined in a single program. The mali400
hardware is only used to stress RAM in the background. But
this happens to significantly increase chances of exposing
memory stability related problems.

Kernel driver is version 14
Detected 1 Mali-400 GP Cores.
Detected 2 Mali-400 PP Cores.
FB: 1280x720@32bpp at 0x70200000 (0x00708000)
Using dual buffered direct rendering to FB.

memtester version 4.3.0 (32-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 50MB (52428800 bytes)
got  50MB (52428800 bytes), trying mlock ...locked.
Loop 1/2:
  Stuck Address       : testing   0FAILURE: possible bad address line at offset 0x0000fbf0.
Skipping to next test...
  Random Value        : [    6.190727] Unable to handle kernel paging request at virtual address ae164bc8
[    6.198758] pgd = c0004000
[    6.200600] [ae164bc8] *pgd=00000000
[    6.200600] sunxi oops: enable sdcard JTAG interface
[    6.200600] sunxi oops: cpu frequency: 1200 MHz
[    6.200600] sunxi oops: ddr frequency: 672 MHz
[    6.200600] sunxi oops: gpu frequency: 252 MHz
[    6.200600] sunxi oops: cpu temperature: 53 
[    6.200600] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[    6.200600] Modules linked in:
[    6.200600] CPU: 0    Not tainted  (3.4.39+ #1)
[    6.200600] PC is at cpuacct_charge+0x54/0xc8
[    6.200600] LR is at 0xa6c000
[    6.200600] pc : [<c005caf8>]    lr : [<00a6c000>]    psr: a00b0093
[    6.200600] sp : ed4abd88  ip : ed4abd88  fp : ed4abda4
[    6.200600] r10: ed48aac0  r9 : 00000001  r8 : ed48aaf8
[    6.200600] r7 : 00000000  r6 : ed48aac0  r5 : 00000000  r4 : 00010ff9
[    6.200600] r3 : c0d45d08  r2 : c0cb8e40  r1 : 00000000  r0 : ed48aac0
[    6.200600] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[    6.200600] Control: 10c5387d  Table: 6e9a406a  DAC: 00000015
[    6.200600] 
[    6.200600] PC: 0xc005ca78:
[    6.200600] ca78  eb17e727 e1a00007 e24bd028 e89daff0 c0ce71b0 c0ce6d44 c0cba040 c0056e9c
[    6.200600] ca98  c0050ce4 c0d45d28 c0d46208 e1a0c00d e92dd8f0 e24cb004 e52de004 e8bd4000
[    6.200600] cab8  e1a05003 e59f3068 e1a06000 e1a04002 e5933370 e3530000 089da8f0 e5903004
[    6.200600] cad8  e5937014 eb011d02 e5963480 e59fe044 e5933020 ea00000a e79ec107 e5932010
[    6.200600] caf8  e18200dc e0900004 e0a11005 e18200fc e5933000 e5933018 e3530000 0a000002
[    6.200600] cb18  e5933024 e3530000 1afffff2 eb0121e6 e89da8f0 c0ce6d44 c0cdb4f8 e1a0c00d
[    6.200600] cb38  e92dd800 e24cb004 e59f301c e5932000 e59f3018 e2822c75 e2822030 e0832392
[    6.200600] cb58  e1a00002 e1a01003 e89da800 c0cba0c0 00989680 e1a0c00d e92ddbf0 e24cb004
[    6.200600] 
[    6.200600] SP: 0xed4abd08:
[    6.200600] bd08  ee969580 c0658ef4 ef0c8ac0 ef0c8ac0 ed4abd34 ed4abd28 c005caf8 a00b0093
[    6.200600] bd28  ffffffff ed4abd74 ed4abda4 ed4abd40 c000df58 c000836c ed48aac0 00000000
[    6.200600] bd48  c0cb8e40 c0d45d08 00010ff9 00000000 ed48aac0 00000000 ed48aaf8 00000001
[    6.200600] bd68  ed48aac0 ed4abda4 ed4abd88 ed4abd88 00a6c000 c005caf8 a00b0093 ffffffff
[    6.200600] bd88  00010ff9 00000000 0dbb2a93 00000000 ed4abde4 ed4abda8 c005e0d8 c005cab0
[    6.200600] bda8  ef0ce780 00000004 ed4abdec ed4abdc0 c005515c 0078e8a3 00000000 ed48aaf8
[    6.200600] bdc8  c1722750 70fe6fd0 00000001 c1722700 ed4abe8c ed4abde8 c0060058 c005df9c
[    6.200600] bde8  c005cb84 c00113c0 70fe4049 00000001 ef0ce600 ee9a3e1c 70fe4049 00000001
[    6.200600] 
[    6.200600] IP: 0xed4abd08:
[    6.200600] bd08  ee969580 c0658ef4 ef0c8ac0 ef0c8ac0 ed4abd34 ed4abd28 c005caf8 a00b0093
[    6.200600] bd28  ffffffff ed4abd74 ed4abda4 ed4abd40 c000df58 c000836c ed48aac0 00000000
[    6.200600] bd48  c0cb8e40 c0d45d08 00010ff9 00000000 ed48aac0 00000000 ed48aaf8 00000001
[    6.200600] bd68  ed48aac0 ed4abda4 ed4abd88 ed4abd88 00a6c000 c005caf8 a00b0093 ffffffff
[    6.200600] bd88  00010ff9 00000000 0dbb2a93 00000000 ed4abde4 ed4abda8 c005e0d8 c005cab0
[    6.200600] bda8  ef0ce780 00000004 ed4abdec ed4abdc0 c005515c 0078e8a3 00000000 ed48aaf8
[    6.200600] bdc8  c1722750 70fe6fd0 00000001 c1722700 ed4abe8c ed4abde8 c0060058 c005df9c
[    6.200600] bde8  c005cb84 c00113c0 70fe4049 00000001 ef0ce600 ee9a3e1c 70fe4049 00000001
[    6.200600] 
[    6.200600] FP: 0xed4abd24:
[    6.200600] bd24  a00b0093 ffffffff ed4abd74 ed4abda4 ed4abd40 c000df58 c000836c ed48aac0
[    6.200600] bd44  00000000 c0cb8e40 c0d45d08 00010ff9 00000000 ed48aac0 00000000 ed48aaf8
[    6.200600] bd64  00000001 ed48aac0 ed4abda4 ed4abd88 ed4abd88 00a6c000 c005caf8 a00b0093
[    6.200600] bd84  ffffffff 00010ff9 00000000 0dbb2a93 00000000 ed4abde4 ed4abda8 c005e0d8
[    6.200600] bda4  c005cab0 ef0ce780 00000004 ed4abdec ed4abdc0 c005515c 0078e8a3 00000000
[    6.200600] bdc4  ed48aaf8 c1722750 70fe6fd0 00000001 c1722700 ed4abe8c ed4abde8 c0060058
[    6.200600] bde4  c005df9c c005cb84 c00113c0 70fe4049 00000001 ef0ce600 ee9a3e1c 70fe4049
[    6.200600] be04  00000001 00000000 c1722700 c1722700 c0cdb4f8 c1722c80 00000089 ed4abe6c
[    6.200600] 
[    6.200600] R0: 0xed48aa40:
[    6.200600] aa40  ed48aa3c 00000000 00000000 00000000 00000000 00000020 00000000 0000c350
[    6.200600] aa60  0000c350 00000000 ffffffff 00000000 00000000 00000000 00000000 00000000
[    6.200600] aa80  00000000 00000000 00000001 00000000 ffffffff ffffffff ffffffff ffffffff
[    6.200600] aaa0  ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffff7f ffffffff
[    6.200600] aac0  00000001 ed4aa000 00000002 04208060 00000000 00000000 00000001 00000001
[    6.200600] aae0  00000078 00000078 00000078 00000000 c065e978 00000000 00000400 00400000
[    6.200600] ab00  00000001 00000000 00000000 c1722bdc c1722bdc 00000001 70fe6fd0 00000001
[    6.200600] ab20  0454b4c1 00000000 0dbb2a93 00000000 0453a4c8 00000000 00000267 00000000
[    6.200600] 
[    6.200600] R2: 0xc0cb8dc0:
[    6.200600] 8dc0  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.200600] 8de0  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.200600] 8e00  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.200600] 8e20  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.200600] 8e40  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.200600] 8e60  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.200600] 8e80  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.200600] 8ea0  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.200600] 
[    6.200600] R3: 0xc0d45c88:
[    6.200600] 5c88  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.200600] 5ca8  00000000 00000000 00000000 00000000 ef02d580 00989680 00000000 00000000
[    6.200600] 5cc8  00000000 00000000 00000000 ef019e40 00000000 00000000 c0cec2b4 ef0cf940
[    6.200600] 5ce8  ef000dc0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.200600] 5d08  c0d57a80 00000001 00000001 00000000 c0cb8e40 c0cb48a8 00000000 00000000
[    6.200600] 5d28  c0d57a80 00000001 00000001 00000000 ef002740 ef002750 00000400 00000000
[    6.200600] 5d48  0000006f 00000000 00000077 00000077 ef002760 ef002770 00000000 00000000
[    6.200600] 5d68  3b9aca00 00000000 389fd980 00000000 00000001 c1728a40 c1728948 00000000
[    6.200600] 
[    6.200600] R6: 0xed48aa40:
[    6.200600] aa40  ed48aa3c 00000000 00000000 00000000 00000000 00000020 00000000 0000c350
[    6.200600] aa60  0000c350 00000000 ffffffff 00000000 00000000 00000000 00000000 00000000
[    6.200600] aa80  00000000 00000000 00000001 00000000 ffffffff ffffffff ffffffff ffffffff
[    6.200600] aaa0  ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffff7f ffffffff
[    6.200600] aac0  00000001 ed4aa000 00000002 04208060 00000000 00000000 00000001 00000001
[    6.200600] aae0  00000078 00000078 00000078 00000000 c065e978 00000000 00000400 00400000
[    6.200600] ab00  00000001 00000000 00000000 c1722bdc c1722bdc 00000001 70fe6fd0 00000001
[    6.200600] ab20  0454b4c1 00000000 0dbb2a93 00000000 0453a4c8 00000000 00000267 00000000
[    6.200600] 
[    6.200600] R8: 0xed48aa78:
[    6.200600] aa78  00000000 00000000 00000000 00000000 00000001 00000000 ffffffff ffffffff
[    6.200600] aa98  ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
[    6.200600] aab8  ffffff7f ffffffff 00000001 ed4aa000 00000002 04208060 00000000 00000000
[    6.200600] aad8  00000001 00000001 00000078 00000078 00000078 00000000 c065e978 00000000
[    6.200600] aaf8  00000400 00400000 00000001 00000000 00000000 c1722bdc c1722bdc 00000001
[    6.200600] ab18  70fe6fd0 00000001 0454b4c1 00000000 0dbb2a93 00000000 0453a4c8 00000000
[    6.200600] ab38  00000267 00000000 00000000 00000000 003a4fe7 00000000 000006a2 00000000
[    6.200600] ab58  0079c775 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.200600] 
[    6.200600] R10: 0xed48aa40:
[    6.200600] aa40  ed48aa3c 00000000 00000000 00000000 00000000 00000020 00000000 0000c350
[    6.200600] aa60  0000c350 00000000 ffffffff 00000000 00000000 00000000 00000000 00000000
[    6.200600] aa80  00000000 00000000 00000001 00000000 ffffffff ffffffff ffffffff ffffffff
[    6.200600] aaa0  ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffff7f ffffffff
[    6.200600] aac0  00000001 ed4aa000 00000002 04208060 00000000 00000000 00000001 00000001
[    6.200600] aae0  00000078 00000078 00000078 00000000 c065e978 00000000 00000400 00400000
[    6.200600] ab00  00000001 00000000 00000000 c1722bdc c1722bdc 00000001 70fe6fd0 00000001
[    6.200600] ab20  0454b4c1 00000000 0dbb2a93 00000000 0453a4c8 00000000 00000267 00000000
[    6.200600] Process kworker/u:2 (pid: 69, stack limit = 0xed4aa2f8)
[    6.200600] Stack: (0xed4abd88 to 0xed4ac000)
[    6.200600] bd80:                   00010ff9 00000000 0dbb2a93 00000000 ed4abde4 ed4abda8
[    6.200600] bda0: c005e0d8 c005cab0 ef0ce780 00000004 ed4abdec ed4abdc0 c005515c 0078e8a3
[    6.200600] bdc0: 00000000 ed48aaf8 c1722750 70fe6fd0 00000001 c1722700 ed4abe8c ed4abde8
[    6.200600] bde0: c0060058 c005df9c c005cb84 c00113c0 70fe4049 00000001 ef0ce600 ee9a3e1c
[    6.200600] be00: 70fe4049 00000001 00000000 c1722700 c1722700 c0cdb4f8 c1722c80 00000089
[    6.200600] be20: ed4abe6c ed4abe30 00000001 c005cb78 c004947c c005a9c4 00000000 c1722c80
[    6.200600] be40: ed4abe7c 00000000 c1722700 c1722700 c0cdb4f8 ed48ada0 00000089 ed48aac0
[    6.200600] be60: ed4abe8c 0078e8a3 00000000 00000000 00000000 70fe6fd0 00000001 ed48aac0
[    6.200600] be80: ed4abec4 ed4abe90 c0057298 c006002c 00000001 c1722700 ed4abed4 ed48aac0
[    6.200600] bea0: c1722700 ed4aa000 c0cdb4f8 ed48ada0 00000089 c0d457c0 ed4abed4 ed4abec8
[    6.200600] bec0: c00578b4 c00571d0 ed4abf74 ed4abed8 c0657ae4 c0057888 ed4abf0c ed4abee8
[    6.200600] bee0: c0372564 c0373a58 ef285500 00000003 c0cb6700 c0657d30 c0cb6700 00000000
[    6.200600] bf00: ed4abf24 ed4abf10 c0372b78 c037235c 00000000 c0658ef4 ef0cf4c0 ee8e3cd0
[    6.200600] bf20: ed4abf3c ed4abf30 c0658ef4 c0658e3c ed4abf4c c00442f0 ef0cf4c0 ee8e3cd0
[    6.200600] bf40: ed4abf84 ed4abf50 c00442f0 ef0cf4c0 c0d457c0 ed4aa000 ef0cf4d0 c0d457c0
[    6.200600] bf60: 00000089 c0d457c0 ed4abf84 ed4abf78 c0657d30 c0657450 ed4abfb4 ed4abf88
[    6.200600] bf80: c0044808 c0657cac 00000000 ee439edc ef0cf4c0 c0044578 00000013 00000000
[    6.200600] bfa0: 00000000 00000000 ed4abff4 ed4abfb8 c0048a88 c0044584 00000000 00000000
[    6.200600] bfc0: ef0cf4c0 00000000 00000000 00000000 ed4abfd0 ed4abfd0 00000000 ee439edc
[    6.200600] bfe0: c00489ec c000f66c 00000000 ed4abff8 c000f66c c00489f8 ffffffff ffffffff
[    6.200600] [<c005caf8>] (cpuacct_charge+0x54/0xc8) from [<c005e0d8>] (update_curr+0x148/0x1bc)
[    6.200600] [<c005e0d8>] (update_curr+0x148/0x1bc) from [<c0060058>] (dequeue_task_fair+0x38/0xd14)
[    6.200600] [<c0060058>] (dequeue_task_fair+0x38/0xd14) from [<c0057298>] (dequeue_task+0xd4/0xe4)
[    6.200600] [<c0057298>] (dequeue_task+0xd4/0xe4) from [<c00578b4>] (deactivate_task+0x38/0x3c)
[    6.200600] [<c00578b4>] (deactivate_task+0x38/0x3c) from [<c0657ae4>] (__schedule+0x6a0/0x74c)
[    6.200600] [<c0657ae4>] (__schedule+0x6a0/0x74c) from [<c0657d30>] (schedule+0x90/0x94)
[    6.200600] [<c0657d30>] (schedule+0x90/0x94) from [<c0044808>] (worker_thread+0x290/0x2d0)
[    6.200600] [<c0044808>] (worker_thread+0x290/0x2d0) from [<c0048a88>] (kthread+0x9c/0xac)
[    6.200600] [<c0048a88>] (kthread+0x9c/0xac) from [<c000f66c>] (kernel_thread_exit+0x0/0x8)
[    6.200600] Code: e5933020 ea00000a e79ec107 e5932010 (e18200dc) 
[   36.050009] ------------[ cut here ]------------
[   36.055137] WARNING: at kernel/watchdog.c:255 watchdog_timer_fn+0x10c/0x2e4()
[   36.060003] Watchdog detected hard LOCKUP on cpu 0
[   36.060003] Modules linked in:
[   36.060003] [<c0016de8>] (unwind_backtrace+0x0/0xec) from [<c064f090>] (dump_stack+0x20/0x24)
[   36.060003] [<c064f090>] (dump_stack+0x20/0x24) from [<c0027eb8>] (warn_slowpath_common+0x5c/0x74)
[   36.060003] [<c0027eb8>] (warn_slowpath_common+0x5c/0x74) from [<c0027f8c>] (warn_slowpath_fmt+0x40/0x48)
[   36.060003] [<c0027f8c>] (warn_slowpath_fmt+0x40/0x48) from [<c009d8b4>] (watchdog_timer_fn+0x10c/0x2e4)
[   36.060003] [<c009d8b4>] (watchdog_timer_fn+0x10c/0x2e4) from [<c004cd8c>] (__run_hrtimer+0x138/0x2a4)
[   36.060003] [<c004cd8c>] (__run_hrtimer+0x138/0x2a4) from [<c004da64>] (hrtimer_interrupt+0x130/0x298)
[   36.060003] [<c004da64>] (hrtimer_interrupt+0x130/0x298) from [<c0015330>] (arch_timer_handler+0x38/0x40)
[   36.060003] [<c0015330>] (arch_timer_handler+0x38/0x40) from [<c00a1868>] (handle_percpu_devid_irq+0xe0/0x1b4)
[   36.060003] [<c00a1868>] (handle_percpu_devid_irq+0xe0/0x1b4) from [<c009dec0>] (generic_handle_irq+0x30/0x40)
[   36.060003] [<c009dec0>] (generic_handle_irq+0x30/0x40) from [<c000f404>] (handle_IRQ+0x88/0xc8)
[   36.060003] [<c000f404>] (handle_IRQ+0x88/0xc8) from [<c0008540>] (gic_handle_irq+0x58/0x88)
[   36.060003] [<c0008540>] (gic_handle_irq+0x58/0x88) from [<c000dfc0>] (__irq_svc+0x40/0x70)
[   36.060003] Exception stack(0xef0edf68 to 0xef0edfb0)
[   36.060003] df60:                   c1738b38 00000000 0000000f 00000000 ef0ec000 c065b1e4
[   36.060003] df80: ef0ec000 c0d33610 4000406a 410fc075 00000000 ef0edfbc ef0edfc0 ef0edfb0
[   36.060003] dfa0: c000f72c c000f730 60010013 ffffffff
[   36.060003] [<c000dfc0>] (__irq_svc+0x40/0x70) from [<c000f730>] (default_idle+0x34/0x3c)
[   36.060003] [<c000f730>] (default_idle+0x34/0x3c) from [<c000fb5c>] (cpu_idle+0xa0/0xf4)
[   36.060003] [<c000fb5c>] (cpu_idle+0xa0/0xf4) from [<c064bfcc>] (secondary_start_kernel+0x108/0x12c)
[   36.060003] [<c064bfcc>] (secondary_start_kernel+0x108/0x12c) from [<4064b5b4>] (0x4064b5b4)
[   36.060003] ---[ end trace 9f075129d0750949 ]---

Now testing with 624 MHz again, which also fails pretty early:

U-Boot SPL 2016.05-armbian (Jun 03 2016 - 16:34:18)
DRAM: 1024 MiB
Trying to boot from 


U-Boot 2016.05-armbian (Jun 03 2016 - 16:34:18 +0200) Allwinner Technology

CPU:   Allwinner H3 (SUN8I 1680)
Model: Xunlong Orange Pi PC
DRAM:  1 GiB
MMC:   ** First descriptor is NOT a primary desc on 0:1 **
SUNXI SD/MMC: 1 (SD), SUNXI SD/MMC: 0 (eMMC)
*** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Net:   No ethernet found.
starting USB...
USB0:   USB EHCI 1.00
USB1:   USB OHCI 1.0
USB2:   USB EHCI 1.00
USB3:   USB OHCI 1.0
USB4:   USB EHCI 1.00
USB5:   USB OHCI 1.0
scanning bus 0 for devices... 1 USB Device(s) found
scanning bus 2 for devices... 1 USB Device(s) found
scanning bus 4 for devices... 1 USB Device(s) found
Hit any key to stop autoboot:  0 
(FEL boot)
## Executing script at 43100000
## Booting kernel from Legacy Image at 42000000 ...
   Image Name:   Linux-3.4.39+
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    7045824 Bytes = 6.7 MiB
   Load Address: 40008000
   Entry Point:  40008000
   Verifying Checksum ... OK
   Loading Kernel Image ... OK
Using machid 0x1029 from environment

Starting kernel ...

[sun8i_fixup]: From boot, get meminfo:
        Start:  0x40000000
        Size:   1024MB
ion_carveout reserve: 160m@0 256m@0 130m@1 200m@1
ion_reserve_common: ion reserve: [0x70000000, 0x80000000]!
[    3.284966] failed to get normal led pin assign
[    3.284983] failed to get standby led pin assign
[    3.741252] sunxikbd_init failed. 
[    3.745035] ls_fetch_sysconfig_para: type err  device_used = 0. 
[    3.752897] tscdev_init: tsc driver is disabled
[    3.759817] [cpu_freq] ERR:get cpu extremity frequency from sysconfig failed, use max_freq
[    3.781927] no green_led, ignore it!
[    3.785890] no blue_led, ignore it!
[    3.792518] request gpio failed!
[    3.841021] ths_fetch_sysconfig_para: type err  device_used = 1. 
Starting logging: OK
Initializing random number generator... done.
Starting network...
This is a simple textured cube demo from the lima driver and
a memtester. Both combined in a single program. The mali400
hardware is only used to stress RAM in the background. But
this happens to significantly increase chances of exposing
memory stability related problems.

Kernel driver is version 14
Detected 1 Mali-400 GP Cores.
Detected 2 Mali-400 PP Cores.
FB: 1280x720@32bpp at 0x70200000 (0x00708000)
Using dual buffered direct rendering to FB.

memtester version 4.3.0 (32-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 50MB (52428800 bytes)
got  50MB (52428800 bytes), trying mlock ...locked.
Loop 1/2:
  Stuck Address       : ok         
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Bit Flip            : testing 104READ FAILURE: 0xffffffff != 0xffffdfff at offset 0x0069f070 (bitflip).

Welcome!
lima-memtester login:  

So the different DRAM reliability results aren't related to boot0 vs. u-boot, and the u-boot version doesn't have an effect at all. When using the Armbian image I created or SinoVoip's crappy Ubuntu Mate image (boot0), the test succeeds at 720 MHz and fails at 744 MHz; when using FEL boot, everything above 600 MHz fails.

So time to stop wasting time with this crappy board. As usual: stay away from any device that can be powered through Micro USB. I suspect the problem we're experiencing right now is that the board gets power both through the USB OTG port (where a Pine64 is connected as the FEL host) and through DC-IN, and then $something happens that affects the stability of the board.

We now use 624 MHz as the DRAM clockspeed in Armbian, together with the already insanely low THS/cooler_table settings that seem to be necessary due to the board design. So it's not only the slowest H3 board ever, because it throttles way earlier down to insanely low clockspeeds (see Igor's result above: only 2 CPU cores running at 240 MHz!), it also guarantees stability problems when powered through Micro USB (as usual).


BTW, not closely related, but anyway: I saw that this Mali turbo speed patch fell out of our default branch ...

Does this cause trouble?

diff --git a/drivers/gpu/mali/mali/platform/mali400-pmu/mali_platform.c b/drivers/gpu/mali/mali/platform/mali400-pmu/mali_platform.c
index 54e50d5..1dc4f79 100644
--- a/drivers/gpu/mali/mali/platform/mali400-pmu/mali_platform.c
+++ b/drivers/gpu/mali/mali/platform/mali400-pmu/mali_platform.c
@@ -37,7 +37,7 @@ static struct clk *gpu_pll  = NULL;
 
 _mali_osk_errcode_t mali_platform_init(void)
 {
-	int freq = 252; /* 252 MHz */
+	int freq = 600; /* 600 MHz */
 
 	gpu_pll = clk_get(NULL, PLL_GPU_CLK);

BTW, not closely related, but anyway: I saw that this Mali turbo speed patch fell out of our default branch

This patch didn't apply at all after switching to the new BSP kernel from FriendlyARM a few weeks ago, so I deleted it. Based on thermal readouts when running lima-memtester it's obvious that the new kernel clocks Mali higher. We should ask @Melanrz whether he can provide fps numbers for Quake (IIRC he reported 37 fps when we increased the Mali clockspeed from ssvb's 252 MHz to 504 MHz, before we later activated this specific patch, which finally increased the clockspeed to 600 MHz).


So I tried your image on my NanoPi (as asked on GitHub). I was optimistic and tried 696 MHz first: I got a crash after 10 minutes or so. I tried 672 MHz: also a crash. I tried 648 MHz: still a crash within the first 10 minutes.

I put the output on sprunge: http://sprunge.us/hMHS

I still have strange ARISC errors, so I don't think the DRAM is that bad; there could be a more general problem with the NanoPi M1, or maybe it is the fact that it only has 512 MB of RAM ...

I haven't really checked the log, I'll do it tonight.

EDIT:

I've attached the RPi-Monitor graph. The real tests were made after 13:00.

[Attachment: RPi-Monitor graph]


I still have strange ARISC errors

These appear because you have to adjust the minimum cpufreq in /etc/default/cpufrequtils to 480 MHz (sorry, I forgot to mention that before). After a reboot the errors should be gone. The graphs look good (and confirm voltage switching, so the ARISC errors are related to trying to clock down to a frequency not allowed in the dvfs settings).

Did you see the spinning cube at all when running lima-memtester? It would also still be interesting to know which type of DRAM is on the board (in the meantime @Tido spotted that the BPi M2+, which shows horrible overheating problems, uses just normal DDR3 and not low-power DDR3L as on all the Oranges).
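
For the cpufrequtils change mentioned above, something like the following sketch could be used. It assumes the file follows the usual MIN_SPEED=<value in kHz> form, so 480 MHz becomes 480000; the function name set_min_cpufreq and the sample file contents are made up for illustration. The demo runs on a scratch file only; on the board the target would be /etc/default/cpufrequtils, edited as root and followed by a reboot.

```shell
#!/bin/sh
# Hypothetical helper: set MIN_SPEED (assumed to be in kHz) in a
# cpufrequtils defaults file, replacing an existing entry or appending one.
set_min_cpufreq() {
    file=$1
    khz=$2
    tmp="${file}.tmp"
    if grep -q '^MIN_SPEED=' "$file"; then
        # Replace the existing value.
        sed "s/^MIN_SPEED=.*/MIN_SPEED=${khz}/" "$file" > "$tmp" && mv "$tmp" "$file"
    else
        # No entry yet: append one.
        echo "MIN_SPEED=${khz}" >> "$file"
    fi
}

# Demo on a scratch file with sample contents (not read from a real board).
demo=$(mktemp)
printf 'ENABLE=true\nMIN_SPEED=240000\nMAX_SPEED=1200000\n' > "$demo"
set_min_cpufreq "$demo" 480000
cat "$demo"
```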


About the cpufreq: I made the change 10 minutes ago. I should have thought of that before ... Sorry :(

The 2 chips are Samsung K4B2G1646Q-BCK0. If you need something else (a picture, a command to run), I'll do it.

Of course I forgot to state the obvious: the cube spins over a light gray background, so I think it was good ... After 10 minutes, the full screen went light gray and keyboard / mouse weren't working anymore -> crash! I may have moved the mouse during the test but I don't think it could be that.


Great, so I guess buying two NanoPi M1 before any review was not a good idea ... Thanks for the information.

Still, the tests were made with a USB power supply. I'll redo the test with an ATX power supply / GPIO pin to make sure it was not a power problem.


Still the tests were made with an USB power supply, I'll remake the test with ATX power supply / GPIO pin to make sure it was not a power problem.

 

That's a good idea especially if you have USB peripherals plugged in (I've only Apple keyboards and mice they're horribly power hungry -- with my special 'Micro USB crap' cable I saved to demonstrate how shitty powering through Micro USB is I'm able to power off every Banana Pi/Pro when I try to connect them since the voltage drops at that moment are too much for the boards :)

 

BTW: I did all my testing through a serial console or SSH. You can execute lima-memtester as root without any problems even if X11 is running. I also found it convenient to still have potential error messages available after the board crashed (your freezes sound more like a powering problem, but it's good to be able to confirm later what's really going on).
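One way to keep those messages is to log the session on the machine you connect from, so the output survives even if the board freezes. A rough sketch: run_logged is a hypothetical helper, and the hostname and the lima-memtester size argument in the example are assumptions you would have to adapt.

```shell
#!/bin/sh
# Hypothetical helper, meant to run on your workstation (not on the board):
# run a command and keep a copy of everything it prints in a local log file.
run_logged() {
    log="$1"
    shift
    # 2>&1 merges error messages into the same stream; tee -a appends to the log.
    "$@" 2>&1 | tee -a "$log"
}

# Example (hostname and size argument are assumptions):
# run_logged memtest.log ssh root@nanopim1 lima-memtester 100M
```

Because the log lives on the workstation, nothing depends on the board flushing its own filesystem before it hangs.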

 

I'm also very curious about the thermal values you get :)

Link to post
Share on other sites

A good question. I have attached it to the ground from the Power supply.

I left mine on the chassis of the ATX power supply (which I'm using to power the boards I'm testing) and measured more than 1.5V instead of 1.3V on OPi One. :D

You can try connecting it to one of the GPIO GND pins (e.g. pin 39). You can also measure the voltage across the leads of the tantalum capacitor (the big yellow thing in the middle of your photo).

 

Ideally you would connect your positive probe to the VDD_CPUFB signal, but I don't see any test point for it, and without resistor numbers on the PCB it is almost impossible to find.

Link to post
Share on other sites

Good morning fellows

 

red LED on

Power Supply 5,16 Volt (measured on the PCB, Pin39 to power-barrel)

GND attached to Pin 39

 

as reference

Pin 1 = 3,23 V

Pin 2 = 5,13 V

 

Capacitor yellow side 0,0 V

Capacitor orange side 1,30 V

 

I also measured the points from the picture again: same result as in the picture.

 

I left mine on the chassis of the ATX power supply

An ATX power supply delivers several voltages, so I think it is very important where you connect ground.

Can you also do the reference check like I did above? What are your results?

Link to post
Share on other sites

Capacitor yellow side 0,0 V

Capacitor orange side 1,30 V

 

I also measured the points from the picture again: same result as in the picture.

So 1.3V it is

 

An ATX power supply delivers several voltages, so I think it is very important where you connect ground.

Can you also do the reference check like I did above? What are your results?

It was an ATX power supply, now it has banana plug connectors to provide 5V and 12V for different purposes.

 

I tested my multimeter on a REF01CPZ voltage reference (10V); it displayed 10.01V.

Link to post
Share on other sites

So 1.3V it is

 

I tried to reflect that in the linux-sunxi wiki and linked to the thread where @sinovoip might (not) explain what happened: http://linux-sunxi.org/index.php?title=Sinovoip_Banana_Pi_M2%2B&curid=2677&diff=17540&oldid=17501

 

Anyway: always being fed with 1.3V only partially explains the horrible overheating we experience with this board. Maybe it's really the DRAM; let's see how the NanoPi M1, which also uses DDR3 instead of DDR3L, behaves.

Link to post
Share on other sites

So I retried today.

 

All tests were made with a modified ATX board and without any USB devices, so the 5V should be clean (I didn't have my multimeter at hand to be totally sure, but it really should be). The ambient temperature is between 24 and 26°C (so quite hot).

 

I first made two tests at 648 MHz and then two tests at 624 MHz. Both failed within 15 minutes.

 

What's interesting is that the SoC temperature goes above 100°C (103 or 104°C). Does the kernel have a failsafe depending on temperature? Could that be the cause?

 

lima-memtester is way more stressful than the cpuburn test I ran following tkaiser's instructions some weeks ago.

 

Another interesting thing that bugs me is that there is never more than 1 core killed. I remember lurking on a discussion about that a few days ago; maybe we need to be more aggressive (at least on this board).

 

So far my plan is :

 * Try to tweak the cooling table to kill 3 cores when cooling state = 5 (other adjustments could be made after)

 * Add a fan

 

I'm really not a fan of my last proposal, as that would mean putting both my NanoPis in the trash :(.

 

 


Link to post
Share on other sites

What's interesting is that the SoC temperature goes above 100°C (103 or 104°C). Does the kernel have a failsafe depending on temperature?

 

Yes, currently that's configured to initiate an emergency shutdown at 105°C: https://github.com/igorpecovnik/lib/blob/master/config/fex/nanopim1.fex#L251-L288
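To see how close a run gets to that limit, something like the sketch below could be polled while the test is running. temp_margin is a hypothetical helper; the 105°C default is the shutdown value mentioned above, while the sysfs path in the example and the millidegree heuristic are assumptions that may not match every kernel.

```shell
#!/bin/sh
# Hypothetical helper: print how many degrees a SoC temperature reading is
# below the shutdown trip point (default 105). Assumption: readings above
# 1000 are millidegrees, smaller ones are plain degrees Celsius.
temp_margin() {
    raw="$1"
    limit="${2:-105}"
    if [ "$raw" -gt 1000 ]; then
        raw=$((raw / 1000))
    fi
    echo $((limit - raw))
}

# Example: poll once per second (the sysfs path is an assumption):
# while sleep 1; do temp_margin "$(cat /sys/class/thermal/thermal_zone0/temp)"; done
```

With the 103 or 104°C reported above, the margin is only 2 or 1 degrees, so an emergency shutdown during the test is plausible.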

 

But I totally agree: the settings I came up with are insufficient. Zador already pointed that out to me, but we looked solely at Oranges and Bananas, where this eventually worked, and weren't aware that the NanoPi M1 obviously is also hot stuff. As a first test it would be great if you could adopt the BPi M2+ settings: just replace the highlighted lines with the corresponding ones from the BPi M2+ fex file and allow downclocking to 240 MHz again in /etc/default/cpufrequtils.

 

Please run the test again (starting with 624 MHz) and report back.

Link to post
Share on other sites