tkaiser Posted June 2, 2016

Update: BPi M2+ is done, but further results would still be interesting. Now we need results for NanoPi M1. The simplest way is to use the Armbian image as outlined in post #4 below. The only thing you have to change is the following; afterwards you can run the lima-memtester binary as outlined below:

ln -sf /boot/bin/nanopim1.bin /boot/script.bin
echo nanopim1 > /etc/hostname
reboot

Dear BPi M2+ users,

I just tested DRAM reliability with my BPi M2+, only to realize that this board doesn't run stably even at just 624 MHz clockspeed (currently testing 600 MHz for an additional hour or so). If you have a BPi M2+ it would really help if you could do the same. Everything is outlined here: http://linux-sunxi.org/Xunlong_Orange_Pi_Plus_2E#DRAM_clock_speed_limit

Just grab the referenced fel-boot-lima-memtester-on-orange-pi-h3-v3.tar.bz2 archive, which now also contains stuff for BPi M2+, and then use the contained fel-boot-lima-memtester-on-banana-pi-m2plus script (I would also start with 624 and, if that succeeds, increase DRAM clockspeed in 24 MHz steps).

Please be aware that since SinoVoip saved a second LED on BPi M2+, the red LED will blink and you get no notification through a solidly lit second LED, so you should let the test run for at least 1 hour. It's important to connect an HDMI display and make sure that a spinning cube can be seen on a gray background (if the background is glowing red then something is wrong). Some more information can be found here: https://linux-sunxi.org/Hardware_Reliability_Tests#Reliability

Please get back ASAP with results since Chen-Yu is currently preparing upstream u-boot support and DRAM timing is important!
tkaiser (Author) Posted June 3, 2016

FYI: https://github.com/BPI-SINOVOIP/BPI-M2P-bsp/issues/3

I also tried to add to the wiki what's necessary to successfully test on BPi M2+ through FEL boot: http://linux-sunxi.org/Sinovoip_Banana_Pi_M2%2B#DRAM_clock_speed_limit
hojnikb Posted June 3, 2016

It would be great if someone made a suite that tests everything on these boards: things like DRAM, CPU, GPU, SDIO... If it were easier to set up (or even built into Armbian), more users would do it.
tkaiser (Author) Posted June 3, 2016

Here we go. You will find a freshly built Armbian 5.14 Xenial (16.04 LTS) desktop image here: Armbian_5.14_Bananapim2plus_Ubuntu_xenial_3.4.112_desktop.7z (438M download size)

This can be burned to any SD card larger than 2 GB and starts with a DRAM clockspeed of 648 MHz (and we do not allow switching between different DRAM clockspeeds: "# CONFIG_DEVFREQ_DRAM_FREQ is not set"). A statically linked lima-memtester binary is also included. To get started, please let RPi-Monitor install first and then start the test in the following way (as root -- do a 'sudo su -' beforehand if you're not already superuser):

armbianmonitor -r
/usr/local/bin/lima-memtester 100M >/dev/null 2>&1

Since we disabled CONFIG_DEVFREQ_DRAM_FREQ, RPi-Monitor can no longer show the actual DRAM frequency, so we have to trust the settings.

IMPORTANT: The test is only useful when a connected HDMI display is on and shows a spinning cube on a gray background, and when it runs for at least 1 hour. If you see a glowing red background then something's already wrong and you have to switch DRAM frequency. So if it looks like this then the test FAILED:

To change DRAM clockspeed you need this archive: u-boot-bananapim2plus_5.14_memtester.tar.bz2

The contents are as follows:

linux-u-boot-bananapim2plus_5.14_armhf_600MHz.deb
linux-u-boot-bananapim2plus_5.14_armhf_624MHz.deb
linux-u-boot-bananapim2plus_5.14_armhf_648MHz.deb
linux-u-boot-bananapim2plus_5.14_armhf_672MHz.deb
linux-u-boot-bananapim2plus_5.14_armhf_696MHz.deb
linux-u-boot-bananapim2plus_5.14_armhf_720MHz.deb
linux-u-boot-bananapim2plus_5.14_armhf_744MHz.deb

So to switch to e.g. 624 MHz you would grab the archive, untar it using 'tar xf /path/to/u-boot-bananapim2plus_5.14_memtester.tar.bz2' and then do a 'dpkg -i linux-u-boot-bananapim2plus_5.14_armhf_624MHz.deb && sync && reboot'. Then start the test again using

/usr/local/bin/lima-memtester 100M >/dev/null 2>&1
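The switching procedure from the post above could be wrapped in a small helper. This is only a sketch: the function names `uboot_pkg_name` and `switch_dram_clock` are invented for illustration, and only the package names and the 'dpkg -i ... && sync && reboot' sequence come from the post itself.

```sh
#!/bin/sh
# Hypothetical helpers, not part of the Armbian image.

# Print the package file name for a DRAM clock given in MHz.
# Only the seven clocks shipped in the archive are accepted.
uboot_pkg_name() {
    case "$1" in
        600|624|648|672|696|720|744)
            echo "linux-u-boot-bananapim2plus_5.14_armhf_${1}MHz.deb" ;;
        *)
            echo "unsupported DRAM clock: $1" >&2
            return 1 ;;
    esac
}

# Install the matching u-boot package and reboot (must run as root).
# $1 = clock in MHz, $2 = directory the archive was unpacked into (default: .)
switch_dram_clock() {
    pkg="$(uboot_pkg_name "$1")" || return 1
    dpkg -i "${2:-.}/${pkg}" && sync && reboot
}
```

After unpacking the archive, something like `switch_dram_clock 624 /path/to/unpacked` would then replace the manual dpkg call; rejecting clocks that are not in the archive avoids a typo ending in a confusing dpkg error.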
tkaiser (Author) Posted June 4, 2016

Any volunteers? We urgently need further testers. The procedure outlined above should be simple enough, shouldn't it? Grab a 4 GB card, burn the image, start it, create the usual normal user, install RPi-Monitor (please see below), then let the test run and get back here with feedback.

BTW: Installing RPi-Monitor would really help to get an idea whether the H3 on my BPi M2+ is broken or whether heat dissipation of this board is broken in general. When I run this image with just a heatsink on the H3 and without a fan, the H3 gets clocked down to 312 MHz and one CPU core gets killed as well. The same image running on an OPi PC Plus (after relinking script.bin) with the same heatsink in the same location only clocks down to 1008/1200 MHz. So it would really help if others could show their thermal measurements while executing the test as outlined above.
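For those who cannot install RPi-Monitor, the throttling symptoms described above (reduced clock, killed cores) can also be read from the standard cpufreq and CPU hotplug files in sysfs. The following is a minimal sketch; the function name `cpu_status` is made up, and the sysfs root is a parameter only so the function can be tried against a fake directory tree. Temperature is left out on purpose because its sysfs path differs between kernels.

```sh
#!/bin/sh
# Print the online CPU cores and the current cpu0 clock in MHz.
# $1 = sysfs root (default: /sys)
cpu_status() {
    root="${1:-/sys}"
    online="$(cat "$root/devices/system/cpu/online")"
    khz="$(cat "$root/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")"
    echo "online cores: ${online}, cpu0: $((khz / 1000)) MHz"
}
```

Calling it in a loop such as `while sleep 5; do cpu_status; done` over SSH gives a rough picture of how far the board throttles while lima-memtester is running.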
Igor Posted June 4, 2016

My cube started to spin with 648, screen @720p output is normal, using a small heat sink. Is this normal?
tkaiser (Author) Posted June 5, 2016

Quoting Igor: "Is this normal?"

Huh, it really seems BPi M2+ has a horrible 'thermal design'; you already experienced 2 CPU cores being killed. When I adjusted the cooler_table entries to such low values after the first real tests with BPi M2+, I would never have thought anyone would be able to reach this unless using a real 'enclosure from hell' without any airflow. But it seems we both manage to get CPU cores killed at 240 MHz when running outside an enclosure and with a heatsink applied (while H3 Oranges happily run the same workload at 1000+ MHz with 4 cores).

Anyway: I tested on the basis of boot0 using a SinoVoip OS image and was able to check DRAM successfully at 720 MHz clockspeed. Then I did the same with our Armbian test image (using u-boot 2016.05) and could confirm: 720 MHz works for at least an hour while 744 MHz already gave a glowing red background.

Now I replaced u-boot+spl on the Armbian image with the stuff ssvb used when he created his FEL boot based lima-memtester archive (full bootlog) and am currently testing 720 MHz. Spinning cube after 15 minutes -- will let this run for an hour and then start the FEL boot test (using 'our' u-boot 2016.05 then).

@Igor: Did you try out higher DRAM clockspeeds already or just the default 648 MHz I used when creating the image?

Maybe the different power scheme on the BPi M2+ is responsible for the worse results I got. BPi M2+ powers up when a USB cable is connected to the Micro USB port. Will have a look later when testing FEL mode again. Maybe it's just unstable DC-IN when both a PSU and another host on the OTG port 'provide' power?
Igor Posted June 5, 2016

Quoting tkaiser: "Did you try out higher DRAM clockspeeds already or just the default 648 MHz I used when creating the image?"

No. It seems pointless.
tkaiser Posted June 5, 2016 Author Posted June 5, 2016 Now testing again FEL boot (the 'usual' lima-memtester approach) but using the most recent u-boot version the Armbian test image also uses. Since it failed the last time already at 624 MHz I tried it now with 672 MHz: U-Boot SPL 2016.05-armbian (Jun 03 2016 - 16:46:24) DRAM: 1024 MiB Trying to boot from U-Boot 2016.05-armbian (Jun 03 2016 - 16:46:24 +0200) Allwinner Technology CPU: Allwinner H3 (SUN8I 1680) Model: Xunlong Orange Pi PC DRAM: 1 GiB MMC: ** First descriptor is NOT a primary desc on 0:1 ** SUNXI SD/MMC: 1 (SD), SUNXI SD/MMC: 0 (eMMC) *** Warning - bad CRC, using default environment In: serial Out: serial Err: serial Net: No ethernet found. starting USB... USB0: USB EHCI 1.00 USB1: USB OHCI 1.0 USB2: USB EHCI 1.00 USB3: USB OHCI 1.0 USB4: USB EHCI 1.00 USB5: USB OHCI 1.0 scanning bus 0 for devices... 1 USB Device(s) found scanning bus 2 for devices... 1 USB Device(s) found scanning bus 4 for devices... 1 USB Device(s) found Hit any key to stop autoboot: 0 (FEL boot) ## Executing script at 43100000 ## Booting kernel from Legacy Image at 42000000 ... Image Name: Linux-3.4.39+ Image Type: ARM Linux Kernel Image (uncompressed) Data Size: 7045824 Bytes = 6.7 MiB Load Address: 40008000 Entry Point: 40008000 Verifying Checksum ... OK Loading Kernel Image ... OK Using machid 0x1029 from environment Starting kernel ... [sun8i_fixup]: From boot, get meminfo: Start: 0x40000000 Size: 1024MB ion_carveout reserve: 160m@0 256m@0 130m@1 200m@1 ion_reserve_common: ion reserve: [0x70000000, 0x80000000]! [ 3.283597] failed to get normal led pin assign [ 3.283612] failed to get standby led pin assign [ 3.741186] sunxikbd_init failed. [ 3.744969] ls_fetch_sysconfig_para: type err device_used = 0. [ 3.752828] tscdev_init: tsc driver is disabled [ 3.759757] [cpu_freq] ERR:get cpu extremity frequency from sysconfig failed, use max_freq [ 3.781832] no green_led, ignore it! [ 3.785795] no blue_led, ignore it! 
[ 3.792458] request gpio failed! [ 3.840830] ths_fetch_sysconfig_para: type err device_used = 1. Starting logging: OK Initializing random number generator... done. Starting network... This is a simple textured cube demo from the lima driver and a memtester. Both combined in a single program. The mali400 hardware is only used to stress RAM in the background. But this happens to significantly increase chances of exposing memory stability related problems. Kernel driver is version 14 Detected 1 Mali-400 GP Cores. Detected 2 Mali-400 PP Cores. FB: 1280x720@32bpp at 0x70200000 (0x00708000) Using dual buffered direct rendering to FB. memtester version 4.3.0 (32-bit) Copyright (C) 2001-2012 Charles Cazabon. Licensed under the GNU General Public License version 2 (only). pagesize is 4096 pagesizemask is 0xfffff000 want 50MB (52428800 bytes) got 50MB (52428800 bytes), trying mlock ...locked. Loop 1/2: Stuck Address : testing 0FAILURE: possible bad address line at offset 0x0000fbf0. Skipping to next test... 
Random Value : [ 6.190727] Unable to handle kernel paging request at virtual address ae164bc8 [ 6.198758] pgd = c0004000 [ 6.200600] [ae164bc8] *pgd=00000000 [ 6.200600] sunxi oops: enable sdcard JTAG interface [ 6.200600] sunxi oops: cpu frequency: 1200 MHz [ 6.200600] sunxi oops: ddr frequency: 672 MHz [ 6.200600] sunxi oops: gpu frequency: 252 MHz [ 6.200600] sunxi oops: cpu temperature: 53 [ 6.200600] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 6.200600] Modules linked in: [ 6.200600] CPU: 0 Not tainted (3.4.39+ #1) [ 6.200600] PC is at cpuacct_charge+0x54/0xc8 [ 6.200600] LR is at 0xa6c000 [ 6.200600] pc : [<c005caf8>] lr : [<00a6c000>] psr: a00b0093 [ 6.200600] sp : ed4abd88 ip : ed4abd88 fp : ed4abda4 [ 6.200600] r10: ed48aac0 r9 : 00000001 r8 : ed48aaf8 [ 6.200600] r7 : 00000000 r6 : ed48aac0 r5 : 00000000 r4 : 00010ff9 [ 6.200600] r3 : c0d45d08 r2 : c0cb8e40 r1 : 00000000 r0 : ed48aac0 [ 6.200600] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel [ 6.200600] Control: 10c5387d Table: 6e9a406a DAC: 00000015 [ 6.200600] [ 6.200600] PC: 0xc005ca78: [ 6.200600] ca78 eb17e727 e1a00007 e24bd028 e89daff0 c0ce71b0 c0ce6d44 c0cba040 c0056e9c [ 6.200600] ca98 c0050ce4 c0d45d28 c0d46208 e1a0c00d e92dd8f0 e24cb004 e52de004 e8bd4000 [ 6.200600] cab8 e1a05003 e59f3068 e1a06000 e1a04002 e5933370 e3530000 089da8f0 e5903004 [ 6.200600] cad8 e5937014 eb011d02 e5963480 e59fe044 e5933020 ea00000a e79ec107 e5932010 [ 6.200600] caf8 e18200dc e0900004 e0a11005 e18200fc e5933000 e5933018 e3530000 0a000002 [ 6.200600] cb18 e5933024 e3530000 1afffff2 eb0121e6 e89da8f0 c0ce6d44 c0cdb4f8 e1a0c00d [ 6.200600] cb38 e92dd800 e24cb004 e59f301c e5932000 e59f3018 e2822c75 e2822030 e0832392 [ 6.200600] cb58 e1a00002 e1a01003 e89da800 c0cba0c0 00989680 e1a0c00d e92ddbf0 e24cb004 [ 6.200600] [ 6.200600] SP: 0xed4abd08: [ 6.200600] bd08 ee969580 c0658ef4 ef0c8ac0 ef0c8ac0 ed4abd34 ed4abd28 c005caf8 a00b0093 [ 6.200600] bd28 ffffffff ed4abd74 ed4abda4 ed4abd40 c000df58 c000836c 
ed48aac0 00000000 [ 6.200600] bd48 c0cb8e40 c0d45d08 00010ff9 00000000 ed48aac0 00000000 ed48aaf8 00000001 [ 6.200600] bd68 ed48aac0 ed4abda4 ed4abd88 ed4abd88 00a6c000 c005caf8 a00b0093 ffffffff [ 6.200600] bd88 00010ff9 00000000 0dbb2a93 00000000 ed4abde4 ed4abda8 c005e0d8 c005cab0 [ 6.200600] bda8 ef0ce780 00000004 ed4abdec ed4abdc0 c005515c 0078e8a3 00000000 ed48aaf8 [ 6.200600] bdc8 c1722750 70fe6fd0 00000001 c1722700 ed4abe8c ed4abde8 c0060058 c005df9c [ 6.200600] bde8 c005cb84 c00113c0 70fe4049 00000001 ef0ce600 ee9a3e1c 70fe4049 00000001 [ 6.200600] [ 6.200600] IP: 0xed4abd08: [ 6.200600] bd08 ee969580 c0658ef4 ef0c8ac0 ef0c8ac0 ed4abd34 ed4abd28 c005caf8 a00b0093 [ 6.200600] bd28 ffffffff ed4abd74 ed4abda4 ed4abd40 c000df58 c000836c ed48aac0 00000000 [ 6.200600] bd48 c0cb8e40 c0d45d08 00010ff9 00000000 ed48aac0 00000000 ed48aaf8 00000001 [ 6.200600] bd68 ed48aac0 ed4abda4 ed4abd88 ed4abd88 00a6c000 c005caf8 a00b0093 ffffffff [ 6.200600] bd88 00010ff9 00000000 0dbb2a93 00000000 ed4abde4 ed4abda8 c005e0d8 c005cab0 [ 6.200600] bda8 ef0ce780 00000004 ed4abdec ed4abdc0 c005515c 0078e8a3 00000000 ed48aaf8 [ 6.200600] bdc8 c1722750 70fe6fd0 00000001 c1722700 ed4abe8c ed4abde8 c0060058 c005df9c [ 6.200600] bde8 c005cb84 c00113c0 70fe4049 00000001 ef0ce600 ee9a3e1c 70fe4049 00000001 [ 6.200600] [ 6.200600] FP: 0xed4abd24: [ 6.200600] bd24 a00b0093 ffffffff ed4abd74 ed4abda4 ed4abd40 c000df58 c000836c ed48aac0 [ 6.200600] bd44 00000000 c0cb8e40 c0d45d08 00010ff9 00000000 ed48aac0 00000000 ed48aaf8 [ 6.200600] bd64 00000001 ed48aac0 ed4abda4 ed4abd88 ed4abd88 00a6c000 c005caf8 a00b0093 [ 6.200600] bd84 ffffffff 00010ff9 00000000 0dbb2a93 00000000 ed4abde4 ed4abda8 c005e0d8 [ 6.200600] bda4 c005cab0 ef0ce780 00000004 ed4abdec ed4abdc0 c005515c 0078e8a3 00000000 [ 6.200600] bdc4 ed48aaf8 c1722750 70fe6fd0 00000001 c1722700 ed4abe8c ed4abde8 c0060058 [ 6.200600] bde4 c005df9c c005cb84 c00113c0 70fe4049 00000001 ef0ce600 ee9a3e1c 70fe4049 [ 6.200600] be04 00000001 
00000000 c1722700 c1722700 c0cdb4f8 c1722c80 00000089 ed4abe6c [ 6.200600] [ 6.200600] R0: 0xed48aa40: [ 6.200600] aa40 ed48aa3c 00000000 00000000 00000000 00000000 00000020 00000000 0000c350 [ 6.200600] aa60 0000c350 00000000 ffffffff 00000000 00000000 00000000 00000000 00000000 [ 6.200600] aa80 00000000 00000000 00000001 00000000 ffffffff ffffffff ffffffff ffffffff [ 6.200600] aaa0 ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffff7f ffffffff [ 6.200600] aac0 00000001 ed4aa000 00000002 04208060 00000000 00000000 00000001 00000001 [ 6.200600] aae0 00000078 00000078 00000078 00000000 c065e978 00000000 00000400 00400000 [ 6.200600] ab00 00000001 00000000 00000000 c1722bdc c1722bdc 00000001 70fe6fd0 00000001 [ 6.200600] ab20 0454b4c1 00000000 0dbb2a93 00000000 0453a4c8 00000000 00000267 00000000 [ 6.200600] [ 6.200600] R2: 0xc0cb8dc0: [ 6.200600] 8dc0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 6.200600] 8de0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 6.200600] 8e00 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 6.200600] 8e20 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 6.200600] 8e40 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 6.200600] 8e60 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 6.200600] 8e80 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 6.200600] 8ea0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 6.200600] [ 6.200600] R3: 0xc0d45c88: [ 6.200600] 5c88 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 6.200600] 5ca8 00000000 00000000 00000000 00000000 ef02d580 00989680 00000000 00000000 [ 6.200600] 5cc8 00000000 00000000 00000000 ef019e40 00000000 00000000 c0cec2b4 ef0cf940 [ 6.200600] 5ce8 ef000dc0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 6.200600] 5d08 c0d57a80 00000001 
00000001 00000000 c0cb8e40 c0cb48a8 00000000 00000000 [ 6.200600] 5d28 c0d57a80 00000001 00000001 00000000 ef002740 ef002750 00000400 00000000 [ 6.200600] 5d48 0000006f 00000000 00000077 00000077 ef002760 ef002770 00000000 00000000 [ 6.200600] 5d68 3b9aca00 00000000 389fd980 00000000 00000001 c1728a40 c1728948 00000000 [ 6.200600] [ 6.200600] R6: 0xed48aa40: [ 6.200600] aa40 ed48aa3c 00000000 00000000 00000000 00000000 00000020 00000000 0000c350 [ 6.200600] aa60 0000c350 00000000 ffffffff 00000000 00000000 00000000 00000000 00000000 [ 6.200600] aa80 00000000 00000000 00000001 00000000 ffffffff ffffffff ffffffff ffffffff [ 6.200600] aaa0 ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffff7f ffffffff [ 6.200600] aac0 00000001 ed4aa000 00000002 04208060 00000000 00000000 00000001 00000001 [ 6.200600] aae0 00000078 00000078 00000078 00000000 c065e978 00000000 00000400 00400000 [ 6.200600] ab00 00000001 00000000 00000000 c1722bdc c1722bdc 00000001 70fe6fd0 00000001 [ 6.200600] ab20 0454b4c1 00000000 0dbb2a93 00000000 0453a4c8 00000000 00000267 00000000 [ 6.200600] [ 6.200600] R8: 0xed48aa78: [ 6.200600] aa78 00000000 00000000 00000000 00000000 00000001 00000000 ffffffff ffffffff [ 6.200600] aa98 ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff [ 6.200600] aab8 ffffff7f ffffffff 00000001 ed4aa000 00000002 04208060 00000000 00000000 [ 6.200600] aad8 00000001 00000001 00000078 00000078 00000078 00000000 c065e978 00000000 [ 6.200600] aaf8 00000400 00400000 00000001 00000000 00000000 c1722bdc c1722bdc 00000001 [ 6.200600] ab18 70fe6fd0 00000001 0454b4c1 00000000 0dbb2a93 00000000 0453a4c8 00000000 [ 6.200600] ab38 00000267 00000000 00000000 00000000 003a4fe7 00000000 000006a2 00000000 [ 6.200600] ab58 0079c775 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 6.200600] [ 6.200600] R10: 0xed48aa40: [ 6.200600] aa40 ed48aa3c 00000000 00000000 00000000 00000000 00000020 00000000 0000c350 [ 6.200600] aa60 0000c350 00000000 ffffffff 
00000000 00000000 00000000 00000000 00000000 [ 6.200600] aa80 00000000 00000000 00000001 00000000 ffffffff ffffffff ffffffff ffffffff [ 6.200600] aaa0 ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffff7f ffffffff [ 6.200600] aac0 00000001 ed4aa000 00000002 04208060 00000000 00000000 00000001 00000001 [ 6.200600] aae0 00000078 00000078 00000078 00000000 c065e978 00000000 00000400 00400000 [ 6.200600] ab00 00000001 00000000 00000000 c1722bdc c1722bdc 00000001 70fe6fd0 00000001 [ 6.200600] ab20 0454b4c1 00000000 0dbb2a93 00000000 0453a4c8 00000000 00000267 00000000 [ 6.200600] Process kworker/u:2 (pid: 69, stack limit = 0xed4aa2f8) [ 6.200600] Stack: (0xed4abd88 to 0xed4ac000) [ 6.200600] bd80: 00010ff9 00000000 0dbb2a93 00000000 ed4abde4 ed4abda8 [ 6.200600] bda0: c005e0d8 c005cab0 ef0ce780 00000004 ed4abdec ed4abdc0 c005515c 0078e8a3 [ 6.200600] bdc0: 00000000 ed48aaf8 c1722750 70fe6fd0 00000001 c1722700 ed4abe8c ed4abde8 [ 6.200600] bde0: c0060058 c005df9c c005cb84 c00113c0 70fe4049 00000001 ef0ce600 ee9a3e1c [ 6.200600] be00: 70fe4049 00000001 00000000 c1722700 c1722700 c0cdb4f8 c1722c80 00000089 [ 6.200600] be20: ed4abe6c ed4abe30 00000001 c005cb78 c004947c c005a9c4 00000000 c1722c80 [ 6.200600] be40: ed4abe7c 00000000 c1722700 c1722700 c0cdb4f8 ed48ada0 00000089 ed48aac0 [ 6.200600] be60: ed4abe8c 0078e8a3 00000000 00000000 00000000 70fe6fd0 00000001 ed48aac0 [ 6.200600] be80: ed4abec4 ed4abe90 c0057298 c006002c 00000001 c1722700 ed4abed4 ed48aac0 [ 6.200600] bea0: c1722700 ed4aa000 c0cdb4f8 ed48ada0 00000089 c0d457c0 ed4abed4 ed4abec8 [ 6.200600] bec0: c00578b4 c00571d0 ed4abf74 ed4abed8 c0657ae4 c0057888 ed4abf0c ed4abee8 [ 6.200600] bee0: c0372564 c0373a58 ef285500 00000003 c0cb6700 c0657d30 c0cb6700 00000000 [ 6.200600] bf00: ed4abf24 ed4abf10 c0372b78 c037235c 00000000 c0658ef4 ef0cf4c0 ee8e3cd0 [ 6.200600] bf20: ed4abf3c ed4abf30 c0658ef4 c0658e3c ed4abf4c c00442f0 ef0cf4c0 ee8e3cd0 [ 6.200600] bf40: ed4abf84 ed4abf50 c00442f0 ef0cf4c0 c0d457c0 
ed4aa000 ef0cf4d0 c0d457c0 [ 6.200600] bf60: 00000089 c0d457c0 ed4abf84 ed4abf78 c0657d30 c0657450 ed4abfb4 ed4abf88 [ 6.200600] bf80: c0044808 c0657cac 00000000 ee439edc ef0cf4c0 c0044578 00000013 00000000 [ 6.200600] bfa0: 00000000 00000000 ed4abff4 ed4abfb8 c0048a88 c0044584 00000000 00000000 [ 6.200600] bfc0: ef0cf4c0 00000000 00000000 00000000 ed4abfd0 ed4abfd0 00000000 ee439edc [ 6.200600] bfe0: c00489ec c000f66c 00000000 ed4abff8 c000f66c c00489f8 ffffffff ffffffff [ 6.200600] [<c005caf8>] (cpuacct_charge+0x54/0xc8) from [<c005e0d8>] (update_curr+0x148/0x1bc) [ 6.200600] [<c005e0d8>] (update_curr+0x148/0x1bc) from [<c0060058>] (dequeue_task_fair+0x38/0xd14) [ 6.200600] [<c0060058>] (dequeue_task_fair+0x38/0xd14) from [<c0057298>] (dequeue_task+0xd4/0xe4) [ 6.200600] [<c0057298>] (dequeue_task+0xd4/0xe4) from [<c00578b4>] (deactivate_task+0x38/0x3c) [ 6.200600] [<c00578b4>] (deactivate_task+0x38/0x3c) from [<c0657ae4>] (__schedule+0x6a0/0x74c) [ 6.200600] [<c0657ae4>] (__schedule+0x6a0/0x74c) from [<c0657d30>] (schedule+0x90/0x94) [ 6.200600] [<c0657d30>] (schedule+0x90/0x94) from [<c0044808>] (worker_thread+0x290/0x2d0) [ 6.200600] [<c0044808>] (worker_thread+0x290/0x2d0) from [<c0048a88>] (kthread+0x9c/0xac) [ 6.200600] [<c0048a88>] (kthread+0x9c/0xac) from [<c000f66c>] (kernel_thread_exit+0x0/0x8) [ 6.200600] Code: e5933020 ea00000a e79ec107 e5932010 (e18200dc) [ 36.050009] ------------[ cut here ]------------ [ 36.055137] WARNING: at kernel/watchdog.c:255 watchdog_timer_fn+0x10c/0x2e4() [ 36.060003] Watchdog detected hard LOCKUP on cpu 0 [ 36.060003] Modules linked in: [ 36.060003] [<c0016de8>] (unwind_backtrace+0x0/0xec) from [<c064f090>] (dump_stack+0x20/0x24) [ 36.060003] [<c064f090>] (dump_stack+0x20/0x24) from [<c0027eb8>] (warn_slowpath_common+0x5c/0x74) [ 36.060003] [<c0027eb8>] (warn_slowpath_common+0x5c/0x74) from [<c0027f8c>] (warn_slowpath_fmt+0x40/0x48) [ 36.060003] [<c0027f8c>] (warn_slowpath_fmt+0x40/0x48) from [<c009d8b4>] 
(watchdog_timer_fn+0x10c/0x2e4) [ 36.060003] [<c009d8b4>] (watchdog_timer_fn+0x10c/0x2e4) from [<c004cd8c>] (__run_hrtimer+0x138/0x2a4) [ 36.060003] [<c004cd8c>] (__run_hrtimer+0x138/0x2a4) from [<c004da64>] (hrtimer_interrupt+0x130/0x298) [ 36.060003] [<c004da64>] (hrtimer_interrupt+0x130/0x298) from [<c0015330>] (arch_timer_handler+0x38/0x40) [ 36.060003] [<c0015330>] (arch_timer_handler+0x38/0x40) from [<c00a1868>] (handle_percpu_devid_irq+0xe0/0x1b4) [ 36.060003] [<c00a1868>] (handle_percpu_devid_irq+0xe0/0x1b4) from [<c009dec0>] (generic_handle_irq+0x30/0x40) [ 36.060003] [<c009dec0>] (generic_handle_irq+0x30/0x40) from [<c000f404>] (handle_IRQ+0x88/0xc8) [ 36.060003] [<c000f404>] (handle_IRQ+0x88/0xc8) from [<c0008540>] (gic_handle_irq+0x58/0x88) [ 36.060003] [<c0008540>] (gic_handle_irq+0x58/0x88) from [<c000dfc0>] (__irq_svc+0x40/0x70) [ 36.060003] Exception stack(0xef0edf68 to 0xef0edfb0) [ 36.060003] df60: c1738b38 00000000 0000000f 00000000 ef0ec000 c065b1e4 [ 36.060003] df80: ef0ec000 c0d33610 4000406a 410fc075 00000000 ef0edfbc ef0edfc0 ef0edfb0 [ 36.060003] dfa0: c000f72c c000f730 60010013 ffffffff [ 36.060003] [<c000dfc0>] (__irq_svc+0x40/0x70) from [<c000f730>] (default_idle+0x34/0x3c) [ 36.060003] [<c000f730>] (default_idle+0x34/0x3c) from [<c000fb5c>] (cpu_idle+0xa0/0xf4) [ 36.060003] [<c000fb5c>] (cpu_idle+0xa0/0xf4) from [<c064bfcc>] (secondary_start_kernel+0x108/0x12c) [ 36.060003] [<c064bfcc>] (secondary_start_kernel+0x108/0x12c) from [<4064b5b4>] (0x4064b5b4) [ 36.060003] ---[ end trace 9f075129d0750949 ]--- Now testing with 624 MHz again which also fails pretty early: U-Boot SPL 2016.05-armbian (Jun 03 2016 - 16:34:18) DRAM: 1024 MiB Trying to boot from U-Boot 2016.05-armbian (Jun 03 2016 - 16:34:18 +0200) Allwinner Technology CPU: Allwinner H3 (SUN8I 1680) Model: Xunlong Orange Pi PC DRAM: 1 GiB MMC: ** First descriptor is NOT a primary desc on 0:1 ** SUNXI SD/MMC: 1 (SD), SUNXI SD/MMC: 0 (eMMC) *** Warning - bad CRC, using default 
environment In: serial Out: serial Err: serial Net: No ethernet found. starting USB... USB0: USB EHCI 1.00 USB1: USB OHCI 1.0 USB2: USB EHCI 1.00 USB3: USB OHCI 1.0 USB4: USB EHCI 1.00 USB5: USB OHCI 1.0 scanning bus 0 for devices... 1 USB Device(s) found scanning bus 2 for devices... 1 USB Device(s) found scanning bus 4 for devices... 1 USB Device(s) found Hit any key to stop autoboot: 0 (FEL boot) ## Executing script at 43100000 ## Booting kernel from Legacy Image at 42000000 ... Image Name: Linux-3.4.39+ Image Type: ARM Linux Kernel Image (uncompressed) Data Size: 7045824 Bytes = 6.7 MiB Load Address: 40008000 Entry Point: 40008000 Verifying Checksum ... OK Loading Kernel Image ... OK Using machid 0x1029 from environment Starting kernel ... [sun8i_fixup]: From boot, get meminfo: Start: 0x40000000 Size: 1024MB ion_carveout reserve: 160m@0 256m@0 130m@1 200m@1 ion_reserve_common: ion reserve: [0x70000000, 0x80000000]! [ 3.284966] failed to get normal led pin assign [ 3.284983] failed to get standby led pin assign [ 3.741252] sunxikbd_init failed. [ 3.745035] ls_fetch_sysconfig_para: type err device_used = 0. [ 3.752897] tscdev_init: tsc driver is disabled [ 3.759817] [cpu_freq] ERR:get cpu extremity frequency from sysconfig failed, use max_freq [ 3.781927] no green_led, ignore it! [ 3.785890] no blue_led, ignore it! [ 3.792518] request gpio failed! [ 3.841021] ths_fetch_sysconfig_para: type err device_used = 1. Starting logging: OK Initializing random number generator... done. Starting network... This is a simple textured cube demo from the lima driver and a memtester. Both combined in a single program. The mali400 hardware is only used to stress RAM in the background. But this happens to significantly increase chances of exposing memory stability related problems. Kernel driver is version 14 Detected 1 Mali-400 GP Cores. Detected 2 Mali-400 PP Cores. FB: 1280x720@32bpp at 0x70200000 (0x00708000) Using dual buffered direct rendering to FB. 
memtester version 4.3.0 (32-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffff000
want 50MB (52428800 bytes)
got 50MB (52428800 bytes), trying mlock ...locked.
Loop 1/2:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Bit Flip : testing 104READ FAILURE: 0xffffffff != 0xffffdfff at offset 0x0069f070 (bitflip).
Welcome!
lima-memtester login:

So the different DRAM reliability results aren't related to boot0 vs. u-boot, and the latter's version has no effect at all. When using the Armbian image I created or SinoVoip's crappy Ubuntu Mate image (boot0) I succeed at 720 MHz and fail at 744 MHz; with FEL boot, anything above 600 MHz fails.

So time to stop wasting time with this crappy board. As usual: stay away from any device that can be powered through Micro USB. I suspect the problem we're experiencing right now is that the board gets power both through the USB OTG port (where a Pine64 is connected as the FEL host) and through DC-IN, and then $something happens that affects the stability of the board.

We now use 624 MHz as DRAM clockspeed in Armbian, along with already insanely low THS/cooler_table settings that seem to be necessary due to the board design. So it's not only the slowest H3 board ever, due to throttling way earlier to insanely low clockspeeds (see Igor's result above: only 2 CPU cores running at 240 MHz!), but it also guarantees stability problems when powered through Micro USB (as usual).
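The READ FAILURE line in the 624 MHz log reports two words that should have been equal. XORing them shows which bits differ; here that is a single bit. A small sketch of that arithmetic (the function name `flipped_bits` is invented for illustration):

```sh
#!/bin/sh
# Print the XOR mask of two 32-bit words, followed by the index of
# every bit position in which they differ (bit 0 = least significant).
flipped_bits() {
    mask=$(( $1 ^ $2 ))
    printf '0x%08x' "$mask"
    bit=0
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            printf ' bit%d' "$bit"
        fi
        mask=$(( mask >> 1 ))
        bit=$(( bit + 1 ))
    done
    printf '\n'
}

# Values taken from the memtester output above.
flipped_bits 0xffffffff 0xffffdfff   # prints: 0x00002000 bit13
```

The mask only says which bit was read back wrong at that offset; it does not by itself identify the cause.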
Igor Posted June 5, 2016

BTW, not much related but anyway. I saw this MALI turbo speed patch failed out of our default branch ... Does this cause trouble?

diff --git a/drivers/gpu/mali/mali/platform/mali400-pmu/mali_platform.c b/drivers/gpu/mali/mali/platform/mali400-pmu/mali_platform.c
index 54e50d5..1dc4f79 100644
--- a/drivers/gpu/mali/mali/platform/mali400-pmu/mali_platform.c
+++ b/drivers/gpu/mali/mali/platform/mali400-pmu/mali_platform.c
@@ -37,7 +37,7 @@ static struct clk *gpu_pll = NULL;
 _mali_osk_errcode_t mali_platform_init(void)
 {
-	int freq = 252; /* 252 MHz */
+	int freq = 600; /* 600 MHz */
 	gpu_pll = clk_get(NULL, PLL_GPU_CLK);
tkaiser (Author) Posted June 5, 2016

Quoting Igor: "BTW, not much related but anyway. I saw this MALI turbo speed patch failed out of our default branch"

This patch didn't apply at all after switching to the new BSP kernel from FriendlyARM a few weeks ago, so I deleted it. Based on thermal readouts when running lima-memtester it's obvious that the new kernel clocks Mali higher. We should ask @Melanrz whether he can provide fps numbers for Quake (IIRC he reported 37 fps when we increased Mali clockspeed from ssvb's 252 MHz to 504 MHz, before we later activated this specific patch, which finally increased the clockspeed to 600 MHz).
vlad59 Posted June 7, 2016

So I tried your image on my NanoPi (as asked on GitHub). I was optimistic and tried 696 MHz first: I got a crash after 10 minutes or so. I tried 672, also a crash. I tried 648, still a crash within the first 10 minutes.

I made a sprunge: http://sprunge.us/hMHS

I still have strange ARISC errors, so I don't think the DRAM is that bad. There could be a more general problem with the NanoPi M1, or maybe it's the fact that it only has 512 MB of RAM... I haven't really checked the log, I'll do it tonight.

EDIT: I've attached the RPi-Monitor graph. The real tests were made after 13:00.
tkaiser (Author) Posted June 7, 2016

Quoting vlad59: "I still have strange ARISC errors"

These are there because you would have to adjust the minimum cpufreq in /etc/default/cpufrequtils to 480 MHz (sorry, I forgot to mention that before). So after a reboot the errors should be gone.

The graphs look good (and confirm voltage switching, so the ARISC errors are related to trying to clock down to a frequency not allowed in the dvfs settings). Did you see the spinning cube at all when running lima-memtester? And it would still be interesting to know which type of DRAM is on the board (in the meantime @Tido spotted that the BPi M2+, which shows horrible overheating problems, uses just normal DDR3 and not low-power DDR3L as on all the Oranges).
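The cpufreq adjustment mentioned above can be scripted. This sketch rests on an assumption not stated in the thread: that /etc/default/cpufrequtils holds a line of the form MIN_SPEED=<kHz>, as the Debian cpufrequtils defaults file usually does. Check your own file before relying on it. The function name `set_min_cpufreq` is invented.

```sh
#!/bin/sh
# Set MIN_SPEED (in kHz) in a cpufrequtils defaults file.
# Assumes the file uses a MIN_SPEED=<kHz> line; sed -i as in GNU sed.
# $1 = file, e.g. /etc/default/cpufrequtils
# $2 = minimum clock in MHz
set_min_cpufreq() {
    file="$1"
    khz=$(( $2 * 1000 ))
    if grep -q '^MIN_SPEED=' "$file"; then
        sed -i "s/^MIN_SPEED=.*/MIN_SPEED=${khz}/" "$file"
    else
        echo "MIN_SPEED=${khz}" >> "$file"
    fi
}
```

Run as root, `set_min_cpufreq /etc/default/cpufrequtils 480` followed by a reboot would correspond to the change described in the post.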
vlad59 Posted June 7, 2016

About the cpufreq: I made the change 10 minutes ago. I should have thought of that before... Sorry.

The 2 chips are Samsung k4b2g1646q-bck0. If you need something else (a picture, a command to run), I'll do it.

Of course I forgot to state the obvious: the cube spins over a light gray background, so I think it was good... After 10 minutes, the full screen went light gray and keyboard and mouse weren't working anymore -> crash! I may have moved the mouse during the test but I don't think it could be that.
zador.blood.stained Posted June 7, 2016

Quoting vlad59: "The 2 chips are samsung k4b2g1646q-bck0. If you need something else : a picture / to run a command, I'll do it."

http://www.samsung.com/semiconductor/global/file/product/DS_K4B2G1646Q-BC_Rev103.pdf

VDDQ = 1.5V ± 0.075V

So it's not DDR3L.
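To spell out the datasheet figure: 1.5 V ± 0.075 V means an allowed VDDQ window of 1.425 V to 1.575 V. DDR3L parts are specified for a nominal 1.35 V (background knowledge, not stated in the thread), which lies outside that window. A minimal sketch of the range check, with the invented function name `within_tolerance`:

```sh
#!/bin/sh
# Exit status 0 if a voltage lies within nominal +/- tolerance, else 1.
# $1 = voltage, $2 = nominal, $3 = tolerance (all in volts)
within_tolerance() {
    awk -v v="$1" -v nom="$2" -v tol="$3" \
        'BEGIN { exit !(v >= nom - tol && v <= nom + tol) }'
}

if within_tolerance 1.35 1.5 0.075; then
    echo "1.35 V is inside the 1.5 V +/- 0.075 V window"
else
    echo "1.35 V is outside the 1.5 V +/- 0.075 V window"
fi
```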
vlad59 Posted June 7, 2016

Great, so I guess buying two NanoPi M1 before any review was not a good idea... Thanks for the information.

Still, the tests were made with a USB power supply. I'll redo the test with an ATX power supply on the GPIO pins to make sure it was not a power problem.
tkaiser (Author) Posted June 7, 2016

Quoting vlad59: "Still the tests were made with an USB power supply, I'll remake the test with ATX power supply / GPIO pin to make sure it was not a power problem."

That's a good idea, especially if you have USB peripherals plugged in. (I only have Apple keyboards and mice, and they're horribly power hungry. With the special 'Micro USB crap' cable I kept to demonstrate how shitty powering through Micro USB is, I'm able to power off every Banana Pi/Pro when I connect them, since the voltage drops at that moment are too much for the boards.)

BTW: I did all the testing through a serial console or SSH. You can execute lima-memtester as root without any problems even if X11 is running. And I found it somewhat convenient to have potential error messages available even if the board crashed (your freezes sound more like a powering problem, but it's good to confirm later what's really going on).

I'm also very curious about the thermal values you get.
Tido Posted June 7, 2016 Red LED on. Power supply: 5.16 V (measured on the PCB). Here are my results in the picture:
tkaiser Posted June 7, 2016 Author @Tido: Thx, so either the schematics are wrong (claiming 1.2V would be used) or the testpoint we've been told to use by @sinovoip is wrong. Anyway, it seems the word 'wrong' is somewhat associated with BPi M2+ and/or @sinovoip.
zador.blood.stained Posted June 7, 2016 @Tido Where did you connect the COM/GND probe of your multimeter? Edit: @tkaiser The testpoint seems to be correct if you compare the components connected to it with the provided schematics.
Tido Posted June 7, 2016 Hi Zador, a good question. I attached it to the ground of the power supply. Now I am wondering if there is a better point. What do you think?
zador.blood.stained Posted June 7, 2016 Tido wrote: "A good question. I attached it to the ground of the power supply." I left mine on the chassis of the ATX power supply (which I'm using to power the boards I'm testing) and measured more than 1.5V instead of 1.3V on OPi One. You can try connecting it to one of the GPIO GND pins (e.g. pin 39). You can also measure the voltage across the leads of the tantalum capacitor (the big yellow thing in the middle of your photo). Ideally you need to connect your positive probe to the VDD_CPUFB signal, but I don't see any testpoint for it, and without resistor numbers on the PCB it is almost impossible to find it.
Tido Posted June 7, 2016 Well, I had my negative probe on the power jack (just to be clear). Is there a voltage difference between the power jack's negative terminal and a GPIO GND pin?
zador.blood.stained Posted June 7, 2016 Tido wrote: "Is there a voltage difference between the power jack's negative terminal and a GPIO GND pin?" No, it's connected to GND directly. But just to be sure, please measure the voltage on the capacitor. If it's 1.3V, then most probably the first result is correct too.
Tido Posted June 8, 2016 Good morning fellows. Red LED on. Power supply: 5.16 V (measured on the PCB, pin 39 to power barrel). With GND attached to pin 39 as reference:
* Pin 1 = 3.23 V
* Pin 2 = 5.13 V
* Capacitor, yellow side = 0.0 V
* Capacitor, orange side = 1.30 V
I also measured the points from the picture again, with the same result as in the picture.
zador.blood.stained wrote: "I left mine on the chassis of the ATX power supply" An ATX power supply delivers several different voltages, so I think it is very important where you connect ground. Can you also make the reference check like I did above? What are your results?
zador.blood.stained Posted June 8, 2016
Tido wrote: "Capacitor, yellow side = 0.0 V; capacitor, orange side = 1.30 V. I also measured the points from the picture again, with the same result as in the picture." So 1.3V it is.
Tido wrote: "An ATX power supply delivers several different voltages, so I think it is very important where you connect ground. Can you also make the reference check like I did above? What are your results?" It was an ATX power supply; now it has banana plug connectors to provide 5V and 12V for different purposes. I tested my multimeter on a REF01CPZ voltage reference (10V), and it displayed 10.01V.
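For a rough feel of what that reference check implies, the arithmetic can be sketched as follows. It assumes the meter's error scales proportionally with the reading, which real multimeters do not guarantee across ranges; the function name `relative_error` is illustrative only.

```python
def relative_error(measured_v, reference_v):
    """Signed relative deviation of a reading from a known reference."""
    return (measured_v - reference_v) / reference_v


# 10.01 V displayed on a 10 V reference, as reported above
err = relative_error(10.01, 10.0)

# Assumption: the same relative error applies to the 1.30 V reading
reading_v = 1.30
deviation_mv = reading_v * err * 1000.0

print(f"relative error: {err * 100:.2f} %")
print(f"implied deviation on {reading_v:.2f} V: about {deviation_mv:.1f} mV")
```

Under that assumption the meter error is far too small to explain a 1.3V reading where 1.2V was expected.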
tkaiser Posted June 8, 2016 Author zador.blood.stained wrote: "So 1.3V it is." I tried to reflect that in the linux-sunxi wiki and linked to the thread where @sinovoip might (not) explain what happened: http://linux-sunxi.org/index.php?title=Sinovoip_Banana_Pi_M2%2B&curid=2677&diff=17540&oldid=17501 Anyway: always being fed with 1.3V only partially explains the horrible overheating we experience with this board. Maybe it's really the DRAM; let's see how NanoPi M1, which also uses DDR3 instead of DDR3L DRAM, behaves.
vlad59 Posted June 8, 2016 So I retried today. All tests were made with a modified ATX board and without any USB devices, so the 5V should be clean (I didn't have my multimeter at hand to be totally sure, but it really should be). The ambient temperature is between 24 and 26°C (so quite hot). I first made two tests at 648 MHz and then two tests at 624 MHz. At both clock speeds the tests failed within 15 minutes. What's interesting is that the SoC temperature goes above 100°C (103 or 104°C). Does the kernel have a failsafe depending on temperature? Could that be the cause? lima-memtester is way more stressful than the cpuburn test I made following tkaiser's instructions some weeks ago. Another interesting thing that bugs me is that there is never more than 1 core killed. I remember lurking on a discussion about that a few days ago; maybe we need to be more aggressive (at least on this board). So far my plan is:
* Try to tweak the cooling table to kill 3 cores when cooling state = 5 (other adjustments could be made afterwards)
* Add a fan
I'm really not a fan of my last proposal, as that would mean putting both my NanoPis in the trash.
tkaiser Posted June 8, 2016 Author vlad59 wrote: "What's interesting is that the SoC temperature goes above 100°C (103 or 104°C). Does the kernel have a failsafe depending on temperature?" Yes, currently that's configured to initiate an emergency shutdown at 105°C: https://github.com/igorpecovnik/lib/blob/master/config/fex/nanopim1.fex#L251-L288 But I totally agree: the settings I came up with are insufficient (Zador already pointed that out to me, but we looked solely at Oranges and Bananas, where this eventually worked, and weren't aware that NanoPi M1 obviously is also hot stuff). As a first test it would be great if you could adopt the BPi M2+ settings (just replace the highlighted lines with the stuff from the BPi M2+ fex file and allow downclocking to 240 MHz again in /etc/default/cpufrequtils). Please run the test again (starting with 624 MHz) and report back.
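To see how close a run gets to the 105°C emergency shutdown, the SoC temperature can be logged from a second SSH session while the test is running. The following is a minimal sketch, not part of Armbian or lima-memtester: the sysfs path and the unit handling are assumptions that may not match every kernel, and all function names are made up for illustration.

```python
import time
from pathlib import Path

# Assumption: the SoC temperature is exposed here; the path can differ per kernel.
TEMP_PATH = Path("/sys/class/thermal/thermal_zone0/temp")
SHUTDOWN_C = 105.0  # emergency shutdown threshold mentioned in the post above


def parse_temp_c(raw):
    """Convert a raw sysfs reading to degrees Celsius.

    Assumption: values above 1000 are millidegrees, smaller values are
    already whole degrees.
    """
    value = float(raw.strip())
    return value / 1000.0 if value > 1000 else value


def margin_to_shutdown(temp_c, shutdown_c=SHUTDOWN_C):
    """Remaining headroom in °C before the shutdown threshold is reached."""
    return shutdown_c - temp_c


def log_temperature(path=TEMP_PATH, interval_s=5.0, samples=None):
    """Print timestamped readings; loops until interrupted if samples is None."""
    count = 0
    while samples is None or count < samples:
        temp_c = parse_temp_c(path.read_text())
        stamp = time.strftime("%H:%M:%S")
        # flush so the last line is visible even if the board freezes
        print(f"{stamp} {temp_c:.1f} C, margin {margin_to_shutdown(temp_c):.1f} C",
              flush=True)
        count += 1
        time.sleep(interval_s)
```

Calling `log_temperature()` prints one line every five seconds until interrupted; because the output reaches the SSH client immediately, the last reading before a freeze or shutdown stays visible on the remote terminal.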