Cubietruck freeze after 1-3 days with 5.23 Xenial (uboot problem?)


apollon77

How can it be HW related when it works rock solid with one version of u-boot and is unstable with another?!

Because, for example, it worked and still works on my Cubietruck both with older u-boot (2016.07, 2016.09) at 432MHz DRAM freq and with newer u-boot (2016.11) at 384MHz DRAM freq. And the Cubietruck is a relatively popular board, so both the default and the lower-than-default values seem to work fine on most boards.


You can't get this information from software. Just check the markings on the four DRAM chips on the board. For example, the original https://linux-sunxi.org/Cubietruck used GT8UB512M8EN-BG chips and you can see them in the pictures.

 

It is necessary to recompile U-Boot for this after changing the DRAM clock speed settings in the 'configs/Cubietruck_defconfig' file. You can find some instructions here: https://linux-sunxi.org/Mainline_U-Boot#Compile_U-Boot
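
As a rough sketch of that step (not the exact procedure from the wiki): the helper below rewrites the CONFIG_DRAM_CLK line in a defconfig file. The function name set_dram_clk is made up for illustration, and the build commands in the trailing comments assume a checked-out mainline U-Boot tree and an arm-linux-gnueabihf- cross toolchain.

```shell
#!/bin/sh
# Hypothetical helper: set CONFIG_DRAM_CLK in a U-Boot defconfig file.
# Usage: set_dram_clk <defconfig-path> <clock-in-MHz>
set_dram_clk() {
    file=$1
    mhz=$2
    # Refuse anything that is not a plain positive integer.
    case $mhz in
        ''|*[!0-9]*) echo "clock must be an integer (MHz)" >&2; return 1 ;;
    esac
    # Fail loudly if the option is missing instead of silently doing nothing.
    grep -q '^CONFIG_DRAM_CLK=' "$file" || {
        echo "CONFIG_DRAM_CLK not found in $file" >&2
        return 1
    }
    # Write to a temporary file first so this does not rely on "sed -i".
    sed "s/^CONFIG_DRAM_CLK=.*/CONFIG_DRAM_CLK=$mhz/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Example, run from the top of a U-Boot source tree (toolchain prefix assumed):
#   set_dram_clk configs/Cubietruck_defconfig 384
#   make CROSS_COMPILE=arm-linux-gnueabihf- Cubietruck_defconfig
#   make CROSS_COMPILE=arm-linux-gnueabihf-
```

The build should leave a u-boot-sunxi-with-spl.bin in the tree, which is the file that gets written to the SD card.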

 

Ok. Mine has SKhynix H5TQ4G63AFR-PBC DRAM chips. The board has CUBIETRUCK V1.0-06.06 printed on it.

 

Link to photo of the board: http://workupload.com/file/hGEwhZh

 

I also managed to compile the latest u-boot. As soon as the other unit arrives I'm going to run further tests.


Finally got the second Cubietruck and did some more testing. Here are the results.

"Works stable" means a minimum complete run of lima-memtester and cpufreq-ljt-stress-test (cpuburn-arm).

 

Cubie1 (old, existing unit):

- SKhynix H5TQ4G63AFR-PBC DRAM chips.

- Works rock solid with armbian u-boot 5.17 and u-boot 5.20 ( U-Boot SPL 2016.09-armbian )

- very unstable (boot errors, LIMA memtester crashes the kernel instantly, see here) with armbian u-boot 5.23 and beta 5.24 ( U-Boot 2016.11 )

- stable with my own cross-compiled mainline u-boot SPL 2016.11-00108-gca39bd8 using the unmodified default Cubietruck_defconfig file (CONFIG_DRAM_CLK=432)

 

Cubie2 (new arrival):

- also has SKhynix H5TQ4G63AFR-PBC DRAM chips

- works stable with all u-boot versions, e.g. armbian 5.20, 5.23 and beta 5.24.161212 / U-Boot SPL 2016.11-armbian

 

This means that the problem/behaviour

- isn't related to SKhynix DRAM chips (same chips on both Cubietrucks)

- isn't related to DRAM clock speed (the problematic cubie1 runs stable with mainline u-boot at DRAM_CLK 432 MHz but crashes with armbian u-boot 5.23 at the lower 384 MHz)

Edit 14/02/2017: is most probably related to the DRAM clock speed of 384 MHz. I think I own a rare Cubietruck that is stable at 432 MHz _but_ unstable at 384 MHz.

- is probably related to armbian u-boot customizing for Cubietruck.

Edit 14/02/2017: only insofar as armbian lowered the DRAM clock speed to 384 MHz in its Cubietruck u-boot defconfig.

 

Where can I find the armbian Cubietruck u-boot defconfig file in order to compare it to the mainline default config?


Keep in mind that different units may have slightly different tolerances and overclocking potential even if it's the same board model. For example, have a look at https://linux-sunxi.org/Orange_Pi_PC#DRAM_clock_speed_limit

Some Orange Pi PC boards pass the lima-memtester test with DRAM clocked at 720 MHz, while the others fail it at 672 MHz. The same is very likely true for the Cubietruck.

 

Also, defining "Works stable" as "a minimum complete run of lima-memtester and cpufreq-ljt-stress-test (cpuburn-arm)" is not good enough. Ideally we want at least a bit of extra safety headroom, which means that you should stress test the board at slightly higher settings than those intended for production use.

 

Still, if you see differences depending on the U-Boot version, then we need to investigate this too. Ideally we need to identify the exact problematic commit (via git bisecting).
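
For anyone who has not bisected before, here is a minimal sketch of how such a session could be started. The function name start_uboot_bisect is hypothetical, and using the mainline tags v2016.09 (good) and v2016.11 (bad) is only an assumption based on the versions mentioned in this thread. It cannot cover changes that exist only in a distribution's own patches. Each step has to be judged by hand: build, write to the SD card, boot, run the stress test.

```shell
#!/bin/sh
# Hypothetical helper: start a bisect session in a U-Boot git checkout.
# Usage: start_uboot_bisect <repo-dir> <known-good-rev> <known-bad-rev>
start_uboot_bisect() {
    repo=$1
    good=$2
    bad=$3
    # "git bisect start <bad> <good>" marks both ends and checks out
    # a commit roughly halfway between them.
    git -C "$repo" bisect start "$bad" "$good"
}

# Example (revisions are assumptions, pick ones you have actually tested):
#   start_uboot_bisect ~/u-boot v2016.09 v2016.11
# Then, for every commit git checks out:
#   1. build U-Boot and write it to the SD card
#   2. boot the board and run the stress test
#   3. report the result:  git bisect good   (or)   git bisect bad
# When git names the first bad commit, clean up with:
#   git bisect reset
```

With a few hundred commits between two releases this typically takes fewer than ten build-and-test rounds, since every answer halves the remaining range.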


OK, lower DRAM clock frequency can explain why stability problems disappeared. Why was it lowered in armbian in the first place? Was it there to address reliability problems? If yes, then why hasn't it been reported to mainline yet?

 

In order to figure out what is going on, I think that we need to have some test results similar to https://linux-sunxi.org/Orange_Pi_PC#DRAM_clock_speed_limit

Done separately for different brands of DRAM chips (GT vs. HYNIX).

 

Edit: Various reports in this discussion thread are very confusing and contradict each other. For example, tpm8 mentioned that the mainline U-Boot was stable with the 432MHz DRAM clock speed, while armbian was not. Please get your act together.


OK, lower DRAM clock frequency can explain why stability problems disappeared. Why was it lowered in armbian in the first place? Was it there to address reliability problems? If yes, then why hasn't it been reported to mainline yet?

The DRAM frequency was lowered precisely to improve reliability, but it was done mostly as a preventive measure on all boards after we received enough complaints about Lime/Lime2 boards. The Lime and Lime2 defaults were already lowered in mainline several months ago. As for other boards like the Cubietruck, we can't reproduce the issues on our boards, so we can't confirm whether it is a clock frequency issue, some kind of regression in the SPL code, a difference between DRAM chips (the wiki says there are at least 2 variants on the Cubietruck), or not related to u-boot and DRAM at all.

 

In order to figure out what is going on, I think that we need to have some test results similar to https://linux-sunxi.org/Orange_Pi_PC#DRAM_clock_speed_limit

Done separately for different brands of DRAM chips (GT vs. HYNIX).

For this we need multiple boards with different DRAM chips, and these boards are relatively expensive for stockpiling them only to address 1 or 2 possible reliability reports.

 

Edit: Various reports in this discussion thread are very confusing and contradict each other. For example, tpm8 mentioned that the mainline U-Boot was stable with the 432MHz DRAM clock speed, while armbian was not. Please get your act together.

Exactly. So it may depend on u-boot version too, so it needs to be bisected on a board where the issue can be reliably reproduced.


@tpm8: Did I understand you correctly that your self-compiled u-boot is stable? Can you make it available? Then I could test it on my cubietrucks.

 

I was experimenting with governors because I read in a forum that people fixed it by modifying this but it seems to have no effect.

5.20 is still the most stable.

 

Especially on the machine with an InfluxDB on it, any version higher than 5.20 crashes after 2-10h in normal operation ... with 5.20 it happens after 2-6 weeks (but it still happens). In the end I have this effect on all 4 of the cubietrucks I use for my home automation. The power supplies differ, but all are more than sufficient (2A+) ...


@apollon77 - my cross-compiled u-boot was stable (at least it passed several runs of the CPU stress tester and LIMA memtester).

 

Back up your SD card before installing u-boot!

 

Legacy 3.4 kernel only - You have to add the following line to /boot/boot.cmd:

setenv bootm_boot_mode sec

Legacy 3.4 kernel only - Build new boot script:

mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr

Download u-boot from here (as I can't attach a binary to this thread)

u-boot-sunxi-with-spl.zip

 

Copy u-boot binary:

scp -p u-boot-sunxi-with-spl.bin cubietruck:/root/u-boot-sunxi-with-spl_20161122.bin

Write u-boot to boot sector:

root@cubietruck:~# dd if=u-boot-sunxi-with-spl_20161122.bin of=/dev/mmcblk0 bs=1024 seek=8

Sync & reboot:

root@cubietruck:~# sync
root@cubietruck:~# reboot

Serial console should show the following during boot:

U-Boot SPL 2016.11-00108-gca39bd8 (Nov 22 2016 - 12:52:25)
DRAM: 2048 MiB
CPU: 912000000Hz, AXI/AHB/APB: 3/2/2
Trying to boot from MMC1


U-Boot 2016.11-00108-gca39bd8 (Nov 22 2016 - 12:52:25 +0000) Allwinner Technology

CPU:   Allwinner A20 (SUN7I)
Model: Cubietech Cubietruck
I2C:   ready
DRAM:  2 GiB
MMC:   SUNXI SD/MMC: 0
...
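
Before rebooting, the write can be sanity-checked by reading the same region back and comparing it with the image file. This is only a sketch: verify_uboot_write is a made-up name, and it assumes the image was written at the 8 KiB offset used by the dd command above.

```shell
#!/bin/sh
# Hypothetical helper: compare the bytes at offset 8 KiB on a device
# (or image file) with the U-Boot binary that was written there.
# Usage: verify_uboot_write <u-boot-binary> <device>   (reading a device needs root)
verify_uboot_write() {
    img=$1
    dev=$2
    # Size of the binary in bytes; the arithmetic expansion strips any padding.
    size=$(($(wc -c < "$img")))
    # Skip the first 8 KiB, keep exactly $size bytes, compare silently.
    dd if="$dev" bs=1024 skip=8 2>/dev/null | head -c "$size" | cmp -s - "$img"
}

# Example:
#   verify_uboot_write u-boot-sunxi-with-spl_20161122.bin /dev/mmcblk0 \
#       && echo "write verified" || echo "MISMATCH"
```

A match only shows that the card holds the intended bytes. It says nothing about whether the board boots or runs stable with them.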

Cool, will try tonight. After booting successfully is there a way to see that the new u-boot was used to boot without looking at the console during boot (because the cubietruck where I can see if it worked with the InfluxDB on it is located in my cellar and no monitor there :-) )


Cool, will try tonight. After booting successfully is there a way to see that the new u-boot was used to boot without looking at the console during boot (because the cubietruck where I can see if it worked with the InfluxDB on it is located in my cellar and no monitor there :-) )

 

Afraid a monitor in the cellar won't be good enough - there is no easy way to see u-boot boot messages. These are visible on a serial console only ( see http://linux-sunxi.org/Cubieboard/TTL ). dmesg logging only starts after loading the Linux kernel.
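
A partial workaround without a serial console, offered as a sketch only: the version banner is a plain string inside the binary, so it can be searched for in the boot area of the SD card. This shows which SPL is stored on the card, not a log of what actually ran. The function name uboot_spl_version is hypothetical, and the 8 KiB offset and 1 MiB search window are assumptions that match the dd command used earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: print the first "U-Boot SPL <version>" string found
# after the 8 KiB offset of a device or image file.
# Usage: uboot_spl_version <device-or-image>   (reading a device needs root)
uboot_spl_version() {
    src=$1
    # Skip 8 KiB, read up to 1 MiB, treat binary data as text (-a),
    # print only the matching part (-o), keep the first hit.
    dd if="$src" bs=1024 skip=8 count=1024 2>/dev/null |
        LC_ALL=C grep -a -o 'U-Boot SPL [0-9][^ ]*' |
        head -n 1
}

# Example:
#   uboot_spl_version /dev/mmcblk0
```

If the printed version matches the one from the build, the board is at least booting from a card that carries the new SPL.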


Hm, then I will just install it and look at the results (whether it boots again, and how it behaves afterwards) ... I have backed up my SD card already, so I'll start next (after the children are in bed) ... and then I'll see how stable it is ... >5.20 normally crashes after 1-10h on this machine.


@tpm8: What kernel do you use? I use vanilla (so "next" here, meaning kernel 4.9.7). I read that the "setenv bootm_boot_mode sec" is needed for the sunxi-3.4 kernel ... so is your file compatible with my system at all?

You are right - I use the legacy 3.4.x kernel. So skip editing boot.cmd with a mainline / vanilla kernel - I changed my post above. My system hung during kernel boot without the bootm_boot_mode line.

 

Compiling u-boot is completely independent from the kernel that is booted afterwards. So the compiled u-boot should work for you too.

 

Anyway - fiddling with u-boot can stop your system from booting at all. This is why I recommended making an SD card backup before doing so...


I already tried using /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor and the other paths to play around, and also tried "performance", but no difference: it still crashes. Did you have success solving such crashes that way?

 

I don't know if this file is used at all?! "schedutil" is currently set there for GOVERNOR, but "ondemand" seems to be what is actually used.
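
To see which governor the kernel is really using, as opposed to what a config file asks for, the value can be read back from sysfs for every core. A small sketch; show_governors is a made-up name, and the optional argument exists only so the function can be pointed at a different directory for testing.

```shell
#!/bin/sh
# Hypothetical helper: print the active cpufreq governor of every CPU core.
# Usage: show_governors [sysfs-cpu-root]
show_governors() {
    root=${1:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip cores without cpufreq support (and an unmatched glob).
        [ -r "$f" ] || continue
        cpu=${f#"$root"/}   # strip the leading directory
        cpu=${cpu%%/*}      # keep only "cpuN"
        printf '%s: %s\n' "$cpu" "$(cat "$f")"
    done
}

# Example:
#   show_governors
```

If this prints "ondemand" while the config file says something else, the setting is either not being applied at boot or is overridden later.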


Folks,

 

I did some more testing with armbian 5.25 (jessie legacy) on my "problematic" cubie truck:

 

- again unstable with the armbian u-boot 5.25 package (crash after a few seconds of running lima memtester)

- stable with my own compiled current mainline u-boot with the default Cubietruck defconfig (DRAM clock speed: 432 MHz) - completed 35 runs of lima memtester overnight without problems

- but also unstable with current mainline u-boot at DRAM clock speed 384 MHz (crash after a few seconds of running lima memtester)

 

So I'm quite sure now that one of my two cubietrucks is a rare (??) specimen that is UNSTABLE at 384 MHz - whatever the u-boot version.

 

The only difference in the defconfig is slightly lower DRAM frequency in Armbian compared to mainline:

https://github.com/igorpecovnik/lib/blob/master/patch/u-boot/u-boot-sunxi/a10-a20-lower-dram-clk.patch#L89-L99

Thanks - this confirms my finding.

 

 

Keep in mind that different units may have slightly different tolerances and overclocking potential even if it's the same board model. For example, have a look at https://linux-sunxi.org/Orange_Pi_PC#DRAM_clock_speed_limit

Some Orange Pi PC boards pass the lima-memtester test with DRAM clocked at 720 MHz, while the others fail it at 672 MHz. The same is very likely true for the Cubietruck.

 

Again, this Orange Pi Example fits with my observation that a LOWER clock speed also can cause stability problems.

 

OK, lower DRAM clock frequency can explain why stability problems disappeared. Why was it lowered in armbian in the first place? Was it there to address reliability problems? If yes, then why hasn't it been reported to mainline yet?

 

...

 

Edit: Various reports in this discussion thread are very confusing and contradict each other. For example, tpm8 mentioned that the mainline U-Boot was stable with the 432MHz DRAM clock speed, while armbian was not. Please get your act together.

Good question.

 

But I think there is not that much contradiction:

- At least for my cubietruck the lower DRAM clock speed is apparently bad.

- most probably the armbian 5.17 and 5.20 u-boot versions were built _before_ the DRAM clock speed was lowered in the armbian cubietruck defconfig - so they are stable for me (and apollon77 ??)


I have 4 cubietrucks, bought at different times on eBay from different people ... I had such crashes with the current u-boot on each of them!

So it would be very surprising to call it "rare" :-)) (Or all affected units were sent to Germany)

 

Blaming the memory would also make sense, because it happens more often on the cubietruck where I use InfluxDB (which is highly memory-based).

 

I run quite stable with u-boot 5.20, but even there I get freezes after weeks, up to 1-2 months. But with all u-boots > 5.20, including 5.25, it crashes within hours or at most 1-3 days.

 

My testing status is:

I re-tested u-boot 5.25 with "performance" in /etc/default/cpufrequtils again on Sunday. With this I had an auto-reboot (watchdog?) after 55 mins and a freeze after 26h. So I have now finally installed your u-boot and am monitoring :-) No reboot or freeze so far (but also only 19h so far :-) )

I will let it run till the weekend ... and if there is no freeze till then, I could try "the same" build with the lower RAM speed and test whether it then crashes earlier, to verify your assumption.


@tpm8 - Yes, reducing the DRAM clock speed too much can also cause problems if the DRAM chips in question had been designed for very high clock speeds (DDR3-1600). Moreover, the default ZQ settings are very non-optimal for the Cubietruck board and it is possible to clock even to 600 MHz with DRAM chips from GT and tuned ZQ settings, while the defaults have relatively bad reliability - http://lists.denx.de/pipermail/u-boot/2014-July/183981.html

 

@apollon77 - If I understand it correctly, you have never used the lima-memtester stress test, which is specifically designed to quickly identify DRAM problems/misconfigurations. You might want to give it a try, especially considering that you have a nice set of 4 boards.


@apollon77 - If I understand it correctly, you have never used the lima-memtester stress test, which is specifically designed to quickly identify DRAM problems/misconfigurations. You might want to give it a try, especially considering that you have a nice set of 4 boards.

 

The 4th just arrived some days ago and is not deeply integrated in my home automation stuff, where all the others are used, so with that new one I am now able to do some deeper tests (with the others it was no real option without breaking parts of my home automation ;-) )

 

I found http://linux-sunxi.org/Hardware_Reliability_Tests... will try it ...

 

Update: Sorry, I need support ... I use Mainline kernel from Armbian 16.04 (Xenial) default image:

root@cubietruck5:~# ./lima-memtester 100M
This is a simple textured cube demo from the lima driver and
a memtester. Both combined in a single program. The mali400
hardware is only used to stress RAM in the background. But
this happens to significantly increase chances of exposing
memory stability related problems.

Please remove 'sunxi_no_mali_mem_reserve' option from
your kernel command line. Otherwise the mali kernel
driver may be non-functional and actually knock down
your system with some old linux-sunxi kernels.
Aborted

I removed that line and rebuilt boot.scr.

 

Reboot, next try:

root@cubietruck5:~# ./lima-memtester 100M
This is a simple textured cube demo from the lima driver and
a memtester. Both combined in a single program. The mali400
hardware is only used to stress RAM in the background. But
this happens to significantly increase chances of exposing
memory stability related problems.

Failed to 'modprobe mali'.
Aborted

root@cubietruck5:~# modprobe mali
modprobe: FATAL: Module mali not found in directory /lib/modules/4.9.7-sunxi

Any idea? Do I need to change more ? Or is it a problem with mainline kernel ?

 

Is it a problem that no monitor is connected?!

 

PS: Interestingly, on one machine I still use armbian wheezy "legacy" ... There the most current u-boot 5.25 is more stable than the 5.20!


...

I found http://linux-sunxi.org/Hardware_Reliability_Tests... will try it ...

 

Update: Sorry, I need support ... I use Mainline kernel from Armbian 16.04 (Xenial) default image:

 

...

Failed to 'modprobe mali'.
Aborted

root@cubietruck5:~# modprobe mali
modprobe: FATAL: Module mali not found in directory /lib/modules/4.9.7-sunxi

Any idea? Do I need to change more ? Or is it a problem with mainline kernel ?

 

From here: http://linux-sunxi.org/A10_DRAM_Controller_Calibration

 

It is required to have the sunxi-3.4 kernel (specifically for the mali kernel module). The mainline kernel is not supported yet because it is lacking in the graphics department.
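
A quick pre-flight check can save a reboot cycle here. The sketch below only looks for a mali module file under the modules directory of a given kernel release. The name has_mali_module is hypothetical, and it will not detect a driver that is built into the kernel rather than shipped as a module.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a mali kernel module file exists for
# the given kernel release (default: the running kernel).
# Usage: has_mali_module [kernel-release] [modules-root]
has_mali_module() {
    rel=${1:-$(uname -r)}
    root=${2:-/lib/modules}
    # Matches mali.ko as well as compressed variants such as mali.ko.xz.
    find "$root/$rel" -name 'mali.ko*' 2>/dev/null | grep -q .
}

# Example:
#   has_mali_module && ./lima-memtester 100M \
#       || echo "no mali module for $(uname -r), lima-memtester will abort"
```

On the 4.9.7-sunxi system shown above this check would be expected to fail, which matches the modprobe error.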


...

My testing status is:

I re-tested u-boot 5.25 with "performance" in /etc/default/cpufrequtils again on Sunday. With this I had an auto-reboot (watchdog?) after 55 mins and a freeze after 26h. So I have now finally installed your u-boot and am monitoring :-) No reboot or freeze so far (but also only 19h so far :-) )

I will let it run till the weekend ... and if there is no freeze till then, I could try "the same" build with the lower RAM speed and test whether it then crashes earlier, to verify your assumption.

 

I use cubietruck with legacy kernel (ARMBIAN 5.25 stable Debian GNU/Linux 8 (jessie) 3.4.113-sun7i). Maybe that's the case.


This topic is now closed to further replies.