chrisf Posted November 15, 2016

After flashing Armbian 5.20 to my SD card and running apt-get upgrade, it fails to boot with the new kernel. The last thing I get from the console is:

Image lacks image_size field, assuming 16MiB
## Loading init Ramdisk from Legacy Image at 45300000 ...
Image Name: uInitrd
Image Type: ARM Linux RAMDisk Image (gzip compressed)
Data Size: 3845164 Bytes = 3.7 MiB
Load Address: 00000000
Entry Point: 00000000
Verifying Checksum ... OK
## Flattened Device Tree blob at 45000000
Booting using the fdt blob at 0x45000000
reserving fdt memory region: addr=45000000 size=200000
reserving fdt memory region: addr=41010000 size=10000
reserving fdt memory region: addr=41020000 size=800
reserving fdt memory region: addr=40100000 size=4000
reserving fdt memory region: addr=40104000 size=1000
reserving fdt memory region: addr=40105000 size=1000
Loading Ramdisk to b6b0f000, end b6eb9c2c ... OK
Loading Device Tree to 44fec000, end 44fffddb ... OK
Starting kernel ...
[mmc]: MMC Device 2 not found
[mmc]: mmc 2 not find, so not exit
INFO: BL3-1: Next image address = 0x41080000
INFO: BL3-1: Next image spsr = 0x3c9
Loading, please wait...
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ...
Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ...

Followed soon after by:

Begin: Running /scripts/local-premount ...
[ 36.407826] BUG: soft lockup - CPU#0 stuck for 22s!
[kworker/0:0:4] [ 36.418567] Call trace: [ 36.421371] Call trace: [ 36.424151] Call trace: [ 36.426959] Call trace: [ 36.430509] Call trace: [ 36.433307] Call trace: [ 36.436100] Call trace: [ 36.438892] Call trace: [ 36.441685] Call trace: [ 36.444480] Call trace: [ 36.447285] Call trace: [ 36.450078] Call trace: [ 36.452869] Call trace: [ 36.455662] Call trace: [ 36.458452] Call trace: [ 36.461243] Call trace: [ 36.464035] Call trace: [ 36.466830] Call trace: [ 36.469624] Call trace: [ 36.472416] Call trace: [ 36.475208] Call trace: [ 36.477999] Call trace: [ 36.480790] Call trace: [ 36.483585] Call trace: [ 36.486380] Call trace: [ 36.489171] Call trace: [ 36.491960] Call trace: [ 36.494750] Call trace: [ 36.497539] Call trace: [ 36.500333] Call trace: [ 36.503128] Call trace: [ 36.505923] Call trace: [ 36.508717] Call trace: [ 36.511516] Call trace: [ 36.514311] Call trace: [ 36.517107] Call trace: [ 36.519900] Call trace: [ 36.522694] Call trace: [ 36.525490] Call trace: [ 36.528287] Call trace: [ 36.531083] Call trace: [ 36.533879] Call trace: [ 36.536676] Call trace: [ 36.539472] Call trace: [ 36.542264] Call trace: [ 36.545060] Call trace: [ 36.547856] Call trace: [ 36.550649] Call trace: [ 36.553753] Call trace: [ 36.556552] Call trace: [ 36.559350] Call trace: [ 36.562141] Call trace: [ 36.564964] Call trace: [ 36.567759] Call trace: [ 36.570553] Call trace: [ 36.573347] Call trace: [ 36.576148] Call trace: [ 36.578945] Call trace: [ 36.581736] Call trace: [ 36.584527] Call trace: [ 36.587320] Call trace: [ 36.590117] Call trace: [ 36.592909] Call trace: [ 36.595702] Call trace: [ 36.598492] Call trace: [ 36.601284] Call trace: [ 36.604076] Call trace: [ 36.606873] Call trace: [ 36.609667] Call trace: [ 36.612494] Call trace: [ 36.615286] Call trace: [ 36.618081] Call trace: [ 36.620907] Call trace: [ 36.623700] Call trace: [ 36.626492] Call trace: [ 63.959820] BUG: soft lockup - CPU#0 stuck for 22s! 
[kworker/0:0:4] [ 63.970537] Call trace:

It then repeats continually. If I flash the old image back it boots fine, but after running apt-get upgrade again it fails.

Google found this: https://www.bountysource.com/issues/38404155-pine64-boot-issue-cpu-stuck but it doesn't have an answer yet.
zador.blood.stained Posted November 15, 2016

chrisf said: Google found this https://www.bountysource.com/issues/38404155-pine64-boot-issue-cpu-stuck but it doesn't have an answer yet

That is a strange aggregator of GitHub issues, and it displays this one: https://github.com/igorpecovnik/lib/issues/502

So this issue should be fixed in the new kernel version (not released yet, but it should be present in the beta repository).
chrisf Posted November 15, 2016

Thanks. Replacing the dtb files in /boot with the ones from http://beta.armbian.com/pool/main/l/linux-u-boot-pine64-default/linux-u-boot-pine64_5.24.161114_arm64.deb fixed it.
zador.blood.stained Posted November 15, 2016

chrisf said: Thanks. Replacing the dtb files in /boot with the ones from http://beta.armbian.com/pool/main/l/linux-u-boot-pine64-default/linux-u-boot-pine64_5.24.161114_arm64.deb fixed it.

When I experienced this issue, it was random: sometimes it could be reproduced in only 1 out of 10 reboots, so I would recommend installing the kernel package too.
linda Posted March 3, 2017

I am using Armbian (Server 5.24, Legacy 3.10.104) on the Pine64 and it is a great system. Unfortunately, I also encounter this boot problem: about every third time it fails to boot (repeating "Call trace: ..."). Is there anything I can do? I have already checked the power supply. Can I use other versions of the dtb files or kernel? Thanks a lot.
dano Posted March 3, 2017

I am currently running ARMBIAN 5.25 stable Ubuntu 16.04.2 LTS 3.10.104-pine64 (server) and also had this problem after I applied updates, unless I had a monitor attached. To resolve the startup issue, I changed from DHCP to a static IP address. It starts up every time now.
linda Posted March 5, 2017

Thanks for your advice. Unfortunately I have no choice but to use the DHCP client in my production environment. I would appreciate any other ideas.
dano Posted March 6, 2017

Well, things have improved. I installed the latest Armbian Debian Jessie Server release and, after applying all of the updates, I ended up with a more stable system. I did still have some startup issues, and I found that changing the disp_mode parameter from 720p60 to 480p or 480i in /boot/armbianEnv.txt fixes them. I left the default DHCP setting in /etc/network/interfaces.

I also have another system running Armbian Ubuntu Server with MATE installed. This was also having startup issues without a static IP set. Now, after the recent updates which I applied today, this system works with DHCP and a disp_mode of 720p60 or 1080p60.

These are running on a Pine64+ 2GB.
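For anyone who wants to script the disp_mode change described above, here is a minimal Python sketch. It assumes /boot/armbianEnv.txt is a plain list of key=value lines, which is how the posts in this thread quote it. The helper name set_env_value and the sample content are hypothetical, and the sketch works on a string rather than touching the real file.

```python
def set_env_value(text: str, key: str, value: str) -> str:
    """Return the text with key=value set.

    An existing "key=..." line is replaced in place; if the key is
    missing, a new line is appended. Other lines are left untouched.
    """
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Hypothetical sample content, for illustration only.
sample = "verbosity=7\ndisp_mode=720p60\n"
print(set_env_value(sample, "disp_mode", "480i"), end="")
```

To apply it to a real board you would read the file, pass its content through the function, and write it back as root. Keeping a backup copy of the original file first is a sensible precaution, since a broken boot environment file can itself prevent booting.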
Martin_Borghoff Posted March 15, 2017

OK, good news that it runs stable now. I receive my Pine64 boards today and will test both distributions.
andreipoe Posted March 19, 2017

On 3/6/2017 at 7:04 AM, dano said: Well, things have improved. I installed the latest Armbian Debian Jessie Server release and, after applying all of the updates, I ended up with a more stable system. I did still have some startup issues, and I found that changing the disp_mode parameter from 720p60 to 480p or 480i in /boot/armbianEnv.txt fixes them. I left the default DHCP setting in /etc/network/interfaces. I also have another system running Armbian Ubuntu Server with MATE installed. This was also having startup issues without a static IP set. Now, after the recent updates which I applied today, this system works with DHCP and a disp_mode of 720p60 or 1080p60. These are running on a Pine64+ 2GB.

I wanted to say thanks for the suggestion; it's helped me too. I started my Pine64+ 2GB today for the first time, and after failing to get openSUSE running, I tried Armbian. I too was having issues with the board sometimes not booting, but it seems that setting the display resolution (which I never use anyway) to 480i helped. I've done a few reboots since I made the change and none of them got stuck. Thank you!
linda Posted April 12, 2017

Unfortunately the Pine64 continues failing to boot with the current Armbian version. I finally did a few tests concerning the boot problem, using different SD cards.

I used the current Ubuntu server version (Armbian_5.25_Pine64_Ubuntu_xenial_default_3.10.104, upgraded on 11.04.17) with only Ethernet and power connected (no HDMI, no USB devices, ...). Power was supplied through the pin headers (not micro USB), and Ethernet used DHCP. I used three different Pine64+ boards with 2GB of RAM for my tests.

Results: the boot problem (the Pine64 does not boot completely) continued to occur, at a frequency that depended on the SD card used:
- With the 4GB cards the Pine64 booted in only 46% of the cases (90 tests).
- With the 8GB cards the Pine64 booted in 90% of the cases (60 tests).
- With the 32GB cards the Pine64 booted in 50% of the cases (60 tests).

The boot console shows the following behavior, similar to the post by chrisf:

Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ...
Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
Begin: Will now check root file system ... fsck from util-linux 2.27.1
[ 14.723973] [DISP] disp_device_attached_and_enable,line:159:[/sbin/fsck.ext4 (1) -- /dev/mmcblk0p1] fsck.ext4 -a -C0 /dev/mmcblk0p1
/dev/mmcblk0p1: clean, 47357/216832 files, 366370/915968 blocks
done.
attched ok, mgr0<-->device1, type=4, mode=5

And then it repeats continually:

[ 40.681832] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:35]
[ 40.693596] Modules linked in:
[ 40.701616]
[ 40.707811] CPU: 0 PID: 35 Comm: kworker/0:1 Not tainted 3.10.104-pine64 #2
[ 40.720179] Workqueue: events start_work
[ 40.729239] task: ffffffc078b1a4c0 ti: ffffffc078b74000 task.ti: ffffffc078b74000
[ 40.742218] PC is at __do_softirq+0xb4/0x2d8
[ 40.751573] LR is at __do_softirq+0x30/0x2d8
[ 40.760848] pc : [<ffffffc0000b5fc4>] lr : [<ffffffc0000b5f40>] pstate: 40000145
....

I hope someone can give me an idea of how to solve the problem. Thanks a lot.

Attached: BOOTFail.log
zador.blood.stained Posted April 12, 2017

Please try installing the beta kernel (linux-image-pine64*.deb) from here: http://beta.armbian.com/pool/main/l/linux-upstream/

It should contain a backported fix for the arch timer bug in A64, and I see a lockup in the arch timer IRQ handler (so definitely related).
linda Posted April 13, 2017

Thanks a lot for your support. I have installed the kernel headers (apt install linux-headers-pine64_5.27.170414_arm64.deb) and unfortunately there is not much improvement. In a first run of 20 boot tests with a 4GB card, the boot success rate was 50%, compared with 46% with the old headers.

Attached: InstallKernel.txt
zador.blood.stained Posted April 13, 2017

8 minutes ago, linda said: Thanks a lot for your support - I have installed the kernel headers (apt install linux-headers-pine64_5.27.170414_arm64.deb)

I asked you to install the linux-image package, not the linux-headers package.
linda Posted April 13, 2017

Sorry, I picked the wrong package. Now I have installed the linux-image package and carried out first boot tests with a 4GB SD card. The result was great: 90% success, my Pine64 started up almost every time. I will continue the tests in the next few days.

Thank you, you have done me a big favor.
zador.blood.stained Posted April 13, 2017

15 minutes ago, linda said: And the result was great, 90% success, my pine64 started up almost every time. I will continue the tests in the next days.

Please provide a lockup log with this new kernel too; maybe it will help improve the reliability further.
linda Posted April 15, 2017

I have now done another 120 boot attempts with the 4GB SD cards and achieved the following results:
- The boot success rate was 80%.
- The lockup logs show similar results as before, but are slightly different (see attachments).

For example, one boot failure showed the following behavior:

[ 14.834515] Freeing unused kernel memory: 524K (ffffffc0009fc000 - ffffffc000a7f000)
Loading, please wait...
starting version 229
[ 15.262882] [DISP] disp_device_attached_and_enable,line:159:attched ok, mgr0<-->device1, type=4, mode=5
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ...
Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
Begin: Will now check root file system ... fsck from util-linux 2.27.1
[/sbin/fsck.ext4 (1) -- /dev/mmcblk0p1] fsck.ext4 -a -C0 /dev/mmcblk0p1
/dev/mmcblk0p1: recovering journal
/dev/mmcblk0p1: clean, 47463/216832 files, 368808/915968 blocks
done.
[ 41.122365] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:35]

One boot failure started with:

[ 14.378982] Freeing unused kernel memory: 524K (ffffffc0009fc000 - ffffffc000a7f000)
Loading, please wait...
starting version 229
[ 36.758677] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:35]

Other boot failures followed this boot output:

[ 14.405592] Freeing unused kernel memory: 524K (ffffffc0009fc000 - ffffffc000a7f000)
starting version 229
[ 14.487651] hub 3-0:1.0: USB hub found
[ 14.496552] hub 3-0:1.0: 1 port detected
[ 14.505686] scene_lock_init name=ohci_standby
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ...
Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
Begin: Will now check root file system ... fsck from util-linux 2.27.1
[ 14.916161] [DISP] disp_device_attached_and_enable,line:159:attched ok, mgr0<-->device1, type=4, mode=5
[/sbin/fsck.ext4 (1) -- /dev/mmcblk0p1] fsck.ext4 -a -C0 /dev/mmcblk0p1
/dev/mmcblk0p1: recovering journal
/dev/mmcblk0p1: clean, 47374/216832 files, 347125/915968 blocks
done.
[ 40.716572] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:35]

I hope these lockup logs help in troubleshooting. Thanks a lot!!

Attached: BOOTFail_2017-04-15-B1.log, BOOTFail_2017-04-15-B2.log, BOOTFail_2017-04-15-C1.log
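When running boot tests in bulk like this, it can help to classify captured console logs automatically. The following is a minimal Python sketch that scans a log for the "BUG: soft lockup" lines quoted in this thread. The regular expression is written against the message format seen in these posts, and the function name find_soft_lockups is hypothetical; other kernel versions may word the message differently.

```python
import re

# Matches lines such as:
# [ 41.122365] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:35]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s!\s+\[(?P<task>[^\]]+)\]"
)


def find_soft_lockups(log: str) -> list:
    """Return one dict per soft lockup report found in the log text."""
    return [
        {
            "time": float(m["time"]),    # kernel timestamp in seconds
            "cpu": int(m["cpu"]),        # CPU number that got stuck
            "stuck_s": int(m["secs"]),   # reported stall duration
            "task": m["task"],           # task name and PID as printed
        }
        for m in LOCKUP_RE.finditer(log)
    ]


log = (
    "/dev/mmcblk0p1: clean, 47463/216832 files, 368808/915968 blocks\n"
    "done.\n"
    "[ 41.122365] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:35]\n"
)
for hit in find_soft_lockups(log):
    print(hit)
```

A boot attempt could then be counted as failed if the list is non-empty, which is one way to tally success rates across many log files.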
linda Posted April 19, 2017 Posted April 19, 2017 (edited) I did some more tests with more different sd-cards: Another brand 8GB, 4GB and one 16GB card. The results were quite the same as before. About 80% successfully booting. After equipping my pine boards with a reset-switch i did the following test: if the board fails to boot after 50 seconds -> press the reset switch. With the 5.27 image the boards had an rate of successfully booting of 99% after the second reset. Summary 260 attempts to boot (power on) 204 success (78%) 251 success after one reset 96% 258 success after two resets 99% Are there any more test I could make or any information I could provide in order to support you? Will the 5.27 image become available via standard apt-upgrade? If so, when do you think this will be going to happen? Edited April 19, 2017 by linda
zador.blood.stained Posted April 19, 2017

1 hour ago, linda said: Are there any more tests I could make or any information I could provide in order to support you?

I did some more tests and looked more closely at the stack traces. If I understand it correctly, it actually hangs somewhere here, most likely in __setup_irq or shortly after it:

[ 40.941570] [<ffffffc000083dc0>] el1_irq+0x80/0xe4
[ 40.950287] [<ffffffc000125844>] __setup_irq+0x318/0x3e0
[ 40.959557] [<ffffffc000125a84>] request_threaded_irq+0xe0/0x124
[ 40.969620] [<ffffffc00041280c>] disp_sys_register_irq+0x88/0x98
[ 40.979698] [<ffffffc000420610>] disp_hdmi_enable+0x1d4/0x278
[ 41.000742] [<ffffffc0004146f8>] bsp_disp_device_switch+0xbc/0xe4
[ 40.989485] [<ffffffc000414540>] disp_device_attached_and_enable+0x1bc/0x1d4

In theory, completely disabling the display driver should help for headless use cases, but the driver needs some rework before that can be implemented.

1 hour ago, linda said: Will the 5.27 image become available via standard apt-upgrade?

Yes.

1 hour ago, linda said: If so, when do you think this will be going to happen?

In 1-2 months.

1 hour ago, linda said: Are there any more tests I could make or any information I could provide in order to support you?

I don't think so. It's not easy to debug, and hopefully the mainline kernel will soon be good enough for everyday use (at least headless/server), so we could forget about the BSP kernel.
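To compare stack traces across several failed boots, the frames can be pulled out of a log programmatically. The Python sketch below extracts the symbol name and offset from lines in the format quoted in the post above. The function name parse_frames is hypothetical, and the pattern is an assumption based only on the trace lines shown in this thread.

```python
import re

# Matches frames such as: [<ffffffc000125844>] __setup_irq+0x318/0x3e0
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(trace: str) -> list:
    """Return (symbol, offset) pairs in the order they appear in the trace."""
    return [(m["sym"], int(m["off"], 16)) for m in FRAME_RE.finditer(trace)]


trace = """
[ 40.941570] [<ffffffc000083dc0>] el1_irq+0x80/0xe4
[ 40.950287] [<ffffffc000125844>] __setup_irq+0x318/0x3e0
[ 40.959557] [<ffffffc000125a84>] request_threaded_irq+0xe0/0x124
[ 40.969620] [<ffffffc00041280c>] disp_sys_register_irq+0x88/0x98
[ 40.979698] [<ffffffc000420610>] disp_hdmi_enable+0x1d4/0x278
"""
for symbol, offset in parse_frames(trace):
    print(f"{symbol} +{offset:#x}")
```

Listing only the symbols makes it easier to see whether every lockup passes through the same display driver path (disp_hdmi_enable and disp_sys_register_irq in the trace above) or whether some failures take a different route.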
linda Posted April 19, 2017

I would not mind disabling the display driver. I am running headless anyway. Is there any easy way to disable the display driver?
zador.blood.stained Posted April 19, 2017

11 minutes ago, linda said: Is there any easy way to disable the display driver?

Disabling it in the kernel config is the easiest way. Reading a Device Tree property to allow disabling it without recompilation would be the correct way. I'll try to implement this before the update and test whether it actually resolves the issue.
stepw Posted June 11, 2017

I've been having problems with kernel boot stalling on my headless Pine64+ with Armbian 5.25, occurring 95% of the time after an unclean shutdown (e.g. power loss). By contrast, it only occurs 5% of the time after a graceful shutdown/reboot, as the CPU usually locks up after fsck has finished, and fsck is not needed after a clean shutdown.

Debug output to the console seems to increase the likelihood of a crash substantially. So I've added "extraargs=loglevel=1" to /boot/armbianEnv.txt to override "verbosity=7", which seems to be reinstated upon boot, resulting in the kernel booting with loglevel=7 after an unclean shutdown. I'm now seeing CPU lockups as rarely as with a graceful shutdown (about 5%).

There seems to be some race condition influenced by kernel boot time (which is marginally higher when console logging is at debugging level). Obviously, if other people are seeing a high failure rate regardless of the kernel console logging level, my findings are not relevant to their issues, but I figured this might help someone...
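If the extraargs line already carries other kernel arguments, the loglevel setting has to be merged into it instead of overwriting the whole value. The Python sketch below shows one way to do that on the argument string alone. The helper name set_kernel_arg is hypothetical, and the assumption that extraargs holds a space-separated list of kernel arguments is based on how it is used in the post above.

```python
def set_kernel_arg(extraargs: str, name: str, value: str) -> str:
    """Return the argument string with name=value set exactly once.

    Any existing occurrence of the argument (with or without a value)
    is dropped, and the new setting is appended at the end.
    """
    kept = [
        arg
        for arg in extraargs.split()
        if arg != name and not arg.startswith(name + "=")
    ]
    kept.append(f"{name}={value}")
    return " ".join(kept)


# Empty value, as when the key is first added to armbianEnv.txt.
print("extraargs=" + set_kernel_arg("", "loglevel", "1"))

# Hypothetical existing value with an older loglevel setting.
print("extraargs=" + set_kernel_arg("loglevel=7 quiet", "loglevel", "1"))
```

The resulting string is what would go after "extraargs=" in /boot/armbianEnv.txt. Whether a lower log level actually reduces lockups on a given board is something only repeated boot tests can show, as the post itself notes.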
ZupoLlask Posted January 8, 2018

Hi Zador,

I'm still observing this behavior with Jessie-based Armbian 5.36, after having to run update-initramfs -u... Before that, I had no problem with reboots. Since then, I have to unplug and replug the power supply for the system to boot. I've just tried the 5.37 beta and the problem persists...

Do you have any hint as to why this is happening only after updating the initramfs? Thanks.
zador.blood.stained Posted January 9, 2018

10 hours ago, ZupoLlask said: Do you have any hint on the reason this is happening only after updating initramfs?

Because this is most likely a race condition, and it may depend on initramfs and kernel size and alignment, CPU speed, threads and kworkers being assigned to different CPU cores, and other unpredictable things.
ZupoLlask Posted January 10, 2018

I brought a TTL adaptor from my company's lab and now I can confirm I'm having precisely the same problem described in the OP. I also found this thread, which may help others better understand this issue: https://github.com/longsleep/build-pine64-image/issues/51

As I doubt I have a problem with the power supply I'm using, for now I decided to test another USB cable. After that, I'll properly test all the components involved if I need to...
ZupoLlask Posted January 10, 2018

I've just tried to reproduce the problem with a 15cm cable... It still happens easily. In my case, I don't think it's related to the power supply, but I'll try to clear that up in the lab. If it were, it should happen consistently when several USB devices are overloading the power supply with the 4 CPU cores at 100% each, and that never occurs.