Mathias, posted March 23, 2020

Armbianmonitor: http://ix.io/2f4L

Has anybody seen this message at boot:

[ 5.352826] CPU4: failed to come online
[ 5.352835] CPU4: failed in unknown state : 0x0
[ 10.481328] CPU5: failed to come online
[ 10.481336] CPU5: failed in unknown state : 0x0

Then, the A72 cores don't show up (unsurprisingly since these are the two cores that somehow did not come online). On top of that, the kernel throws a backtrace (related to sound if I understand correctly):

------------[ cut here ]------------
[ 13.014782] WARNING: CPU: 3 PID: 1 at kernel/irq/manage.c:1990 request_threaded_irq+0x144/0x180
[ 13.014784] Modules linked in:
[ 13.014793] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.4.26-rockchip64 #20.02.5
[ 13.014795] Hardware name: Pine64 RockPro64 (DT)
[ 13.014799] pstate: a0000005 (NzCv daif -PAN -UAO)
[ 13.014804] pc : request_threaded_irq+0x144/0x180
[ 13.014808] lr : request_threaded_irq+0x6c/0x180
[ 13.014810] sp : ffff80001004b9b0
[ 13.014813] x29: ffff80001004b9b0 x28: 0000000000000000
[ 13.014817] x27: ffff0000ef78c0c0 x26: ffff8000111c8d98
[ 13.014822] x25: 0000000000000000 x24: 0000000000000007
[ 13.014826] x23: ffff0000f0914870 x22: ffff800010b2dce0
[ 13.014830] x21: ffff0000f142a000 x20: 0000000000000000
[ 13.014834] x19: ffff80001141bee0 x18: 0000000000000001
[ 13.014838] x17: ffff800011188d00 x16: ffff800011188d08
[ 13.014843] x15: ffffffffffffffff x14: ffff80001137b508
[ 13.014847] x13: ffff00016f1e14b7 x12: ffff0000ef1e14c3
[ 13.014851] x11: ffff0000f67ac268 x10: 0000000000000040
[ 13.014855] x9 : ffff80001139f028 x8 : ffff80001139f020
[ 13.014859] x7 : ffff0000f10002a8 x6 : 0000000000000000
[ 13.014863] x5 : ffff0000f1000248 x4 : 0000000000000000
[ 13.014867] x3 : 0000000000000000 x2 : 0000000000000000
[ 13.014871] x1 : 0000000000000007 x0 : 0000000000031600
[ 13.014875] Call trace:
[ 13.014880] request_threaded_irq+0x144/0x180
[ 13.014887] snd_mtpav_probe+0x15c/0x3d8
[ 13.014893] platform_drv_probe+0x50/0xa0
[ 13.014899] really_probe+0xd8/0x300
[ 13.014902] driver_probe_device+0x54/0xe8
[ 13.014906] __device_attach_driver+0x80/0xb8
[ 13.014910] bus_for_each_drv+0x78/0xc8
[ 13.014915] __device_attach+0xd4/0x130
[ 13.014918] device_initial_probe+0x10/0x18
[ 13.014922] bus_probe_device+0x90/0x98
[ 13.014927] device_add+0x3c4/0x5f0
[ 13.014930] platform_device_add+0x10c/0x230
[ 13.014934] platform_device_register_full+0xc8/0x140
[ 13.014940] alsa_card_mtpav_init+0x74/0xd0
[ 13.014945] do_one_initcall+0x74/0x1b0
[ 13.014950] kernel_init_freeable+0x194/0x22c
[ 13.014957] kernel_init+0x10/0xfc
[ 13.014961] ret_from_fork+0x10/0x18
[ 13.014969] ---[ end trace 34ce35f0c45c0a90 ]---

Mathias
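For anyone comparing boots, a small script can pull the affected core numbers out of a saved kernel log. This is only a sketch: the function name `failed_cpus` and the `SAMPLE` constant are made up for illustration, and the pattern is based solely on the four log lines quoted in the post above.

```python
import re
from typing import List

# Matches lines such as "[ 5.352826] CPU4: failed to come online".
# The pattern is an assumption derived from the log lines quoted in this thread.
_FAILED_CPU = re.compile(r"CPU(\d+): failed to come online")


def failed_cpus(log_text: str) -> List[int]:
    """Return the sorted, de-duplicated CPU numbers that failed to come online."""
    return sorted({int(number) for number in _FAILED_CPU.findall(log_text)})


# The four lines quoted in the first post, used as sample input.
SAMPLE = """\
[ 5.352826] CPU4: failed to come online
[ 5.352835] CPU4: failed in unknown state : 0x0
[ 10.481328] CPU5: failed to come online
[ 10.481336] CPU5: failed in unknown state : 0x0
"""

if __name__ == "__main__":
    print(failed_cpus(SAMPLE))  # prints [4, 5]
```

Feeding it the saved output of `dmesg` from a warm reboot and from a cold boot makes it easy to see whether the failure follows the boot type.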
David Pottage, posted March 25, 2020

I have not seen the "CPU failed to come online" message, but I get the "WARNING: CPU: 4 PID: 1 at kernel/irq/manage.c:1990 request_threaded_irq" every time I boot. That leads to https://elixir.bootlin.com/linux/v5.4.26/source/kernel/irq/manage.c#L1990 in the kernel source tree, but I have no idea whether the warning is serious or not.
soerenderfor, posted March 26, 2020

I have moved the thread to Rockchip 3399. Thanks.
Mathias (Author), posted March 27, 2020

I have installed the latest stable kernel from Armbian (5.4.27) and it does exactly the same... I will try powering down the system, leaving it off for a few seconds and then restarting, just in case...
soerenderfor, posted March 27, 2020

I also had CPU errors on 5.4.26, so I downgraded.
Mathias (Author), posted March 27, 2020

After a cold boot, I don't have the CPU errors anymore (on 5.4.27). I waited about 30 s before powering the system back on. I still get the backtrace at boot, but the kernel recovers from it (see http://ix.io/2fDO):

[ 41.902116] ------------[ cut here ]------------
[ 41.902135] WARNING: CPU: 4 PID: 1 at kernel/irq/manage.c:1990 request_threaded_irq+0x144/0x180
[ 41.902138] Modules linked in:
[ 41.902149] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.4.27-rockchip64 #20.02.6
[ 41.902153] Hardware name: Pine64 RockPro64 (DT)
[ 41.902158] pstate: a0000005 (NzCv daif -PAN -UAO)
[ 41.902165] pc : request_threaded_irq+0x144/0x180
[ 41.902171] lr : request_threaded_irq+0x6c/0x180
[ 41.902174] sp : ffff80001004b9b0
[ 41.902178] x29: ffff80001004b9b0 x28: 0000000000000000
[ 41.902185] x27: ffff0000ef2428c0 x26: ffff8000111c8d98
[ 41.902190] x25: 0000000000000000 x24: 0000000000000007
[ 41.902195] x23: ffff0000f0d77870 x22: ffff800010b2dd80
[ 41.902201] x21: ffff0000f142a000 x20: 0000000000000000
[ 41.902206] x19: ffff80001141bee0 x18: 0000000000000001
[ 41.902211] x17: ffff0000f0d75a00 x16: ffff800010aa6aa0
[ 41.902216] x15: ffffffffffffffff x14: ffff80001137b508
[ 41.902222] x13: ffff00016f291b37 x12: ffff0000ef291b43
[ 41.902227] x11: ffff0000f67c2268 x10: 0000000000000040
[ 41.902232] x9 : ffff80001139f028 x8 : ffff80001139f020
[ 41.902238] x7 : ffff0000f10002a8 x6 : 0000000000000000
[ 41.902243] x5 : ffff0000f1000248 x4 : 0000000000000000
[ 41.902248] x3 : 0000000000000000 x2 : 0000000000000000
[ 41.902253] x1 : 0000000000000007 x0 : 0000000000031600
[ 41.902258] Call trace:
[ 41.902265] request_threaded_irq+0x144/0x180
[ 41.902274] snd_mtpav_probe+0x15c/0x3d8
[ 41.902281] platform_drv_probe+0x50/0xa0
[ 41.902288] really_probe+0xd8/0x300
[ 41.902293] driver_probe_device+0x54/0xe8
[ 41.902297] __device_attach_driver+0x80/0xb8
[ 41.902303] bus_for_each_drv+0x78/0xc8
[ 41.902309] __device_attach+0xd4/0x130
[ 41.902313] device_initial_probe+0x10/0x18
[ 41.902319] bus_probe_device+0x90/0x98
[ 41.902324] device_add+0x3c4/0x5f0
[ 41.902329] platform_device_add+0x10c/0x230
[ 41.902334] platform_device_register_full+0xc8/0x140
[ 41.902341] alsa_card_mtpav_init+0x74/0xd0
[ 41.902348] do_one_initcall+0x74/0x1b0
[ 41.902354] kernel_init_freeable+0x194/0x22c
[ 41.902361] kernel_init+0x10/0xfc
[ 41.902367] ret_from_fork+0x10/0x18
[ 41.902374] ---[ end trace f53d3c1ec0afdd56 ]---

Mathias
soerenderfor, posted March 27, 2020

I had some problems with my SATA card on 5.4.26. How is it working for you?
Mathias (Author), posted March 30, 2020

For me, it seems to be working perfectly fine. I've just transferred more than 250 GB of backups back onto my SATA drive, no issues.
soerenderfor, posted April 1, 2020

On 3/30/2020 at 3:07 PM, Mathias said: For me, it seems to be working perfectly fine. I've just transferred more than 250G of backups back on my sata drive, no issues.

Thanks.
Myy, posted April 1, 2020

From what I can read in the backtrace, the issue appears while invoking the 'probe' function of the snd-mtpav driver. So, if you still have the same issue, could you try blacklisting snd-mtpav (add "blacklist snd-mtpav" to a .conf file under /etc/modprobe.d/, e.g. /etc/modprobe.d/blacklist.conf)? Of course, you might have no sound during the test.
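One caveat worth checking before testing: a modprobe.d blacklist only affects loadable modules, and the trace shows an empty "Modules linked in:" list with snd_mtpav_probe reached from do_one_initcall, which suggests the driver may be compiled into the kernel. The sketch below checks a `modules.builtin` listing for a module name. The helper name `is_builtin` and the sample listing are illustrative assumptions; the path `/lib/modules/<release>/modules.builtin` is the usual location but has not been verified on this image.

```python
import os
import platform


def _normalize(name: str) -> str:
    # Module names treat '-' and '_' as equivalent.
    return name.replace("-", "_")


def is_builtin(module: str, modules_builtin_text: str) -> bool:
    """Return True if `module` appears in the text of a modules.builtin file."""
    wanted = _normalize(module)
    for line in modules_builtin_text.splitlines():
        # Each entry is a path such as "kernel/sound/drivers/snd-mtpav.ko".
        base = line.strip().rsplit("/", 1)[-1]
        if base.endswith(".ko") and _normalize(base[:-3]) == wanted:
            return True
    return False


if __name__ == "__main__":
    # Usual location of the listing for the running kernel (assumed, not verified here).
    path = os.path.join("/lib/modules", platform.release(), "modules.builtin")
    if os.path.exists(path):
        with open(path) as listing:
            print("snd-mtpav built in:", is_builtin("snd-mtpav", listing.read()))
    else:
        print("no modules.builtin found at", path)
```

If the driver does turn out to be built in, the blacklist line will have no effect. The kernel's `initcall_blacklist=` boot parameter (here presumably with `alsa_card_mtpav_init`, the initcall visible in the trace) is one possible way to skip it, though that is untested on this board.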
Mathias (Author), posted April 16, 2020

I've tried blacklisting snd-mtpav, and although the module is no longer loaded (lsmod confirms it is not there), I still get the crash at boot... It does not make sense to me...
Myy, posted April 17, 2020

Yeah, the cause of the CPUs not coming online and the cause of the WARNING message are completely different. It seems that CPU 4 and 5 belong to another part of the system:

[ 0.000000] GICv3: GIC: PPI partition interrupt-partition-0[0] { /cpus/cpu@0[0] /cpus/cpu@1[1] /cpus/cpu@2[2] /cpus/cpu@3[3] }
[ 0.000000] GICv3: GIC: PPI partition interrupt-partition-1[1] { /cpus/cpu@100[4] /cpus/cpu@101[5] }

So I *guess* that something is happening during the initialization of interrupt-partition-1[1], taking both CPUs down with it. However, your first logs seem to suggest that the 5.5 kernel was able to bring the CPUs online... Hmm...

Does the problem happen on every boot? Only after a cold boot? Only after a reboot? If it only happens from time to time, it might be a timing issue...

But I don't really see any patch to the GICv3 driver that could have fixed this issue directly: https://github.com/torvalds/linux/commits/master/drivers/irqchip/irq-gic-v3.c
As far as ARM64-specific patches go, they only fixed a few issues that mattered to Cavium boards.
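To check the same CPU-to-partition grouping on another board or kernel, the two GICv3 lines can be parsed mechanically. This is a rough sketch: `parse_ppi_partitions` is a made-up helper, and the regular expressions assume the exact line format quoted in the post above, which may differ between kernel versions.

```python
import re
from typing import Dict, List

# Assumed format, taken from the two lines quoted in this thread:
#   GICv3: GIC: PPI partition <name>[<index>] { /cpus/cpu@<reg>[<logical cpu>] ... }
_PARTITION = re.compile(r"PPI partition ([\w-]+)\[(\d+)\]\s*\{([^}]*)\}")
_CPU_ENTRY = re.compile(r"/cpus/cpu@[0-9a-fA-F]+\[(\d+)\]")


def parse_ppi_partitions(log_text: str) -> Dict[str, List[int]]:
    """Map each PPI partition name to the logical CPU numbers listed for it."""
    partitions: Dict[str, List[int]] = {}
    for name, _index, members in _PARTITION.findall(log_text):
        partitions[name] = [int(cpu) for cpu in _CPU_ENTRY.findall(members)]
    return partitions


# The two lines quoted above, used as sample input.
SAMPLE = (
    "[ 0.000000] GICv3: GIC: PPI partition interrupt-partition-0[0] "
    "{ /cpus/cpu@0[0] /cpus/cpu@1[1] /cpus/cpu@2[2] /cpus/cpu@3[3] }\n"
    "[ 0.000000] GICv3: GIC: PPI partition interrupt-partition-1[1] "
    "{ /cpus/cpu@100[4] /cpus/cpu@101[5] }\n"
)

if __name__ == "__main__":
    for name, cpus in parse_ppi_partitions(SAMPLE).items():
        print(name, cpus)
```

On the sample input, CPUs 4 and 5 (the two that failed to come online) land together in interrupt-partition-1, which matches the observation in the post.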