
Rockpro64: CPU failed to come online


Mathias


Armbianmonitor:

Has anybody seen these messages at boot?

[    5.352826] CPU4: failed to come online
[    5.352835] CPU4: failed in unknown state : 0x0
[   10.481328] CPU5: failed to come online
[   10.481336] CPU5: failed in unknown state : 0x0

 

After that, the two A72 cores (CPU4 and CPU5) don't show up, which is unsurprising since these are the two cores that failed to come online. On top of that, the kernel prints a warning with a backtrace (related to sound, if I understand correctly):

------------[ cut here ]------------
[   13.014782] WARNING: CPU: 3 PID: 1 at kernel/irq/manage.c:1990 request_threaded_irq+0x144/0x180
[   13.014784] Modules linked in:
[   13.014793] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.4.26-rockchip64 #20.02.5
[   13.014795] Hardware name: Pine64 RockPro64 (DT)
[   13.014799] pstate: a0000005 (NzCv daif -PAN -UAO)
[   13.014804] pc : request_threaded_irq+0x144/0x180
[   13.014808] lr : request_threaded_irq+0x6c/0x180
[   13.014810] sp : ffff80001004b9b0
[   13.014813] x29: ffff80001004b9b0 x28: 0000000000000000
[   13.014817] x27: ffff0000ef78c0c0 x26: ffff8000111c8d98
[   13.014822] x25: 0000000000000000 x24: 0000000000000007
[   13.014826] x23: ffff0000f0914870 x22: ffff800010b2dce0
[   13.014830] x21: ffff0000f142a000 x20: 0000000000000000
[   13.014834] x19: ffff80001141bee0 x18: 0000000000000001
[   13.014838] x17: ffff800011188d00 x16: ffff800011188d08
[   13.014843] x15: ffffffffffffffff x14: ffff80001137b508
[   13.014847] x13: ffff00016f1e14b7 x12: ffff0000ef1e14c3
[   13.014851] x11: ffff0000f67ac268 x10: 0000000000000040
[   13.014855] x9 : ffff80001139f028 x8 : ffff80001139f020
[   13.014859] x7 : ffff0000f10002a8 x6 : 0000000000000000
[   13.014863] x5 : ffff0000f1000248 x4 : 0000000000000000
[   13.014867] x3 : 0000000000000000 x2 : 0000000000000000
[   13.014871] x1 : 0000000000000007 x0 : 0000000000031600
[   13.014875] Call trace:
[   13.014880]  request_threaded_irq+0x144/0x180
[   13.014887]  snd_mtpav_probe+0x15c/0x3d8
[   13.014893]  platform_drv_probe+0x50/0xa0
[   13.014899]  really_probe+0xd8/0x300
[   13.014902]  driver_probe_device+0x54/0xe8
[   13.014906]  __device_attach_driver+0x80/0xb8
[   13.014910]  bus_for_each_drv+0x78/0xc8
[   13.014915]  __device_attach+0xd4/0x130
[   13.014918]  device_initial_probe+0x10/0x18
[   13.014922]  bus_probe_device+0x90/0x98
[   13.014927]  device_add+0x3c4/0x5f0
[   13.014930]  platform_device_add+0x10c/0x230
[   13.014934]  platform_device_register_full+0xc8/0x140
[   13.014940]  alsa_card_mtpav_init+0x74/0xd0
[   13.014945]  do_one_initcall+0x74/0x1b0
[   13.014950]  kernel_init_freeable+0x194/0x22c
[   13.014957]  kernel_init+0x10/0xfc
[   13.014961]  ret_from_fork+0x10/0x18
[   13.014969] ---[ end trace 34ce35f0c45c0a90 ]---
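
To check which cores are actually missing after "failed to come online" messages like the ones above, the kernel log can be compared with the CPU lists the kernel exposes in /sys/devices/system/cpu/present and /sys/devices/system/cpu/online. The following is only a minimal sketch: the function names are made up for illustration, and the sample text is the log excerpt from this post.

```python
import re

# Matches kernel lines such as "[    5.352826] CPU4: failed to come online".
FAILED_RE = re.compile(r"CPU(\d+): failed to come online")

# Log excerpt copied from the post above.
SAMPLE_DMESG = """\
[    5.352826] CPU4: failed to come online
[    5.352835] CPU4: failed in unknown state : 0x0
[   10.481328] CPU5: failed to come online
[   10.481336] CPU5: failed in unknown state : 0x0
"""


def failed_cpus(dmesg_text):
    """Return the sorted CPU numbers reported as failing to come online."""
    return sorted({int(m.group(1)) for m in FAILED_RE.finditer(dmesg_text)})


def parse_cpu_list(text):
    """Expand a sysfs-style CPU list such as "0-3,5" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def offline_cpus(present_text, online_text):
    """Return CPUs that are present in the system but not online."""
    return sorted(parse_cpu_list(present_text) - parse_cpu_list(online_text))


print(failed_cpus(SAMPLE_DMESG))   # [4, 5]
print(offline_cpus("0-5", "0-3"))  # [4, 5]
```

On a running board, the two arguments of offline_cpus would be the contents of the present and online files mentioned above; the strings "0-5" and "0-3" here are assumed values matching the situation described in the post.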

 

Mathias


I have installed the latest stable kernel from Armbian (5.4.27) and it behaves exactly the same... I will try powering the system down, leaving it off for a few seconds and then restarting, just in case...


After a cold boot, I no longer get the CPU errors (on 5.4.27). I waited about 30 s before powering the system back on. I still get the backtrace, but it is only a WARNING and the kernel recovers (see http://ix.io/2fDO):

[   41.902116] ------------[ cut here ]------------
[   41.902135] WARNING: CPU: 4 PID: 1 at kernel/irq/manage.c:1990 request_threaded_irq+0x144/0x180
[   41.902138] Modules linked in:
[   41.902149] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.4.27-rockchip64 #20.02.6
[   41.902153] Hardware name: Pine64 RockPro64 (DT)
[   41.902158] pstate: a0000005 (NzCv daif -PAN -UAO)
[   41.902165] pc : request_threaded_irq+0x144/0x180
[   41.902171] lr : request_threaded_irq+0x6c/0x180
[   41.902174] sp : ffff80001004b9b0
[   41.902178] x29: ffff80001004b9b0 x28: 0000000000000000
[   41.902185] x27: ffff0000ef2428c0 x26: ffff8000111c8d98
[   41.902190] x25: 0000000000000000 x24: 0000000000000007
[   41.902195] x23: ffff0000f0d77870 x22: ffff800010b2dd80
[   41.902201] x21: ffff0000f142a000 x20: 0000000000000000
[   41.902206] x19: ffff80001141bee0 x18: 0000000000000001
[   41.902211] x17: ffff0000f0d75a00 x16: ffff800010aa6aa0
[   41.902216] x15: ffffffffffffffff x14: ffff80001137b508
[   41.902222] x13: ffff00016f291b37 x12: ffff0000ef291b43
[   41.902227] x11: ffff0000f67c2268 x10: 0000000000000040
[   41.902232] x9 : ffff80001139f028 x8 : ffff80001139f020
[   41.902238] x7 : ffff0000f10002a8 x6 : 0000000000000000
[   41.902243] x5 : ffff0000f1000248 x4 : 0000000000000000
[   41.902248] x3 : 0000000000000000 x2 : 0000000000000000
[   41.902253] x1 : 0000000000000007 x0 : 0000000000031600
[   41.902258] Call trace:
[   41.902265]  request_threaded_irq+0x144/0x180
[   41.902274]  snd_mtpav_probe+0x15c/0x3d8
[   41.902281]  platform_drv_probe+0x50/0xa0
[   41.902288]  really_probe+0xd8/0x300
[   41.902293]  driver_probe_device+0x54/0xe8
[   41.902297]  __device_attach_driver+0x80/0xb8
[   41.902303]  bus_for_each_drv+0x78/0xc8
[   41.902309]  __device_attach+0xd4/0x130
[   41.902313]  device_initial_probe+0x10/0x18
[   41.902319]  bus_probe_device+0x90/0x98
[   41.902324]  device_add+0x3c4/0x5f0
[   41.902329]  platform_device_add+0x10c/0x230
[   41.902334]  platform_device_register_full+0xc8/0x140
[   41.902341]  alsa_card_mtpav_init+0x74/0xd0
[   41.902348]  do_one_initcall+0x74/0x1b0
[   41.902354]  kernel_init_freeable+0x194/0x22c
[   41.902361]  kernel_init+0x10/0xfc
[   41.902367]  ret_from_fork+0x10/0x18
[   41.902374] ---[ end trace f53d3c1ec0afdd56 ]---

 

Mathias


From what I can read from the backtrace, the issue appears while invoking the 'probe' function of the snd-mtpav driver so...

 

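If you still have the same issue, could you try to blacklist snd-mtpav? Add "blacklist snd-mtpav" to a file in /etc/modprobe.d/, for example /etc/modprobe.d/blacklist.conf (the file name needs the .conf suffix to be read). Of course, you might have no sound during the test. Note that a blacklist entry only affects loadable modules: the "Modules linked in:" line is empty and the trace runs through kernel_init, which suggests the driver may be built into this kernel, in which case the entry would have no effect.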
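
The reading of the backtrace described above can be sketched in a few lines of Python. This is an illustrative helper, not an existing tool: the function names are hypothetical, and treating a kernel_init_freeable frame as a sign that the probe ran from the boot-time initcalls (rather than from a module load) is a heuristic assumption.

```python
import re

# A call-trace frame looks like "[   13.014887]  snd_mtpav_probe+0x15c/0x3d8".
FRAME_RE = re.compile(r"\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Shortened copy of the call trace posted above.
SAMPLE_TRACE = """\
[   13.014875] Call trace:
[   13.014880]  request_threaded_irq+0x144/0x180
[   13.014887]  snd_mtpav_probe+0x15c/0x3d8
[   13.014893]  platform_drv_probe+0x50/0xa0
[   13.014899]  really_probe+0xd8/0x300
[   13.014940]  alsa_card_mtpav_init+0x74/0xd0
[   13.014945]  do_one_initcall+0x74/0x1b0
[   13.014950]  kernel_init_freeable+0x194/0x22c
[   13.014957]  kernel_init+0x10/0xfc
[   13.014961]  ret_from_fork+0x10/0x18
[   13.014969] ---[ end trace 34ce35f0c45c0a90 ]---
"""


def trace_functions(log_text):
    """Return the function names of the frames following "Call trace:"."""
    names, in_trace = [], False
    for line in log_text.splitlines():
        if "Call trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.search(line)
            if not match:
                break  # first non-frame line ends the trace
            names.append(match.group(1))
    return names


def summarize(log_text):
    """Pick out the innermost *_probe frame and guess how it was reached."""
    names = trace_functions(log_text)
    probe = next((n for n in names if n.endswith("_probe")), None)
    # Heuristic: built-in initcalls are run from kernel_init_freeable.
    return {"probe": probe, "from_boot_initcalls": "kernel_init_freeable" in names}


print(summarize(SAMPLE_TRACE))
```

For the trace in this thread, the innermost probe frame is snd_mtpav_probe, which is what points at the snd-mtpav driver.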


Yeah, the cause of the CPUs not coming online and the cause of the WARNING message are completely different.

 

It seems that CPU 4 and CPU 5 belong to a different interrupt partition than CPUs 0 to 3:

 

[ 0.000000] GICv3: GIC: PPI partition interrupt-partition-0[0] { /cpus/cpu@0[0] /cpus/cpu@1[1] /cpus/cpu@2[2] /cpus/cpu@3[3] }

 

[ 0.000000] GICv3: GIC: PPI partition interrupt-partition-1[1] { /cpus/cpu@100[4] /cpus/cpu@101[5] }

 

So I *guess* that something goes wrong during the initialization of interrupt-partition-1[1], taking both CPUs down with it.
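
As a small aid for reading those two GICv3 lines, the partition-to-CPU mapping can be pulled out of the log with a couple of regular expressions. This is only a sketch based on the format of the two lines quoted above; the function name is hypothetical, and other kernel versions may print the partitions differently.

```python
import re

# Matches "PPI partition <name>[<index>] { <cpu nodes> }".
PARTITION_RE = re.compile(r"PPI partition (\S+?)\[\d+\] \{([^}]*)\}")
# Matches one CPU node such as "/cpus/cpu@100[4]" and captures the CPU number.
CPU_RE = re.compile(r"/cpus/cpu@[0-9a-f]+\[(\d+)\]")

# The two log lines quoted above.
SAMPLE_GIC = """\
[ 0.000000] GICv3: GIC: PPI partition interrupt-partition-0[0] { /cpus/cpu@0[0] /cpus/cpu@1[1] /cpus/cpu@2[2] /cpus/cpu@3[3] }
[ 0.000000] GICv3: GIC: PPI partition interrupt-partition-1[1] { /cpus/cpu@100[4] /cpus/cpu@101[5] }
"""


def ppi_partitions(log_text):
    """Map each PPI partition name to the list of CPU numbers it contains."""
    partitions = {}
    for match in PARTITION_RE.finditer(log_text):
        name, members = match.group(1), match.group(2)
        partitions[name] = [int(n) for n in CPU_RE.findall(members)]
    return partitions


for name, cpus in ppi_partitions(SAMPLE_GIC).items():
    print(name, cpus)
```

With the lines from this thread, CPUs 4 and 5 (the two that failed to come online) end up alone in interrupt-partition-1.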

 

However, your first logs seem to suggest that the 5.5 kernel was able to bring the CPUs online... Hmm...

 

Does the problem happen on every boot? Only after a cold boot? Only after a reboot?

If it only happens from time to time, it might be a timing issue...

 

But I don't see any patch to the GICv3 driver that could have fixed this issue directly:

https://github.com/torvalds/linux/commits/master/drivers/irqchip/irq-gic-v3.c

 

As far as ARM64-specific patches go, they only fixed a few issues that affected Cavium boards.


This topic is now closed to further replies.