
DEV (5.0.y) on PineA64+ seeing occasional CPU stalls



I have been running a "stock" DEV kernel (5.0.y) on my PineA64+ while waiting for the date-setting issue that others have previously noted to occur.  By "stock" I mean a 'compile.sh' build with BRANCH="dev" and BOARD="pine64", using the PineA64 defconfig with no config changes.

 

A date/time setting problem happened a little while ago, and just as I was about to start figuring out where the kernel went wrong, I got a process crash with an apparent CPU stall error:

[154619.384626] rcu: INFO: rcu_sched self-detected stall on CPU
[154619.390319] rcu: 	2-...0: (1 GPs behind) idle=78a/1/0x4000000000000002 softirq=1062220/1062259 fqs=1174 
[154619.399880] rcu: 	 (t=266523 jiffies g=2266349 q=1232)
[154619.405109] Task dump for CPU 2:
[154619.408421] systemd         R  running task        0     1      0 0x00000002
[154619.415553] Call trace:
[154619.418099]  dump_backtrace+0x0/0x1b0
[154619.421849]  show_stack+0x14/0x20
[154619.425252]  sched_show_task+0x154/0x188
[154619.429259]  dump_cpu_task+0x40/0x50
[154619.432919]  rcu_dump_cpu_stacks+0xc4/0x104
[154619.437186]  rcu_check_callbacks+0x694/0x760
[154619.441539]  update_process_times+0x2c/0x58
[154619.445807]  tick_sched_handle.isra.5+0x30/0x48
[154619.450420]  tick_sched_timer+0x48/0x98
[154619.454339]  __hrtimer_run_queues+0xe4/0x1f0
[154619.458695]  hrtimer_interrupt+0xf4/0x2b0
[154619.462793]  arch_timer_handler_phys+0x28/0x40
[154619.467322]  handle_percpu_devid_irq+0x80/0x138
[154619.471936]  generic_handle_irq+0x24/0x38
[154619.476029]  __handle_domain_irq+0x5c/0xb0
[154619.480208]  gic_handle_irq+0x58/0xa8
[154619.483954]  el0_irq_naked+0x4c/0x54

 

I've been seeing these occasionally for random processes, but mostly with 'swapper'.  This one is particularly problematic because 'systemd' itself has crashed.  Without systemd there is no equivalent 'init', and restarting a crashed systemd isn't always the best idea, but that's a different problem.

 

I'm going to leave the above aside for now, but I'll see whether I can get the entire stack dump logged persistently (syslog to persistent storage) so I can look for patterns, in case anyone else is seeing these.  Enabling RCU tracing may be helpful, but that'll be for another time.  Currently the actual 'rcu' messages get logged, but the call trace does not (something else to look into).
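
For the pattern hunting, something like the following might be a starting point once the logs are persistent. It is only a rough sketch: the function names (summarize_stalls, count_by_task) are made up for illustration, and the regular expressions assume the messages look exactly like the excerpt above, which may not hold for other kernel versions or log formats. If the "Task dump" lines never reach the log, the cpu and task fields simply stay None.

```python
import re
from collections import Counter

# Patterns modelled on the excerpt in this post (an assumption, not a
# stable kernel log format). search() is used so that syslog prefixes
# in front of the "[timestamp]" part do not matter.
STALL_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+rcu: INFO: \S+ self-detected stall on CPU")
CPU_RE = re.compile(r"\[\s*\d+\.\d+\]\s+Task dump for CPU (\d+):")
TASK_RE = re.compile(r"\]\s+(\S+)\s+[A-Z]\s+running task")


def summarize_stalls(lines):
    """Return one dict per stall report: kernel timestamp, CPU, task name."""
    stalls = []
    current = None
    for line in lines:
        m = STALL_RE.search(line)
        if m:
            # A new stall report starts here.
            current = {"time": float(m.group(1)), "cpu": None, "task": None}
            stalls.append(current)
            continue
        if current is None:
            continue  # Nothing to attach this line to yet.
        m = CPU_RE.search(line)
        if m:
            current["cpu"] = int(m.group(1))
            continue
        m = TASK_RE.search(line)
        if m and current["task"] is None:
            current["task"] = m.group(1)
    return stalls


def count_by_task(stalls):
    """Count how often each task name shows up across stall reports."""
    return Counter(s["task"] for s in stalls)
```

Feeding it the lines of a saved 'dmesg' output or a kernel log file and printing count_by_task(...) should show whether 'swapper' really dominates, and the timestamps can be compared against when the date/time problem shows up.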

 

In the meantime I'll try to dig into the date/time setting issue, which I'll document in a separate thread once I have more.

 
