DEV (5.0.y) on PineA64+ seeing occasional CPU stalls

I have been running a "stock" DEV kernel (5.0.y) on my PineA64+ while waiting for the date-setting issue that others have previously reported to occur.  "Stock" here means a 'compile.sh' build with BRANCH="dev" and BOARD="pine64", using the PineA64 defconfig with no config changes.


A date/time setting problem happened a little while ago, and just as I was about to start figuring out where the kernel went wrong, a process crashed with an apparent CPU stall error:

[154619.384626] rcu: INFO: rcu_sched self-detected stall on CPU
[154619.390319] rcu: 	2-...0: (1 GPs behind) idle=78a/1/0x4000000000000002 softirq=1062220/1062259 fqs=1174 
[154619.399880] rcu: 	 (t=266523 jiffies g=2266349 q=1232)
[154619.405109] Task dump for CPU 2:
[154619.408421] systemd         R  running task        0     1      0 0x00000002
[154619.415553] Call trace:
[154619.418099]  dump_backtrace+0x0/0x1b0
[154619.421849]  show_stack+0x14/0x20
[154619.425252]  sched_show_task+0x154/0x188
[154619.429259]  dump_cpu_task+0x40/0x50
[154619.432919]  rcu_dump_cpu_stacks+0xc4/0x104
[154619.437186]  rcu_check_callbacks+0x694/0x760
[154619.441539]  update_process_times+0x2c/0x58
[154619.445807]  tick_sched_handle.isra.5+0x30/0x48
[154619.450420]  tick_sched_timer+0x48/0x98
[154619.454339]  __hrtimer_run_queues+0xe4/0x1f0
[154619.458695]  hrtimer_interrupt+0xf4/0x2b0
[154619.462793]  arch_timer_handler_phys+0x28/0x40
[154619.467322]  handle_percpu_devid_irq+0x80/0x138
[154619.471936]  generic_handle_irq+0x24/0x38
[154619.476029]  __handle_domain_irq+0x5c/0xb0
[154619.480208]  gic_handle_irq+0x58/0xa8
[154619.483954]  el0_irq_naked+0x4c/0x54


I've been seeing these occasionally for random processes, but mostly with 'swapper'.  This one is particularly problematic because 'systemd' itself has crashed.  Without systemd there is no equivalent 'init', and restarting a crashed systemd isn't always the best idea... but that's a different problem.


I'm going to leave the above aside for now, but I'll see whether I can get the entire stack dump logged persistently (syslog to persistent storage) so I can look for patterns, in case anyone else is seeing these.  Enabling RCU tracing may be helpful, but that will be for another time.  Currently the 'rcu' messages themselves get logged, but the call trace does not (something else to look into).
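
For anyone who wants to look for patterns once the logs are being kept, here is a rough Python sketch that scans kernel log lines for stall reports like the one above and pulls out the timestamp, CPU and task name. The function names (find_stalls, summarize) are made up for illustration, and the parsing assumes the line layout seen in this particular dump ("Task dump for CPU N:" followed directly by the task line), so it may need adjusting for other kernels or log formats.

```python
import re
from collections import Counter

# Kernel timestamp such as "[154619.384626] "; search() is used so that a
# syslog prefix in front of the bracket does not prevent a match.
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s?(.*)$")
STALL_RE = re.compile(r"rcu: INFO: (\w+) self-detected stall on CPU")
TASK_DUMP_RE = re.compile(r"Task dump for CPU (\d+):")


def find_stalls(lines):
    """Return one dict per stall report found in an iterable of log lines."""
    stalls = []
    current = None        # stall report currently being filled in
    expect_task = False   # True when the next line should be the task line
    for raw in lines:
        m = TS_RE.search(raw.rstrip("\n"))
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        stall = STALL_RE.search(msg)
        if stall:
            current = {"time": ts, "flavor": stall.group(1),
                       "cpu": None, "task": None}
            stalls.append(current)
            expect_task = False
        elif current is not None and expect_task:
            # Assumption: the task name is the first whitespace-separated
            # field, e.g. "systemd" or "swapper/2".
            fields = msg.split()
            current["task"] = fields[0] if fields else None
            expect_task = False
        elif current is not None:
            dump = TASK_DUMP_RE.search(msg)
            if dump:
                current["cpu"] = int(dump.group(1))
                expect_task = True
    return stalls


def summarize(stalls):
    """Count stall reports per (task, cpu) pair."""
    return Counter((s["task"], s["cpu"]) for s in stalls)
```

Feeding it the saved log, for example find_stalls(open(path)) with path pointing at wherever the kernel messages end up, and then passing the result to summarize() should show whether the stalls cluster on one CPU or one task.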


In the meantime I'll try to dig into the date/time setting issue, which I'll document in a separate thread once I have more.

