windysea

DEV (5.0.y) on PineA64+ seeing occasional CPU stalls


I have been running a "stock" DEV kernel (5.0.y) on my PineA64+ while waiting for the date-setting issue previously reported by others to occur. By "stock" I mean a kernel built with 'compile.sh' using BRANCH="dev" and the PineA64 defconfig (BOARD="pine64", no config changes).

 

A date/time setting problem happened a little while ago, and just as I was about to start figuring out where the kernel went wrong, I got a process crash with an apparent CPU stall error:

[154619.384626] rcu: INFO: rcu_sched self-detected stall on CPU
[154619.390319] rcu: 	2-...0: (1 GPs behind) idle=78a/1/0x4000000000000002 softirq=1062220/1062259 fqs=1174 
[154619.399880] rcu: 	 (t=266523 jiffies g=2266349 q=1232)
[154619.405109] Task dump for CPU 2:
[154619.408421] systemd         R  running task        0     1      0 0x00000002
[154619.415553] Call trace:
[154619.418099]  dump_backtrace+0x0/0x1b0
[154619.421849]  show_stack+0x14/0x20
[154619.425252]  sched_show_task+0x154/0x188
[154619.429259]  dump_cpu_task+0x40/0x50
[154619.432919]  rcu_dump_cpu_stacks+0xc4/0x104
[154619.437186]  rcu_check_callbacks+0x694/0x760
[154619.441539]  update_process_times+0x2c/0x58
[154619.445807]  tick_sched_handle.isra.5+0x30/0x48
[154619.450420]  tick_sched_timer+0x48/0x98
[154619.454339]  __hrtimer_run_queues+0xe4/0x1f0
[154619.458695]  hrtimer_interrupt+0xf4/0x2b0
[154619.462793]  arch_timer_handler_phys+0x28/0x40
[154619.467322]  handle_percpu_devid_irq+0x80/0x138
[154619.471936]  generic_handle_irq+0x24/0x38
[154619.476029]  __handle_domain_irq+0x5c/0xb0
[154619.480208]  gic_handle_irq+0x58/0xa8
[154619.483954]  el0_irq_naked+0x4c/0x54
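For anyone comparing reports like the one above, a minimal Python sketch can pull out the fields that seem most useful: the RCU flavor, the t= value, the CPU and the task that was running on it. It assumes the 5.0.y message wording shown above, and my understanding that t= counts jiffies since the current grace period started. Converting that to seconds needs the kernel's CONFIG_HZ, which isn't stated here, so the hz=250 in the usage lines is only an assumption. The names StallReport, parse_stall and jiffies_to_seconds are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# Patterns for the 5.0.y stall-report wording quoted above; other kernel
# versions may word or lay out these lines differently.
STALL_RE = re.compile(r"rcu: INFO: (\w+) self-detected stall on CPU")
TIMING_RE = re.compile(r"\(t=(\d+) jiffies g=(\d+) q=(\d+)\)")
DUMP_RE = re.compile(r"Task dump for CPU (\d+):")
# Task line after the dump header: "<timestamp>] <comm> <state letter> ..."
TASK_RE = re.compile(r"\]\s+(\S+)\s+([A-Z])\s")


@dataclass
class StallReport:
    flavor: str           # e.g. "rcu_sched"
    elapsed_jiffies: int  # the t= value
    cpu: Optional[int]    # CPU named in "Task dump for CPU N:"
    task: Optional[str]   # comm of the task shown for that CPU


def parse_stall(lines: Iterable[str]) -> Optional[StallReport]:
    """Extract key fields from one stall report; return None if none found."""
    flavor = elapsed = cpu = task = None
    expect_task = False
    for line in lines:
        if expect_task:
            # The line right after "Task dump for CPU N:" names the task.
            m = TASK_RE.search(line)
            if m:
                task = m.group(1)
            expect_task = False
            continue
        if flavor is None and (m := STALL_RE.search(line)):
            flavor = m.group(1)
        elif elapsed is None and (m := TIMING_RE.search(line)):
            elapsed = int(m.group(1))
        elif cpu is None and (m := DUMP_RE.search(line)):
            cpu = int(m.group(1))
            expect_task = True
    if flavor is None or elapsed is None:
        return None
    return StallReport(flavor, elapsed, cpu, task)


def jiffies_to_seconds(jiffies: int, hz: int) -> float:
    """Convert jiffies to seconds; hz must match the kernel's CONFIG_HZ."""
    return jiffies / hz


# A few lines copied from the report above.
SAMPLE = """\
[154619.384626] rcu: INFO: rcu_sched self-detected stall on CPU
[154619.399880] rcu:  (t=266523 jiffies g=2266349 q=1232)
[154619.405109] Task dump for CPU 2:
[154619.408421] systemd         R  running task        0     1      0 0x00000002
"""

if __name__ == "__main__":
    report = parse_stall(SAMPLE.splitlines())
    print(report)
    # HZ=250 is an assumption; check CONFIG_HZ for the actual kernel.
    print(jiffies_to_seconds(report.elapsed_jiffies, hz=250))
```

Fields missing from a truncated report, such as the task dump, come back as None rather than raising.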

 

I've been seeing these occasionally for random processes, but mostly with 'swapper'. This one is particularly problematic because 'systemd' itself has crashed. Without systemd there is no equivalent 'init', and restarting a crashed systemd isn't always the best idea... but that's a different problem.

 

I'm going to leave the above aside for now, but I'll see whether I can get the entire stack dump logged persistently (syslog to persistent storage) so I can look for any patterns, just in case anyone else is seeing these. Enabling RCU tracing may help, but that'll be for another time. The 'rcu' messages themselves are currently logged, but the call trace is not (something else to look into).
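Once full stall reports are kept on persistent storage, a short Python sketch could tally which task was running on the stalled CPU, which would show whether 'swapper' really dominates. It assumes kernel lines keep the '[timestamp]' prefix seen above (any syslog prefix in front of it is skipped by the regex search). The function name tally_stalled_tasks is hypothetical, and the sample lines in it are made up in the same format, not taken from a real log.

```python
import re
from collections import Counter
from typing import Iterable

# Header that precedes the task line in each stall report quoted above.
DUMP_RE = re.compile(r"Task dump for CPU (\d+):")
# Task line: "...] <comm> <state letter> ...". search() skips any syslog
# prefix (date, host, "kernel:") in front of the "[timestamp]".
TASK_RE = re.compile(r"\]\s+(\S+)\s+([A-Z])\s")


def tally_stalled_tasks(lines: Iterable[str]) -> Counter:
    """Count how often each task name appears in RCU stall task dumps."""
    counts: Counter = Counter()
    waiting_for_task = False
    for line in lines:
        if waiting_for_task:
            # Only the line directly after the dump header is examined.
            m = TASK_RE.search(line)
            if m:
                counts[m.group(1)] += 1
            waiting_for_task = False
        elif DUMP_RE.search(line):
            waiting_for_task = True
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines in the same format as the report above.
    sample = [
        "kernel: [100.000000] Task dump for CPU 2:",
        "kernel: [100.000100] systemd         R  running task        0     1      0 0x00000002",
        "kernel: [200.000000] Task dump for CPU 0:",
        "kernel: [200.000100] swapper/0       R  running task        0     0      0 0x00000002",
    ]
    print(tally_stalled_tasks(sample).most_common())
```

Against a real log the open file can be passed in directly, e.g. tally_stalled_tasks(open("/var/log/kern.log")), where that path is an assumption that depends on the syslog setup. If the kernel repeats the warning for the same ongoing stall, each repeat is counted separately.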

 

In the meantime I'll try to dig into the date/time setting issue, which I'll document in a separate thread once I have more.

 
