Cubieboard2: another "rcu_sched detected stalls" case




I'm using a Cubieboard2. This morning I got the mysterious "rcu: INFO: rcu_sched detected stalls on CPUs/tasks" error, together with a complete failure of the entire network subsystem.

Here is what the log showed:

Jul 13 09:12:36 hassio kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Jul 13 09:12:37 hassio kernel: rcu:         1-...0: (21 ticks this GP) idle=a56/1/0x40000000 softirq=4946049/4946050 fqs=2286
Jul 13 09:12:37 hassio kernel:         (detected by 0, t=5252 jiffies, g=11125849, q=117)
Jul 13 09:12:37 hassio kernel: Sending NMI from CPU 0 to CPUs 1:
Jul 13 09:13:13 hassio systemd[1]: systemd-udevd.service: Watchdog timeout (limit 3min)!
Jul 13 09:13:13 hassio systemd[1]: systemd-udevd.service: Killing process 308 (systemd-udevd) with signal SIGABRT.
Jul 13 09:13:39 hassio kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Jul 13 09:13:39 hassio kernel: rcu:         1-...0: (21 ticks this GP) idle=a56/1/0x40000000 softirq=4946049/4946050 fqs=8014
Jul 13 09:13:39 hassio kernel:         (detected by 0, t=21007 jiffies, g=11125849, q=2391)
Jul 13 09:13:40 hassio kernel: Sending NMI from CPU 0 to CPUs 1:
Jul 13 09:13:40 hassio kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 1-... } 6684 jiffies s: 13809 root: 0x2/.
Jul 13 09:13:40 hassio kernel: rcu: blocking rcu_node structures:
Jul 13 09:13:40 hassio kernel: Task dump for CPU 1:
Jul 13 09:13:40 hassio kernel: kworker/1:0     R  running task        0 25335      2 0x00000002
Jul 13 09:13:40 hassio kernel: Workqueue: events dbs_work_handler
Jul 13 09:13:40 hassio kernel: [<c086fd9f>] (__schedule) from [<00000008>] (0x8)
Jul 13 09:14:43 hassio kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Jul 13 09:14:43 hassio kernel: rcu:         1-...0: (21 ticks this GP) idle=a56/1/0x40000000 softirq=4946049/4946050 fqs=13734
Jul 13 09:14:43 hassio kernel:         (detected by 0, t=36762 jiffies, g=11125849, q=5496)
Jul 13 09:14:43 hassio kernel: Sending NMI from CPU 0 to CPUs 1:
Jul 13 09:14:42 hassio systemd[1]: snapd.service: Watchdog timeout (limit 5min)!
Jul 13 09:14:42 hassio systemd[1]: snapd.service: Killing process 637 (snapd) with signal SIGABRT.
Jul 13 09:14:43 hassio systemd[1]: systemd-udevd.service: State 'stop-watchdog' timed out. Terminating.

...

Jul 13 09:15:47 hassio kernel: [<c086fd9f>] (__schedule) from [<00000008>] (0x8)
Jul 13 09:15:48 hassio udevd[189]: worker [1659] /devices/virtual/block/loop1 is taking a long time
Jul 13 09:15:48 hassio udevd[493]: worker [567] /devices/virtual/block/loop1 is taking a long time
Jul 13 09:15:49 hassio udevd[182]: worker [31741] /devices/virtual/block/loop1 is taking a long time
Jul 13 09:16:06 hassio kernel: ------------[ cut here ]------------
Jul 13 09:16:06 hassio kernel: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:448 dev_watchdog+0x213/0x214
Jul 13 09:16:06 hassio kernel: NETDEV WATCHDOG: eth0 (sun7i-dwmac): transmit queue 0 timed out
Jul 13 09:16:06 hassio kernel: Modules linked in: tcp_diag inet_diag xt_nat xt_tcpudp veth xt_conntrack nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter br_netfilter bridge stp llc aufs xt_MASQUERADE iptable_nat >
Jul 13 09:16:06 hassio kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C        5.4.45-sunxi #20.05.4
Jul 13 09:16:06 hassio kernel: Hardware name: Allwinner sun7i (A20) Family
Jul 13 09:16:06 hassio kernel: [<c010dc49>] (unwind_backtrace) from [<c010a245>] (show_stack+0x11/0x14)
Jul 13 09:16:06 hassio kernel: [<c010a245>] (show_stack) from [<c085f97b>] (dump_stack+0x6f/0x7c)
Jul 13 09:16:06 hassio kernel: [<c085f97b>] (dump_stack) from [<c011e3cb>] (__warn+0xb7/0xb8)
Jul 13 09:16:06 hassio kernel: [<c011e3cb>] (__warn) from [<c011e65b>] (warn_slowpath_fmt+0x53/0x5c)
Jul 13 09:16:06 hassio kernel: [<c011e65b>] (warn_slowpath_fmt) from [<c076eae3>] (dev_watchdog+0x213/0x214)
Jul 13 09:16:06 hassio kernel: [<c076eae3>] (dev_watchdog) from [<c017299b>] (call_timer_fn+0x27/0x128)
Jul 13 09:16:06 hassio kernel: [<c017299b>] (call_timer_fn) from [<c01733e5>] (run_timer_softirq+0x3f5/0x414)
Jul 13 09:16:06 hassio kernel: [<c01733e5>] (run_timer_softirq) from [<c01022f7>] (__do_softirq+0xdf/0x288)
Jul 13 09:16:06 hassio kernel: [<c01022f7>] (__do_softirq) from [<c0122f4b>] (irq_exit+0x7b/0x90)
Jul 13 09:16:06 hassio kernel: [<c0122f4b>] (irq_exit) from [<c0161523>] (__handle_domain_irq+0x47/0x84)
Jul 13 09:16:06 hassio kernel: [<c0161523>] (__handle_domain_irq) from [<c0517211>] (gic_handle_irq+0x39/0x6c)
Jul 13 09:16:06 hassio kernel: [<c0517211>] (gic_handle_irq) from [<c0101ae5>] (__irq_svc+0x65/0x94)
Jul 13 09:16:06 hassio kernel: Exception stack(0xc0e01f30 to 0xc0e01f78)
Jul 13 09:16:06 hassio kernel: 1f20:                                     00000000 129145fc ef691034 c01164c1
Jul 13 09:16:06 hassio kernel: 1f40: ffffe000 c0e04fa4 c0e04fec 00000001 00000000 c0db99f0 c0ec4b25 00000000
Jul 13 09:16:06 hassio kernel: 1f60: c0f0cf98 c0e01f80 c0107c6f c0107c70 40000033 ffffffff
Jul 13 09:16:06 hassio kernel: [<c0101ae5>] (__irq_svc) from [<c0107c70>] (arch_cpu_idle+0x28/0x2c)
Jul 13 09:16:06 hassio kernel: [<c0107c70>] (arch_cpu_idle) from [<c014126b>] (do_idle+0x143/0x1b0)
Jul 13 09:16:06 hassio kernel: [<c014126b>] (do_idle) from [<c01414b9>] (cpu_startup_entry+0x19/0x20)
Jul 13 09:16:06 hassio kernel: [<c01414b9>] (cpu_startup_entry) from [<c0d00c81>] (start_kernel+0x3eb/0x3f8)
Jul 13 09:16:06 hassio kernel: ---[ end trace 4f6c33f5bb7f39ba ]---
Jul 13 09:16:06 hassio kernel: sun7i-dwmac 1c50000.ethernet eth0: Reset adapter.
Jul 13 09:16:13 hassio systemd[1]: systemd-udevd.service: State 'stop-sigterm' timed out. Killing.
Jul 13 09:16:13 hassio systemd[1]: systemd-udevd.service: Killing process 308 (systemd-udevd) with signal SIGKILL.
Jul 13 09:16:14 hassio systemd[1]: snapd.service: start operation timed out. Terminating.

I've read https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt and many similar posts on this forum.

I will try lowering the CPU frequency with cpufreq-set.
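For reference, cpufreq-set changes the kernel's cpufreq policy, and the same upper limit can also be written directly to the standard sysfs files /sys/devices/system/cpu/cpuN/cpufreq/scaling_max_freq. In the log, CPU 1 was stuck in a kworker running dbs_work_handler, the work function used by the ondemand/conservative cpufreq governors, which is consistent with (but does not prove) a frequency-scaling problem. The following is a minimal sketch, assuming root privileges and that sysfs layout; the function name cap_max_freq and the 720000 kHz example value are hypothetical, so pick a value from your board's scaling_available_frequencies if that file exists.

```python
import glob
import os
import sys
from typing import List


def cap_max_freq(khz: int, sysfs_root: str = "/sys/devices/system/cpu") -> List[str]:
    """Hypothetical helper: write `khz` to every CPU's scaling_max_freq.

    Returns the list of files that were written. Requires root on a real system.
    """
    if khz <= 0:
        raise ValueError("frequency must be a positive value in kHz")
    # Only match per-CPU directories (cpu0, cpu1, ...), not the shared 'cpufreq' dir.
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_max_freq")
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(str(khz))
        written.append(path)
    return written


# Only act when a frequency (in kHz) is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    for p in cap_max_freq(int(sys.argv[1])):
        print("capped", p)
```

For example, running the script as root with the argument 720000 would cap both A20 cores at 720 MHz, assuming that frequency is supported. A limit written this way does not persist across a reboot.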

My question is: how can I detect the string "rcu: INFO: rcu_sched detected stalls on CPUs/tasks" and reboot the system immediately?
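One possible approach is to follow the kernel log through /dev/kmsg and trigger a reboot through /proc/sysrq-trigger as soon as the string appears. This is a minimal sketch, not a tested solution: it assumes the script runs as root (for example from a service), and all function names are hypothetical. It has a real limitation: during a stall, userspace may be too starved to run at all (in the log above, systemd's own watchdogs fired). A kernel-side alternative worth checking is the kernel.panic_on_rcu_stall sysctl combined with kernel.panic set to a number of seconds, which asks the kernel to panic on a stall and then reboot; verify that your kernel exposes panic_on_rcu_stall under /proc/sys/kernel/ before relying on it.

```python
import os
import sys
import time
from typing import Iterator

# The exact message from the log above (the "expedited" variant is not matched).
STALL_MARKER = "rcu_sched detected stalls on CPUs/tasks"


def is_stall_record(record: str, marker: str = STALL_MARKER) -> bool:
    """Return True if a kernel log record contains the RCU stall message."""
    return marker in record


def follow_kmsg(path: str = "/dev/kmsg") -> Iterator[str]:
    """Yield kernel log records as they arrive; reading /dev/kmsg needs root."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Skip records logged before we started, so an old stall is ignored.
        os.lseek(fd, 0, os.SEEK_END)
        while True:
            try:
                # On /dev/kmsg each read() returns exactly one record.
                chunk = os.read(fd, 8192)
            except BrokenPipeError:
                # EPIPE: records were overwritten before we read them; continue.
                continue
            if not chunk:
                return  # EOF: only happens for a regular file, not /dev/kmsg
            yield chunk.decode("utf-8", errors="replace")
    finally:
        os.close(fd)


def emergency_reboot() -> None:
    """SysRq 's' requests an emergency sync; SysRq 'b' reboots without unmounting."""
    with open("/proc/sysrq-trigger", "w") as trigger:
        trigger.write("s")
    time.sleep(2)  # give the emergency sync a moment; it runs asynchronously
    with open("/proc/sysrq-trigger", "w") as trigger:
        trigger.write("b")


def watch(path: str = "/dev/kmsg") -> None:
    """Block until a stall message is logged, then reboot."""
    for record in follow_kmsg(path):
        if is_stall_record(record):
            emergency_reboot()
            return


# Only start watching when the log device is passed explicitly, e.g. /dev/kmsg.
if __name__ == "__main__" and len(sys.argv) == 2:
    watch(sys.argv[1])
```

Requiring the device path as an argument keeps the script from rebooting anything when it is merely imported or run by accident.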

Edited by bozzio