Cubieboard2: another "rcu_sched detected stalls"


bozzio

Armbianmonitor:

I am using a Cubieboard2. This morning I got a mysterious "rcu: INFO: rcu_sched detected stalls on CPUs/tasks" message, followed by a complete failure of the entire network subsystem.

Here is what the log showed:

Jul 13 09:12:36 hassio kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Jul 13 09:12:37 hassio kernel: rcu:         1-...0: (21 ticks this GP) idle=a56/1/0x40000000 softirq=4946049/4946050 fqs=2286
Jul 13 09:12:37 hassio kernel:         (detected by 0, t=5252 jiffies, g=11125849, q=117)
Jul 13 09:12:37 hassio kernel: Sending NMI from CPU 0 to CPUs 1:
Jul 13 09:13:13 hassio systemd[1]: systemd-udevd.service: Watchdog timeout (limit 3min)!
Jul 13 09:13:13 hassio systemd[1]: systemd-udevd.service: Killing process 308 (systemd-udevd) with signal SIGABRT.
Jul 13 09:13:39 hassio kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Jul 13 09:13:39 hassio kernel: rcu:         1-...0: (21 ticks this GP) idle=a56/1/0x40000000 softirq=4946049/4946050 fqs=8014
Jul 13 09:13:39 hassio kernel:         (detected by 0, t=21007 jiffies, g=11125849, q=2391)
Jul 13 09:13:40 hassio kernel: Sending NMI from CPU 0 to CPUs 1:
Jul 13 09:13:40 hassio kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 1-... } 6684 jiffies s: 13809 root: 0x2/.
Jul 13 09:13:40 hassio kernel: rcu: blocking rcu_node structures:
Jul 13 09:13:40 hassio kernel: Task dump for CPU 1:
Jul 13 09:13:40 hassio kernel: kworker/1:0     R  running task        0 25335      2 0x00000002
Jul 13 09:13:40 hassio kernel: Workqueue: events dbs_work_handler
Jul 13 09:13:40 hassio kernel: [<c086fd9f>] (__schedule) from [<00000008>] (0x8)
Jul 13 09:14:43 hassio kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Jul 13 09:14:43 hassio kernel: rcu:         1-...0: (21 ticks this GP) idle=a56/1/0x40000000 softirq=4946049/4946050 fqs=13734
Jul 13 09:14:43 hassio kernel:         (detected by 0, t=36762 jiffies, g=11125849, q=5496)
Jul 13 09:14:43 hassio kernel: Sending NMI from CPU 0 to CPUs 1:
Jul 13 09:14:42 hassio systemd[1]: snapd.service: Watchdog timeout (limit 5min)!
Jul 13 09:14:42 hassio systemd[1]: snapd.service: Killing process 637 (snapd) with signal SIGABRT.
Jul 13 09:14:43 hassio systemd[1]: systemd-udevd.service: State 'stop-watchdog' timed out. Terminating.

...

Jul 13 09:15:47 hassio kernel: [<c086fd9f>] (__schedule) from [<00000008>] (0x8)
Jul 13 09:15:48 hassio udevd[189]: worker [1659] /devices/virtual/block/loop1 is taking a long time
Jul 13 09:15:48 hassio udevd[493]: worker [567] /devices/virtual/block/loop1 is taking a long time
Jul 13 09:15:49 hassio udevd[182]: worker [31741] /devices/virtual/block/loop1 is taking a long time
Jul 13 09:16:06 hassio kernel: ------------[ cut here ]------------
Jul 13 09:16:06 hassio kernel: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:448 dev_watchdog+0x213/0x214
Jul 13 09:16:06 hassio kernel: NETDEV WATCHDOG: eth0 (sun7i-dwmac): transmit queue 0 timed out
Jul 13 09:16:06 hassio kernel: Modules linked in: tcp_diag inet_diag xt_nat xt_tcpudp veth xt_conntrack nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter br_netfilter bridge stp llc aufs xt_MASQUERADE iptable_nat >
Jul 13 09:16:06 hassio kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C        5.4.45-sunxi #20.05.4
Jul 13 09:16:06 hassio kernel: Hardware name: Allwinner sun7i (A20) Family
Jul 13 09:16:06 hassio kernel: [<c010dc49>] (unwind_backtrace) from [<c010a245>] (show_stack+0x11/0x14)
Jul 13 09:16:06 hassio kernel: [<c010a245>] (show_stack) from [<c085f97b>] (dump_stack+0x6f/0x7c)
Jul 13 09:16:06 hassio kernel: [<c085f97b>] (dump_stack) from [<c011e3cb>] (__warn+0xb7/0xb8)
Jul 13 09:16:06 hassio kernel: [<c011e3cb>] (__warn) from [<c011e65b>] (warn_slowpath_fmt+0x53/0x5c)
Jul 13 09:16:06 hassio kernel: [<c011e65b>] (warn_slowpath_fmt) from [<c076eae3>] (dev_watchdog+0x213/0x214)
Jul 13 09:16:06 hassio kernel: [<c076eae3>] (dev_watchdog) from [<c017299b>] (call_timer_fn+0x27/0x128)
Jul 13 09:16:06 hassio kernel: [<c017299b>] (call_timer_fn) from [<c01733e5>] (run_timer_softirq+0x3f5/0x414)
Jul 13 09:16:06 hassio kernel: [<c01733e5>] (run_timer_softirq) from [<c01022f7>] (__do_softirq+0xdf/0x288)
Jul 13 09:16:06 hassio kernel: [<c01022f7>] (__do_softirq) from [<c0122f4b>] (irq_exit+0x7b/0x90)
Jul 13 09:16:06 hassio kernel: [<c0122f4b>] (irq_exit) from [<c0161523>] (__handle_domain_irq+0x47/0x84)
Jul 13 09:16:06 hassio kernel: [<c0161523>] (__handle_domain_irq) from [<c0517211>] (gic_handle_irq+0x39/0x6c)
Jul 13 09:16:06 hassio kernel: [<c0517211>] (gic_handle_irq) from [<c0101ae5>] (__irq_svc+0x65/0x94)
Jul 13 09:16:06 hassio kernel: Exception stack(0xc0e01f30 to 0xc0e01f78)
Jul 13 09:16:06 hassio kernel: 1f20:                                     00000000 129145fc ef691034 c01164c1
Jul 13 09:16:06 hassio kernel: 1f40: ffffe000 c0e04fa4 c0e04fec 00000001 00000000 c0db99f0 c0ec4b25 00000000
Jul 13 09:16:06 hassio kernel: 1f60: c0f0cf98 c0e01f80 c0107c6f c0107c70 40000033 ffffffff
Jul 13 09:16:06 hassio kernel: [<c0101ae5>] (__irq_svc) from [<c0107c70>] (arch_cpu_idle+0x28/0x2c)
Jul 13 09:16:06 hassio kernel: [<c0107c70>] (arch_cpu_idle) from [<c014126b>] (do_idle+0x143/0x1b0)
Jul 13 09:16:06 hassio kernel: [<c014126b>] (do_idle) from [<c01414b9>] (cpu_startup_entry+0x19/0x20)
Jul 13 09:16:06 hassio kernel: [<c01414b9>] (cpu_startup_entry) from [<c0d00c81>] (start_kernel+0x3eb/0x3f8)
Jul 13 09:16:06 hassio kernel: ---[ end trace 4f6c33f5bb7f39ba ]---
Jul 13 09:16:06 hassio kernel: sun7i-dwmac 1c50000.ethernet eth0: Reset adapter.
Jul 13 09:16:13 hassio systemd[1]: systemd-udevd.service: State 'stop-sigterm' timed out. Killing.
Jul 13 09:16:13 hassio systemd[1]: systemd-udevd.service: Killing process 308 (systemd-udevd) with signal SIGKILL.
Jul 13 09:16:14 hassio systemd[1]: snapd.service: start operation timed out. Terminating.

 

I've read https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt and many similar posts on this forum.

I will try lowering the CPU frequency with cpufreq-set.
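
For reference, the same cap can also be applied without cpufreq-set by writing to the cpufreq sysfs files. This is only a rough sketch: the function name `cap_max_freq` and the 720000 kHz value in the usage note are my own placeholders, not values verified for this board, and it assumes the kernel exposes `scaling_max_freq` under `/sys/devices/system/cpu/cpuN/cpufreq/` and that the script runs as root.

```python
import glob
import os


def cap_max_freq(khz, sysfs_root="/sys/devices/system/cpu"):
    """Write `khz` to every cpuN/cpufreq/scaling_max_freq file found.

    Returns the list of files written. `sysfs_root` is a parameter only
    so the function can be pointed at a scratch directory for a dry run.
    """
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not cpufreq/ or cpuidle/.
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_max_freq")
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(str(khz))  # the kernel expects the value in kHz
        written.append(path)
    return written
```

Calling `cap_max_freq(720000)` as root would apply a hypothetical 720 MHz cap; the frequencies actually supported are listed in `scaling_available_frequencies` in the same directory. The setting does not survive a reboot.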

My question is: how can I detect the string "rcu: INFO: rcu_sched detected stalls on CPUs/tasks" in the log and reboot the system immediately?
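
To make the question concrete, here is a minimal sketch of the kind of watcher I have in mind, not a tested solution. It follows the kernel log with `journalctl -k -f` and, on the first line that looks like the stall message, writes `b` to `/proc/sysrq-trigger`. The function names are my own. It assumes root privileges and a kernel built with magic SysRq support. It also assumes the watcher process still gets scheduled while one CPU is stalled, which may not hold.

```python
import re
import subprocess

# Matches both the normal and the "expedited" variant seen in the log above.
STALL_RE = re.compile(r"rcu: INFO: \w+ detected (?:expedited )?stalls on CPUs/tasks")


def is_rcu_stall(line):
    """Return True if a log line looks like an RCU stall warning."""
    return STALL_RE.search(line) is not None


def watch(lines, on_stall):
    """Scan an iterable of log lines; call on_stall(line) on the first match.

    Returns True if a stall line was seen, False if the input ended first.
    """
    for line in lines:
        if is_rcu_stall(line):
            on_stall(line)
            return True
    return False


def force_reboot():
    """Reboot immediately via SysRq 'b' (no sync, no unmount).

    Assumption: running as root on a kernel with CONFIG_MAGIC_SYSRQ.
    Unsynced data is lost, so this is a last resort.
    """
    with open("/proc/sysrq-trigger", "w") as f:
        f.write("b")


def main():
    # -k: kernel messages only, -f: follow, -n 0: skip old entries,
    # -o cat: print the bare message without timestamp or hostname.
    proc = subprocess.Popen(
        ["journalctl", "-k", "-f", "-n", "0", "-o", "cat"],
        stdout=subprocess.PIPE,
        text=True,
    )
    watch(proc.stdout, lambda line: force_reboot())
```

Running `main()` from a small systemd service would keep the watcher alive across boots. A possibly more robust alternative that does not depend on userspace is the kernel's own `kernel.panic_on_rcu_stall=1` sysctl combined with `kernel.panic=<seconds>`, which should make the kernel panic and reboot on a stall. I have not tried either on this board.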

 

 

 

Edited by bozzio