Jump to content

Helios64 edge kernel: rcu: rcu_preempt kthread starved for 5234524 jiffies (and unresponsive)


Recommended Posts

Posted

I found my Helis64 unresponsive after about a week or so. Even heartbeat LED is not blinking (permanently on). I see this repeating on the serial console

[495778.879711] rcu: rcu_preempt kthread timer wakeup didn't happen for 5324533 jiffies! g9706925 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200
[495778.880747] rcu: 	Possible timer handling issue on cpu=3 timer-softirq=2513452
[495778.881383] rcu: rcu_preempt kthread starved for 5324534 jiffies! g9706925 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=3
[495778.882336] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[495778.883137] rcu: RCU grace-period kthread stack dump:
[495778.883584] task:rcu_preempt     state:R stack:0     pid:16    tgid:16    ppid:2      flags:0x00000008
[495778.884404] Call trace:
[495778.884627]  __switch_to+0xe0/0x124
[495778.884946]  __schedule+0x308/0xa8c
[495778.885263]  schedule+0x34/0xf8
[495778.885549]  schedule_timeout+0x98/0x1bc
[495778.885902]  rcu_gp_fqs_loop+0x150/0x670
[495778.886256]  rcu_gp_kthread+0x234/0x274
[495778.886603]  kthread+0x114/0x118
[495778.886896]  ret_from_fork+0x10/0x20
[495778.887219] rcu: Stack dump where RCU GP kthread last ran:
[495778.887705] Sending NMI from CPU 4 to CPUs 3:

Aside from usual NFS server (unused at the time) and syncthing, it was running duplicity backup at the time. I cannot rule out that it ran out of memory. Though it did run full backup successfully a couple of days ago.

helios64-rcu-stall.txt

Posted

I guess this is one core not responding anymore, likely CPU 5 (one of the big cores). Which kernel do you run?

Is this the first time you encounter this bug?

 

You might want to run ebin-dev dtb (there are voltage hacks for the big CPUs in it).

Posted

Edge from the deb package (6.8.11-edge-rockchip64).

It was the first time; this Saturday I got a similar situation, _but_ I was able to ssh (after minutes of waiting) and save syslog. The first anomaly in the log was (this time):

Sep 14 22:11:52 kobol kernel: BUG: Bad page state in process kcompactd0  pfn:1e320
Sep 14 22:11:52 kobol kernel: page:000000001709b832 refcount:0 mapcount:0 mapping:000000004953ae39 index:0x4c1a1c30 pfn:0x1e320
Sep 14 22:11:52 kobol kernel: aops:0xffff800081149ed8 ino:1
Sep 14 22:11:52 kobol kernel: flags: 0xffff1800000020c(referenced|uptodate|workingset|node=0|zone=0|lastcpupid=0xffff)
Sep 14 22:11:52 kobol kernel: page_type: 0xffffffff()
Sep 14 22:11:52 kobol kernel: raw: 0ffff1800000020c dead000000000100 dead000000000122 ffff0000009e8338
Sep 14 22:11:52 kobol kernel: raw: 000000004c1a1c30 0000000000000000 00000000ffffffff 0000000000000000
Sep 14 22:11:52 kobol kernel: page dumped because: non-NULL mapping

and later there are repeated "rcu: INFO: rcu_preempt detected stalls on CPUs/tasks ..." (see attached file).

I will try that other dtb, thanks!

rcu-stall.txt

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
×
×
  • Create New...

Important Information

Terms of Use - Privacy Policy - Guidelines