crosser Posted September 1 Share Posted September 1 I found my Helis64 unresponsive after about a week or so. Even heartbeat LED is not blinking (permanently on). I see this repeating on the serial console [495778.879711] rcu: rcu_preempt kthread timer wakeup didn't happen for 5324533 jiffies! g9706925 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 [495778.880747] rcu: Possible timer handling issue on cpu=3 timer-softirq=2513452 [495778.881383] rcu: rcu_preempt kthread starved for 5324534 jiffies! g9706925 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=3 [495778.882336] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. [495778.883137] rcu: RCU grace-period kthread stack dump: [495778.883584] task:rcu_preempt state:R stack:0 pid:16 tgid:16 ppid:2 flags:0x00000008 [495778.884404] Call trace: [495778.884627] __switch_to+0xe0/0x124 [495778.884946] __schedule+0x308/0xa8c [495778.885263] schedule+0x34/0xf8 [495778.885549] schedule_timeout+0x98/0x1bc [495778.885902] rcu_gp_fqs_loop+0x150/0x670 [495778.886256] rcu_gp_kthread+0x234/0x274 [495778.886603] kthread+0x114/0x118 [495778.886896] ret_from_fork+0x10/0x20 [495778.887219] rcu: Stack dump where RCU GP kthread last ran: [495778.887705] Sending NMI from CPU 4 to CPUs 3: Aside from usual NFS server (unused at the time) and syncthing, it was running duplicity backup at the time. I cannot rule out that it ran out of memory. Though it did run full backup successfully a couple of days ago. helios64-rcu-stall.txt 0 Quote Link to comment Share on other sites More sharing options...
prahal Posted September 5 Share Posted September 5 I guess this is one core not responding anymore, likely CPU 5 (one of the big cores). Which kernel do you run? Is this the first time you encounter this bug? You might want to run ebin-dev dtb (there are voltage hacks for the big CPUs in it). 0 Quote Link to comment Share on other sites More sharing options...
crosser Posted Sunday at 08:12 PM Author Share Posted Sunday at 08:12 PM Edge from the deb package (6.8.11-edge-rockchip64). It was the first time; this Saturday I got a similar situation, _but_ I was able to ssh (after minutes of waiting) and save syslog. The first anomaly in the log was (this time): Sep 14 22:11:52 kobol kernel: BUG: Bad page state in process kcompactd0 pfn:1e320 Sep 14 22:11:52 kobol kernel: page:000000001709b832 refcount:0 mapcount:0 mapping:000000004953ae39 index:0x4c1a1c30 pfn:0x1e320 Sep 14 22:11:52 kobol kernel: aops:0xffff800081149ed8 ino:1 Sep 14 22:11:52 kobol kernel: flags: 0xffff1800000020c(referenced|uptodate|workingset|node=0|zone=0|lastcpupid=0xffff) Sep 14 22:11:52 kobol kernel: page_type: 0xffffffff() Sep 14 22:11:52 kobol kernel: raw: 0ffff1800000020c dead000000000100 dead000000000122 ffff0000009e8338 Sep 14 22:11:52 kobol kernel: raw: 000000004c1a1c30 0000000000000000 00000000ffffffff 0000000000000000 Sep 14 22:11:52 kobol kernel: page dumped because: non-NULL mapping and later there are repeated "rcu: INFO: rcu_preempt detected stalls on CPUs/tasks ..." (see attached file). I will try that other dtb, thanks! rcu-stall.txt 0 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.