crosser
Posted September 1

I found my Helios64 unresponsive after about a week or so. Even the heartbeat LED is not blinking (it is permanently on). I see this repeating on the serial console:

[495778.879711] rcu: rcu_preempt kthread timer wakeup didn't happen for 5324533 jiffies! g9706925 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200
[495778.880747] rcu: Possible timer handling issue on cpu=3 timer-softirq=2513452
[495778.881383] rcu: rcu_preempt kthread starved for 5324534 jiffies! g9706925 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=3
[495778.882336] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[495778.883137] rcu: RCU grace-period kthread stack dump:
[495778.883584] task:rcu_preempt state:R stack:0 pid:16 tgid:16 ppid:2 flags:0x00000008
[495778.884404] Call trace:
[495778.884627]  __switch_to+0xe0/0x124
[495778.884946]  __schedule+0x308/0xa8c
[495778.885263]  schedule+0x34/0xf8
[495778.885549]  schedule_timeout+0x98/0x1bc
[495778.885902]  rcu_gp_fqs_loop+0x150/0x670
[495778.886256]  rcu_gp_kthread+0x234/0x274
[495778.886603]  kthread+0x114/0x118
[495778.886896]  ret_from_fork+0x10/0x20
[495778.887219] rcu: Stack dump where RCU GP kthread last ran:
[495778.887705] Sending NMI from CPU 4 to CPUs 3:

Aside from the usual NFS server (unused at the time) and syncthing, it was running a duplicity backup at the time. I cannot rule out that it ran out of memory, though it did complete a full backup successfully a couple of days earlier.

Attachment: helios64-rcu-stall.txt

prahal
Posted September 5

I guess this is one core not responding anymore, likely CPU 5 (one of the big cores). Which kernel do you run? Is this the first time you have encountered this bug? You might want to try ebin-dev's dtb (it contains voltage hacks for the big CPUs).

crosser (Author)
Posted September 15

Edge from the deb package (6.8.11-edge-rockchip64). That was the first time; this Saturday I hit a similar situation, _but_ I was able to ssh in (after minutes of waiting) and save the syslog. The first anomaly in the log this time was:

Sep 14 22:11:52 kobol kernel: BUG: Bad page state in process kcompactd0 pfn:1e320
Sep 14 22:11:52 kobol kernel: page:000000001709b832 refcount:0 mapcount:0 mapping:000000004953ae39 index:0x4c1a1c30 pfn:0x1e320
Sep 14 22:11:52 kobol kernel: aops:0xffff800081149ed8 ino:1
Sep 14 22:11:52 kobol kernel: flags: 0xffff1800000020c(referenced|uptodate|workingset|node=0|zone=0|lastcpupid=0xffff)
Sep 14 22:11:52 kobol kernel: page_type: 0xffffffff()
Sep 14 22:11:52 kobol kernel: raw: 0ffff1800000020c dead000000000100 dead000000000122 ffff0000009e8338
Sep 14 22:11:52 kobol kernel: raw: 000000004c1a1c30 0000000000000000 00000000ffffffff 0000000000000000
Sep 14 22:11:52 kobol kernel: page dumped because: non-NULL mapping

Later there are repeated "rcu: INFO: rcu_preempt detected stalls on CPUs/tasks ..." messages (see attached file). I will try that other dtb, thanks!

Attachment: rcu-stall.txt
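
For anyone who wants to locate the first anomaly in a saved syslog the same way, here is a minimal Python sketch. It only looks for the two kinds of kernel message quoted in this thread ("BUG: ..." and "rcu: ..."); the pattern and the function name `first_anomaly` are made up for this example and are not a complete list of kernel error markers.

```python
import re

# Assumption: only the two message types seen in this thread are matched.
# Extend the pattern if your log shows other kernel complaints first.
ANOMALY = re.compile(r"kernel: (BUG: |rcu: )")


def first_anomaly(lines):
    """Return (line_number, line) for the first matching line, or None."""
    for number, line in enumerate(lines, start=1):
        if ANOMALY.search(line):
            return number, line.rstrip("\n")
    return None
```

It accepts any iterable of lines, so an open file object for the saved syslog can be passed directly, for example `first_anomaly(open("rcu-stall.txt"))`.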

Trillien
Posted November 21

Hi,

For info, I had the same issue running OMV 6.0 with PhotoPrism. PhotoPrism seemed to consume a lot of CPU while processing the pictures, and at some point (after anywhere between a quarter of an hour and two hours) the Helios64 failed with the "rcu_preempt detected stalls on CPUs" error. The error was probably linked to the jumps between frequencies: I solved it by limiting the max frequency of the big cores to 1200 MHz. Note that this was before prahal's DTB that increases the CPU voltage.
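
The frequency cap mentioned above could be applied roughly like this. This is only a sketch, not necessarily how it was done in the post: it assumes the standard Linux cpufreq sysfs layout, and that on the Helios64's RK3399 the two big cores (cpu4 and cpu5) share the policy directory `policy4`. The function name `cap_max_freq` is made up for this example.

```python
from pathlib import Path

# Assumption: standard cpufreq sysfs location; the big cores (cpu4, cpu5)
# are assumed to share "policy4". Check the directory names on your board.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu/cpufreq")
BIG_POLICY = "policy4"
MAX_KHZ = 1_200_000  # 1200 MHz; scaling_max_freq is expressed in kHz


def cap_max_freq(khz: int = MAX_KHZ, policy: str = BIG_POLICY,
                 root: Path = CPUFREQ_ROOT) -> int:
    """Write khz to scaling_max_freq and return the value read back."""
    target = root / policy / "scaling_max_freq"
    target.write_text(f"{khz}\n")
    # Read the file back, since the kernel may adjust the requested value.
    return int(target.read_text().strip())
```

Calling `cap_max_freq()` requires root, and the setting does not survive a reboot, so it would have to be reapplied at boot (or set through whatever CPU frequency configuration your distribution provides).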