Posted

I have been experiencing sporadic unresponsiveness on my H3 boards (five of them). They still respond to ping, but ssh does not work. The logs show that the system time jumped backward to somewhere in 1978. Today another such glitch happened, but this time I was able to ssh in and saw this:

I use chrony instead of ntpd, and its config contains the line "makestep 1 -1", which allows an unlimited number of large time corrections. For some reason it did not help.

There are relevant topics on the forum for A64:

The only difference is that A64 jumps forward.

I found no related issues on github or forum for H3 specifically.

Does anyone else experience this issue?

Posted
  On 8/11/2019 at 5:35 PM, lomady said:

Does anyone else experience this issue?

The original clock jump issue is really a jump into the future, not into the past ...

Do you have any network/internet issue that prevents NTP from working properly?

Posted

Yes, I know that the original issue is about jumping in the future.

No, I do not have any network issues.

 

Both ntpd and chrony maintain the correct time until it jumps back to 1978. After that, the kernel logs:

rcu: INFO: rcu_sched self-detected stall on CPU

The OOM killer then goes haywire and starts killing processes left and right.

I also saw many cron processes, and the eMMC was thrashing.

I just want to get to the bottom of these random time jumps. Sometimes it takes several days of uptime for one to happen, sometimes a month.
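One way to pin down when a jump happens is to compare the wall clock against the monotonic clock at regular intervals. The following is only a minimal sketch, not something used in this thread: the names `detect_jump` and `watch`, the 10-second interval, and the 5-second threshold are all assumptions. It also assumes the monotonic clock is not affected by the same glitch, which may not hold if the underlying hardware timer itself misbehaves.

```python
import time


def detect_jump(prev_wall, prev_mono, wall, mono, threshold=5.0):
    """Return the wall-clock jump in seconds, or 0.0 if within threshold.

    The jump is the difference between elapsed wall-clock time and elapsed
    monotonic time. A negative value means the wall clock went backward.
    """
    drift = (wall - prev_wall) - (mono - prev_mono)
    return drift if abs(drift) > threshold else 0.0


def watch(interval=10.0, threshold=5.0):
    """Poll both clocks forever and print a line whenever a jump is seen."""
    prev_wall, prev_mono = time.time(), time.monotonic()
    while True:
        time.sleep(interval)
        wall, mono = time.time(), time.monotonic()
        jump = detect_jump(prev_wall, prev_mono, wall, mono, threshold)
        if jump:
            # flush=True so the line survives if the system stalls right after
            print(f"wall clock jumped by {jump:+.1f} s "
                  f"(monotonic={mono:.1f})", flush=True)
        prev_wall, prev_mono = wall, mono
```

Calling `watch()` from a long-running service and sending its output to a remote log host would leave a record of the jump even if local logging stops working afterwards.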

Posted
  On 8/11/2019 at 5:35 PM, lomady said:

[2736263.499172] rcu: INFO: rcu_sched self-detected stall on CPU
[2736263.499195] rcu:     0-...!: (1 ticks this GP) idle=642/0/0x1 softirq=68967472/68967472 fqs=0 
[2736263.499197] rcu:      (t=249107 jiffies g=147892009 q=729)
[2736263.499208] rcu: rcu_sched kthread starved for 249107 jiffies! g147892009 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0


This looks similar to the issue in this post:

A CPU stall followed by the clock being set to an incorrect time. I think the clock issue is a symptom rather than a cause. I didn't see anything obvious in the call stack of this issue or of the referenced one, but I'm not very good at reading those. Maybe someone who knows more can compare these posts and see whether a root cause is obvious.

Posted

Hi, 

 

I have encountered this issue many times on the OrangePi PC. It happened with the following versions:

  • Armbian version: 5.41, Kernel version: 4.14.32
  • Armbian version: 5.77, Kernel version: 4.19.25
  • Armbian version: 5.90, Kernel version: 4.19.57

I have more than 10 OrangePi PCs running busy tasks at the same time, and this issue has happened more than 15 times in the past year. I still have no idea how to deal with it. An affected device is somehow still able to send data to my server, which means its internet connection is up, but I am not able to ssh in.

 

I have tried the following methods:

  • Wrote a udev rule to copy all system logs when a USB pen drive is detected. This works when the device is running normally, but not once the issue occurs.
  • Used a watchdog. It works when tested with a fork bomb, but it does not recover the device from this issue.
  • Connected a keyboard and an HDMI monitor. This works when the device is running normally, but not once the issue occurs.
  • Used a TTL serial-to-USB adapter. I can control the device when it is running normally, but not once the issue occurs.

 

When I encountered this again yesterday, I found that dmesg had recorded the following error messages:

  Quote

[Thu Aug 29 01:45:44 2019] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[Thu Aug 29 01:45:44 2019] rcu:     0-...!: (4 GPs behind) idle=6ea/1/0x40000000 softirq=63333042/63333042 fqs=1 
[Thu Aug 29 01:45:44 2019] rcu:     (detected by 2, t=213675 jiffies, g=216788133, q=4)
[Thu Aug 29 01:45:44 2019] Sending NMI from CPU 2 to CPUs 0:
[Thu Aug 29 01:45:44 2019] NMI backtrace for cpu 0
[Thu Aug 29 01:45:44 2019] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      4.19.62-sunxi #5.92
[Thu Aug 29 01:45:44 2019] Hardware name: Allwinner sun8i Family
[Thu Aug 29 01:45:44 2019] PC is at arch_cpu_idle+0x28/0x2c
[Thu Aug 29 01:45:44 2019] LR is at arch_cpu_idle+0x27/0x2c
[Thu Aug 29 01:45:44 2019] pc : [<c01078f4>]    lr : [<c01078f3>]    psr: 40010033
[Thu Aug 29 01:45:44 2019] sp : c0e01f68  ip : ef6a0780  fp : 00000000
[Thu Aug 29 01:45:44 2019] r10: c0dba870  r9 : c0e04d48  r8 : 00000000
[Thu Aug 29 01:45:44 2019] r7 : 00000001  r6 : c0e04db8  r5 : c0e04d70  r4 : ffffe000
[Thu Aug 29 01:45:44 2019] r3 : c0116441  r2 : ef69f438  r1 : 417ee6ec  r0 : 00000000
[Thu Aug 29 01:45:44 2019] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment none
[Thu Aug 29 01:45:44 2019] Control: 50c5387d  Table: 6c03406a  DAC: 00000051
[Thu Aug 29 01:45:44 2019] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      4.19.62-sunxi #5.92
[Thu Aug 29 01:45:44 2019] Hardware name: Allwinner sun8i Family
[Thu Aug 29 01:45:44 2019] [<c010d74d>] (unwind_backtrace) from [<c010a2f1>] (show_stack+0x11/0x14)
[Thu Aug 29 01:45:44 2019] [<c010a2f1>] (show_stack) from [<c08fc121>] (dump_stack+0x69/0x78)
[Thu Aug 29 01:45:44 2019] [<c08fc121>] (dump_stack) from [<c090036d>] (nmi_cpu_backtrace+0x59/0x90)
[Thu Aug 29 01:45:44 2019] [<c090036d>] (nmi_cpu_backtrace) from [<c010c5d9>] (handle_IPI+0x85/0x2c0)
[Thu Aug 29 01:45:44 2019] [<c010c5d9>] (handle_IPI) from [<c05c9c7f>] (gic_handle_irq+0x67/0x68)
[Thu Aug 29 01:45:44 2019] [<c05c9c7f>] (gic_handle_irq) from [<c0101a65>] (__irq_svc+0x65/0x94)
[Thu Aug 29 01:45:44 2019] Exception stack(0xc0e01f18 to 0xc0e01f60)
[Thu Aug 29 01:45:44 2019] 1f00:                                                       00000000 417ee6ec
[Thu Aug 29 01:45:44 2019] 1f20: ef69f438 c0116441 ffffe000 c0e04d70 c0e04db8 00000001 00000000 c0e04d48
[Thu Aug 29 01:45:44 2019] 1f40: c0dba870 00000000 ef6a0780 c0e01f68 c01078f3 c01078f4 40010033 ffffffff
[Thu Aug 29 01:45:44 2019] [<c0101a65>] (__irq_svc) from [<c01078f4>] (arch_cpu_idle+0x28/0x2c)
[Thu Aug 29 01:45:44 2019] [<c01078f4>] (arch_cpu_idle) from [<c013e96b>] (do_idle+0x14b/0x1d8)
[Thu Aug 29 01:45:44 2019] [<c013e96b>] (do_idle) from [<c013ebed>] (cpu_startup_entry+0x19/0x1c)
[Thu Aug 29 01:45:44 2019] [<c013ebed>] (cpu_startup_entry) from [<c0d00bad>] (start_kernel+0x3a7/0x3c6)
[Thu Aug 29 01:45:44 2019] rcu: rcu_sched kthread starved for 213673 jiffies! g216788133 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
[Thu Aug 29 01:45:44 2019] rcu: RCU grace-period kthread stack dump:
[Thu Aug 29 01:45:44 2019] rcu_sched       I    0    10      2 0x00000000
[Thu Aug 29 01:45:44 2019] [<c090acdb>] (__schedule) from [<c090b247>] (schedule+0x2f/0x68)
[Thu Aug 29 01:45:44 2019] [<c090b247>] (schedule) from [<c090daf7>] (schedule_timeout+0x77/0x320)
[Thu Aug 29 01:45:44 2019] [<c090daf7>] (schedule_timeout) from [<c016af6f>] (rcu_gp_kthread+0x41f/0x728)
[Thu Aug 29 01:45:44 2019] [<c016af6f>] (rcu_gp_kthread) from [<c0132ae9>] (kthread+0xfd/0x104)
[Thu Aug 29 01:45:44 2019] [<c0132ae9>] (kthread) from [<c01010f9>] (ret_from_fork+0x11/0x38)
[Thu Aug 29 01:45:44 2019] Exception stack(0xef14bfb0 to 0xef14bff8)
[Thu Aug 29 01:45:44 2019] bfa0:                                     00000000 00000000 00000000 00000000
[Thu Aug 29 01:45:44 2019] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[Thu Aug 29 01:45:44 2019] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[Thu Aug 29 02:14:52 2019] systemd-journald[260]: Failed to run event loop: Invalid argument


For now, I may try continuously checking dmesg for this error message and, if it appears, running "sudo systemctl --force --force reboot" (it seems that /sbin/reboot does not work once the device is in this state).
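The check described above could be sketched roughly as follows. This is an untested illustration rather than a proven fix: the names `has_rcu_stall` and `check_and_reboot` are hypothetical, the pattern is written only to match the two message variants quoted in this thread, and the script is assumed to run as root (for example from a systemd timer) on Python 3.7 or later. Whether a new process can still be started once the stall has occurred is an open question.

```python
import re
import subprocess

# Matches both variants seen in this thread:
#   "rcu: INFO: rcu_sched self-detected stall on CPU"
#   "rcu: INFO: rcu_sched detected stalls on CPUs/tasks:"
STALL_RE = re.compile(r"rcu_sched (?:self-)?detected stalls?")


def has_rcu_stall(log_text):
    """Return True if any line of the kernel log reports an rcu_sched stall."""
    return any(STALL_RE.search(line) for line in log_text.splitlines())


def check_and_reboot():
    """Read dmesg once and force a reboot if a stall message is present."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True,
                            check=False)
    if has_rcu_stall(result.stdout):
        # Passing --force twice asks systemd for an immediate reboot
        # without a clean shutdown of services.
        subprocess.run(["systemctl", "--force", "--force", "reboot"],
                       check=False)
        return True
    return False
```

Running `check_and_reboot()` every minute or so keeps the check itself cheap. Because the kernel ring buffer is cleared on reboot, an old stall message should not trigger a reboot loop.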

 

Guest
This topic is now closed to further replies.