
Posted (edited)

I'm using Armbian 20.11.1 Focal (5.9.11-rockchip64) on my Helios64. Under sustained heavier load over Ethernet (SFTP to a 12TB SATA Ultrastar disk) lasting more than about an hour, the entire system seems to freeze. I'm not sure whether it actually freezes, but I lose all SSH connections and cannot reconnect via SSH until I restart the Helios64. It happened 4 times today and is fully reproducible. I haven't tested another drive yet.

 

What makes this difficult to debug is that nothing relevant about the state of the device is written to /var/log/syslog, kern.log or faillog.

 

Is this a known issue?

How can I find out what the problem is?

Edited by Schroedingers Cat
Posted (edited)

When I'm connected via COM/PuTTY, this is what happens when the system freezes while I monitor it with iotop:

Quote

  11385 ?sys sftp-user      0.00 B/s   98.59 M/s  0.00 %  0.00 % sftp-server
      4 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_par_gp]
      6 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker~-kblockd]
      8 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [mm_percpu_wq]
      9 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
     10 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_preempt]
     11 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
     12 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/0]
     13 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/1]
     14 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]
     15 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/1]
     17 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker~-kblockd]
     18 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/2]
     19 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/2]
     20 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirq-/2]
     18 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/2]
     19 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/2]
     20 ?sys root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/2]
  keys:  any: refresh  q: quit  i: ionice  o: active  p: procs  a: accum
  sort:  r: asc  left: SWAPIN  right: COMMAND  home: TID  end:
[45859.904820] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[45859.905384] rcu:     2-...!: (16 ticks this GP) idle=892/1/0x4000000000000000 softirq=933974/933976 fqs=1
[45859.906213] rcu:     3-...!: (8 GPs behind) idle=842/0/0x1 softirq=985077/985077 fqs=1
[45859.906899] rcu:     5-...!: (36 ticks this GP) idle=e8a/1/0x4000000000000000 softirq=2724805/2724808 fqs=1

 

After that, I cannot send any commands via COM, so the system is definitely freezing.
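
For anyone comparing captures from several freezes, here is a minimal Python sketch that pulls the stalled CPU numbers out of serial-console text like the excerpt above. The function name `parse_rcu_stalls` and the regular expression are illustrative assumptions based only on the three per-CPU lines shown in this post; other kernel versions may format these lines differently.

```python
import re

# Matches per-CPU stall lines such as:
#   [45859.905384] rcu:     2-...!: (16 ticks this GP) idle=...
# The pattern is an assumption derived from the lines quoted in this thread.
_STALL_LINE = re.compile(r"rcu:\s+(\d+)-\S*:\s+\((.*?)\)")


def parse_rcu_stalls(text):
    """Return a list of (cpu_number, detail) tuples found in console text."""
    stalls = []
    for line in text.splitlines():
        match = _STALL_LINE.search(line)
        if match:
            stalls.append((int(match.group(1)), match.group(2)))
    return stalls
```

Feeding it the capture above should report CPUs 2, 3 and 5, which makes it easier to see whether the same cores are involved each time.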

 

It also happens when copying to any other drive, and when fewer drives are connected.

 

Any idea what this instability is caused by?

Edited by Schroedingers Cat
Posted
On 12/12/2020 at 12:00 AM, Schroedingers Cat said:

 

 

What makes this difficult to debug is that nothing relevant about the state of the device is written to /var/log/syslog, kern.log or faillog.

 

 

Logs are usually stored in RAM and only written out every few minutes, which drastically increases the lifespan of SD cards. The downside is that it is sometimes hard to track down issues. You could either disable log2ram and try to reproduce the freeze, or connect a debug console, follow the output of dmesg, and wait until another freeze happens.

 

There does not necessarily need to be any output, though. Sometimes systems freeze without giving any clue whatsoever :/
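
As a rough alternative to watching dmesg by hand, the sketch below follows the kernel log via `/dev/kmsg` and syncs every record to a file, so the last messages before a freeze have a better chance of surviving. This is an assumption-laden sketch, not a tested recipe for the Helios64: the function names and the output path are hypothetical, it needs root, and a hard hang may still prevent the final write from reaching the disk.

```python
import os


def parse_kmsg_record(record):
    """Split one /dev/kmsg record ("prio,seq,usec,flags;message") into fields."""
    header, _, message = record.partition(";")
    fields = header.split(",")
    return {
        "level": int(fields[0]) & 7,       # low 3 bits are the log level
        "seq": int(fields[1]),
        "seconds": int(fields[2]) / 1e6,   # timestamp is in microseconds
        "message": message.split("\n", 1)[0],
    }


def follow_kmsg(out_path, kmsg_path="/dev/kmsg"):
    """Append each kernel message to out_path and fsync after every record."""
    fd = os.open(kmsg_path, os.O_RDONLY)
    try:
        with open(out_path, "a", encoding="utf-8") as out:
            while True:
                try:
                    raw = os.read(fd, 8192)  # one record per read
                except BrokenPipeError:
                    continue  # older records were overwritten before we read them
                rec = parse_kmsg_record(raw.decode("utf-8", "replace"))
                out.write(f"[{rec['seconds']:12.6f}] <{rec['level']}> {rec['message']}\n")
                out.flush()
                os.fsync(out.fileno())
    finally:
        os.close(fd)
```

Calling `follow_kmsg("/root/kmsg-follow.log")` as root (the path is only an example) blocks and keeps logging until interrupted. Syncing after every record costs extra writes, so it is meant for a debugging session rather than permanent use.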

Posted

Thanks for your response. I'm only allowed to answer after 24 hours, for some reason.

 

I disabled log2ram, but the log files still don't show anything interesting. The crash happened on the 15th of December around 16:00, and the restart was around 19:50.

Here's the relevant section from /var/log/syslog:

Quote

Dec 15 14:15:01 localhost CRON[24499]: (root) CMD (/usr/lib/armbian/armbian-truncate-logs)
Dec 15 14:15:01 localhost CRON[24500]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 15 14:17:01 localhost CRON[24516]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Dec 15 14:25:01 localhost CRON[24571]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 15 14:30:01 localhost CRON[24604]: (root) CMD (/usr/lib/armbian/armbian-truncate-logs)
Dec 15 14:35:01 localhost CRON[24640]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 15 14:45:01 localhost CRON[24706]: (root) CMD (/usr/lib/armbian/armbian-truncate-logs)
Dec 15 14:45:01 localhost CRON[24707]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 15 14:55:01 localhost CRON[24774]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 15 15:00:01 localhost CRON[24808]: (root) CMD (/usr/lib/armbian/armbian-truncate-logs)
Dec 15 15:05:01 localhost CRON[24842]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 15 15:15:01 localhost CRON[24910]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 15 15:15:01 localhost CRON[24911]: (root) CMD (/usr/lib/armbian/armbian-truncate-logs)
Dec 15 15:17:01 localhost CRON[24927]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Dec 15 15:25:01 localhost CRON[24983]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 15 15:30:01 localhost CRON[25017]: (root) CMD (/usr/lib/armbian/armbian-truncate-logs)

Dec 15 19:50:59 localhost systemd-modules-load[417]: Inserted module 'lm75'
Dec 15 19:50:59 localhost systemd-sysctl[433]: Not setting net/ipv4/conf/all/promote_secondaries (explicit setting exists).

 

/var/log/kern.log:

Quote

Dec 14 00:17:14 localhost kernel: [27357.681768] NOHZ: local_softirq_pending 08
Dec 14 01:27:26 localhost kernel: [31569.414047] NOHZ: local_softirq_pending 08
Dec 14 02:40:35 localhost kernel: [35958.205471] NOHZ: local_softirq_pending 08
Dec 14 07:35:34 localhost kernel: [53657.100804] NOHZ: local_softirq_pending 08
Dec 14 13:46:37 localhost kernel: [75918.982748] NOHZ: local_softirq_pending 08
Dec 15 19:50:59 localhost kernel: [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]

 

Nothing interesting found in /var/log/dmesg.
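
Since the logs contain no error, the only hint is where the regular cron entries stop. The following Python sketch finds the largest silence between consecutive syslog timestamps, which gives a rough window for when the system stopped logging. The function name `largest_gap` is hypothetical, and the year is an assumption because classic syslog timestamps do not include one.

```python
from datetime import datetime


def largest_gap(lines, year=2020):
    """Return (last_before, first_after, gap) for the longest silence, or None.

    Assumes classic syslog lines that start with e.g. "Dec 15 14:15:01".
    The year is not in the log, so it must be supplied by the caller.
    """
    stamps = []
    for line in lines:
        try:
            stamps.append(
                datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
            )
        except ValueError:
            continue  # blank line or a line without a leading timestamp

    best = None
    for before, after in zip(stamps, stamps[1:]):
        gap = after - before
        if best is None or gap > best[2]:
            best = (before, after, gap)
    return best
```

Run against the syslog excerpt above, the longest gap would start at the 15:30:01 cron entry and end at the 19:50:59 boot message. The freeze itself could have happened any time before the next expected cron entry, so this only bounds the time, it does not pinpoint it.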

 

Quote

or connect debug console and follow the output of dmesg and wait until another freeze happens.

Do you mean connecting via COM/USB? I lose connection if it happens, but I can try.

 

 

Anything else I can do to investigate this? Do you think my device is broken?

Posted
9 hours ago, Schroedingers Cat said:

I'm only allowed to answer after 24 hours, for some reason.

Unfortunately a needed measure to fight spam bots :(

Anyway you received a like which should lift the restriction within 24h.

Posted

Support told me to do the following:

 

Run `armbian-config` and go to System -> CPU

 

And set:

Minimum CPU speed = 1200000

Maximum CPU speed = 1200000

CPU governor = performance

 

I've now been writing to my HDD via SFTP for more than 1.5 days without an issue, so that seems to have solved it.
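
To double-check that the settings above actually took effect after a reboot, a small Python sketch like this can read the values back from the standard Linux cpufreq sysfs files. The helper names `read_cpufreq` and `is_pinned` are hypothetical, and the default of 1200000 kHz with the `performance` governor simply mirrors the values given in this post.

```python
from pathlib import Path


def read_cpufreq(root="/sys/devices/system/cpu/cpufreq"):
    """Read governor and min/max frequency (kHz) for every cpufreq policy."""
    settings = {}
    for policy in sorted(Path(root).glob("policy*")):
        settings[policy.name] = {
            "governor": (policy / "scaling_governor").read_text().strip(),
            "min_khz": int((policy / "scaling_min_freq").read_text()),
            "max_khz": int((policy / "scaling_max_freq").read_text()),
        }
    return settings


def is_pinned(settings, khz=1200000, governor="performance"):
    """True only if every policy uses the governor and a fixed min == max == khz."""
    return bool(settings) and all(
        s["governor"] == governor and s["min_khz"] == khz and s["max_khz"] == khz
        for s in settings.values()
    )
```

For example, `is_pinned(read_cpufreq())` should return `True` once both the minimum and maximum speed are fixed at 1200000 on all CPU clusters.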

Posted

For now we haven't forced the CPU governor to performance; we still hope to find a fix so that DVFS works properly.

v20.11.4 doesn't change anything related to DVFS. It is just a rebuild after the Realtek r8152 driver was removed by mistake in the previous version.

 

BTW, you can check your current CPU governor settings with the following command: cpufreq-info
