Everything posted by karamike
I've downgraded to a system with kernel 4.19.63, as mentioned in the post by Mangix above. Since then the system has run without freezes (for a few weeks already). Two things are worth noting:
- The "blinkenlights", i.e. the hard drive access LEDs, work again. They didn't work on 5.8.16.
- The CPU load of the netatalk file-sharing daemon is now significantly lower. This would be consistent with gprovost's remark that the error message from my earlier system points to a system overload. I don't know whether the lower load is due to the older kernel or to the older version of netatalk that is now running.
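To separate the kernel effect from the netatalk-version effect, the daemon's CPU usage has to be measured the same way on both setups. Below is a minimal Python sketch that assumes a Linux `/proc` filesystem. It samples one process's CPU time from `/proc/<pid>/stat` (utime + stime) over an interval. Which netatalk process to watch (for example `afpd`), the script name, and the 5-second default interval are illustrative choices, not anything from the thread.

```python
import os
import sys
import time

def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line.

    Field 2 (comm) is in parentheses and may contain spaces, so split
    after the last ')' before indexing the remaining fields.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime is field 14, stime is field 15
    utime, stime = int(rest[11]), int(rest[12])
    return utime + stime

def cpu_percent(ticks_before: int, ticks_after: int,
                interval_s: float, clk_tck: int) -> float:
    """CPU usage over the interval, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (clk_tck * interval_s)

def sample_pid(pid: int, interval_s: float = 5.0) -> float:
    """Measure one process's CPU usage over interval_s seconds (Linux only)."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, os.sysconf("SC_CLK_TCK"))

if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (hypothetical script name): python3 cpu_sample.py <pid> [seconds]
    pid = int(sys.argv[1])
    interval = float(sys.argv[2]) if len(sys.argv) > 2 else 5.0
    print(f"PID {pid}: {sample_pid(pid, interval):.1f}% of one CPU")
```

Running it against the same daemon during the same indexing workload, once per kernel/netatalk combination, gives numbers that can actually be compared. A single reading from `top` depends heavily on the moment it was taken.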
@gprovost The files are shared via netatalk to a Mac. That worked fine until now. The number of files in a single folder is quite large (up to 13,000 files). The hard drive is formatted with ext4 using the DIR_INDEX option to speed up file access. The problem first appeared after I replaced one of the four drives, going from 2 TB to 4 TB. I copied the files from the old drive to the new one, so the number of files did not change. But I had to reindex all files in a picture-processing application on the Mac (creating thumbnails, determining picture properties, etc.). I've installed the software watchdog, as mentioned above, but that did not help either: if the system freezes, a software watchdog has no chance to intervene. After reading about PSU issues here, I disconnected one drive (data and power) and started the box with only 3 drives. After 10 hours or so of indexing, the system froze again. I wrote a simple shell script to display the temperature and processor load every second. When the system froze, the temperature was around 50 °C and the processor load slightly above 1. I also noticed that the "blinking lights" (HDD access SMD LEDs) are all off, even when the box seems to work normally. Is this an indication of something? Thanks
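The temperature/load monitor described above can be sketched roughly as follows. This is a Python equivalent written for illustration, not the shell script from the post. It assumes `/proc/loadavg` (standard on Linux) and a thermal sensor at `/sys/class/thermal/thermal_zone0/temp` reporting millidegrees Celsius. The zone index is a guess and may differ on the Helios4. The optional log file and the `run` command-line argument are hypothetical additions.

```python
import os
import sys
import time
from pathlib import Path
from typing import Optional

# Assumed locations: /proc/loadavg is standard on Linux; the thermal zone
# index (thermal_zone0) is a guess and may differ on a given board.
LOADAVG = Path("/proc/loadavg")
TEMP = Path("/sys/class/thermal/thermal_zone0/temp")

def parse_loadavg(text: str) -> float:
    """Return the 1-minute load average from the contents of /proc/loadavg."""
    return float(text.split()[0])

def parse_temp_c(text: str) -> float:
    """Convert a thermal-zone reading (millidegrees Celsius) to degrees C."""
    return int(text.strip()) / 1000.0

def format_sample(temp_c: float, load1: float) -> str:
    """One output line: wall-clock time, temperature, 1-minute load."""
    return f"{time.strftime('%H:%M:%S')} temp={temp_c:.1f}C load1={load1:.2f}"

def monitor(interval_s: float = 1.0, log_path: Optional[str] = None) -> None:
    """Print a sample every interval_s seconds until interrupted (Ctrl+C).

    If log_path is given, each line is also appended and fsync'd, so the
    last sample before a hard freeze has a better chance of being on disk.
    """
    log = open(log_path, "a") if log_path else None
    try:
        while True:
            line = format_sample(parse_temp_c(TEMP.read_text()),
                                 parse_loadavg(LOADAVG.read_text()))
            print(line, flush=True)
            if log:
                log.write(line + "\n")
                log.flush()
                os.fsync(log.fileno())
            time.sleep(interval_s)
    finally:
        if log:
            log.close()

if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] == "run":
    # usage (hypothetical script name): python3 monitor.py run [logfile]
    monitor(log_path=sys.argv[2] if len(sys.argv) > 2 else None)
```

The fsync on every line is deliberate. On a hard freeze, a terminal session over the network stops updating. A synced log file, or output on the serial console, is more likely to preserve the last reading, although nothing is guaranteed once the kernel stalls.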
I don't want this to be a me-too post, but why else would I be here... I have an old Helios4 with the original PSU, Armbian 20.08.17 Bionic, Linux 5.8.16-mvebu, running in JBOD mode. I've also recently started to experience random freezes while continuously accessing a single drive (picture indexing). At some point the Ethernet connection is lost: no file access, no SSH, no pings. Under these conditions I connected a laptop to the serial console of the Helios4. It doesn't react to any input but displays the following messages:
[30482.836176] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[30482.842115] rcu: 1-...!: (0 ticks this GP) idle=87e/1/0x40000002 softirq=7995606/7995606 fqs=1
[30492.851735] rcu: rcu_sched kthread starved for 131290 jiffies! g13530313 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[30492.862281] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[30492.871261] rcu: RCU grace-period kthread stack dump:
[30545.855768] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[30545.861707] rcu: 1-...!: (0 ticks this GP) idle=87e/1/0x40000002 softirq=7995606/7995606 fqs=1
Does this point to the kernel problem mentioned above?
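When comparing console captures from several freezes, it can help to pull the RCU stall reports out of the raw output mechanically. The following Python sketch uses regular expressions that match only the message formats quoted in this post. It records the timestamp of each "detected stalls" report, the gaps between reports, and the "kthread starved for N jiffies" values. Converting jiffies to seconds requires the kernel's `CONFIG_HZ`. That value does not appear in the log, so it is a required input here, and the function and script names are hypothetical.

```python
import re
import sys
from dataclasses import dataclass
from typing import List

# Match only the two message formats seen in the serial-console excerpt.
STALL_RE = re.compile(r"\[\s*(\d+\.\d+)\] rcu: INFO: rcu_sched detected stalls")
STARVED_RE = re.compile(r"rcu_sched kthread starved for (\d+) jiffies")

@dataclass
class RcuSummary:
    stall_times: List[float]    # timestamps (s since boot) of each stall report
    gaps_s: List[float]         # time between consecutive stall reports
    starved_jiffies: List[int]  # "kthread starved for N jiffies" values
    starved_s: List[float]      # the same values converted using the given HZ

def summarize_rcu_stalls(log: str, hz: int) -> RcuSummary:
    """Extract RCU stall reports from dmesg or serial-console text.

    hz must be the running kernel's CONFIG_HZ; it cannot be read from the
    log itself, so it has to be looked up in the kernel config.
    """
    times = [float(m.group(1)) for m in STALL_RE.finditer(log)]
    jiffies = [int(m.group(1)) for m in STARVED_RE.finditer(log)]
    return RcuSummary(
        stall_times=times,
        gaps_s=[b - a for a, b in zip(times, times[1:])],
        starved_jiffies=jiffies,
        starved_s=[j / hz for j in jiffies],
    )

if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical script name): python3 rcu_stalls.py console.log HZ
    with open(sys.argv[1]) as f:
        print(summarize_rcu_stalls(f.read(), int(sys.argv[2])))
```

HZ is an explicit parameter because the same jiffies count corresponds to very different durations depending on how the 5.8.16-mvebu kernel was configured. Guessing it would make the starvation time meaningless.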