gmerrall Posted July 9, 2021
My Helios64 NAS has suddenly started stopping at random. It doesn't appear to be an actual shutdown, as the front-panel lights all stay on, but I lose all access, including the serial console. The only way to recover is a long press on the power button to physically reboot the box.
I've raised the verbosity to 7 in /boot/armbianEnv.txt in case something is printed on the serial console, but I haven't seen any useful output yet. For obvious reasons, it's hard to catch it happening. Because /var/log is cleared on every boot, I can't see whether anything was logged just before the halt.
I'm not sure where to start diagnosing this. One option might be to remove the folder2ram mount for /var/log so the logs persist?
Output of "uname -a" in case it helps:
Linux helios64 5.10.43-rockchip64 #21.05.4 SMP PREEMPT Wed Jun 16 08:02:12 UTC 2021 aarch64 GNU/Linux
Kobol folks: I'm in Singapore, if that helps.
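Before removing any RAM-backed mount, it is worth confirming what actually backs /var/log. Below is a minimal sketch. It assumes util-linux `findmnt` is installed, which is normally the case on Debian-based Armbian images. The function names are hypothetical, and the classification is only a heuristic: a tmpfs filesystem (which folder2ram typically uses) or a /dev/zram* source (which armbian-ramlog typically uses) is treated as volatile.

```shell
#!/bin/sh
# Sketch: report whether /var/log lives on RAM-backed storage (and is therefore
# wiped at every boot). Heuristic, not exhaustive: tmpfs or a /dev/zram* source
# is treated as volatile; anything else is reported as persistent.

# Pure classifier: $1 = mount source, $2 = filesystem type.
classify_log_storage() {
  case "$2:$1" in
    tmpfs:* | *:/dev/zram*)
      echo "volatile ($2 on $1): contents are lost at every reboot" ;;
    *)
      echo "persistent ($2 on $1)" ;;
  esac
}

# Look up the mount that holds the given path (default /var/log) with findmnt.
log_storage() {
  target="${1:-/var/log}"
  src=$(findmnt -n -o SOURCE --target "$target")
  fstype=$(findmnt -n -o FSTYPE --target "$target")
  classify_log_storage "$src" "$fstype"
}
```

Run `log_storage /var/log` on the NAS. If the result is "volatile", anything logged just before a hang is lost at the next power cycle, which matches what gmerrall is seeing.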
wurmfood Posted July 9, 2021
You can set up logging to go to a flash drive, but capturing the console is your best bet. Do you have another computer you can leave on with the serial console connected? For example, I have a small NUC where I keep picocom running in a tmux session, so I don't lose anything. You can also use picocom's -g option to log to a file.
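wurmfood's setup can be sketched as a small script that runs picocom inside a detached tmux session and uses `-g` to write the console output to a file. Several details are assumptions rather than facts from the thread. The adapter is assumed to appear as /dev/ttyUSB0 and the Helios64 console to run at 1500000 baud, so check both against your own board. The session name, log path and function names are also made up.

```shell
#!/bin/sh
# Sketch: keep a serial-console logger running on an always-on machine.
# Assumptions (not from the thread): the USB-serial adapter is /dev/ttyUSB0,
# the console runs at 1500000 baud, and LOGFILE contains no spaces.
SERIAL_DEV="${SERIAL_DEV:-/dev/ttyUSB0}"
BAUD="${BAUD:-1500000}"
LOGFILE="${LOGFILE:-$HOME/helios64-console.log}"
SESSION="${SESSION:-helios64-console}"

# Build the picocom command line: -b sets the baud rate, -g writes the
# console output to LOGFILE.
console_cmd() {
  printf 'picocom -b %s -g %s %s\n' "$BAUD" "$LOGFILE" "$SERIAL_DEV"
}

# Run picocom inside a detached tmux session so it survives logging out.
start_capture() {
  tmux new-session -d -s "$SESSION" "$(console_cmd)"
}
```

Call `start_capture` once. You can reattach later with `tmux attach -t helios64-console`. To leave picocom, use its escape sequence (Ctrl-a Ctrl-x by default).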
gmerrall Posted July 12, 2021
I already had serial logging running, but the idea of keeping it in a screen session was a good one. Of course, after happening pretty much every day, it's now not happening at all. Typical!
digwer Posted July 16, 2021
I have the same problem as described: the device is powered on and all lights are on (but not blinking), yet it isn't accessible through serial or SSH. I will try leaving a Raspberry Pi connected via serial to capture the logs, because nothing appears in the systemd journal.
Edited July 16, 2021 by digwer
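If you leave a spare machine such as a Raspberry Pi on the serial line, timestamps added on that machine show when output stopped. This works even though the locked-up Helios64 cannot log anything itself. Below is a minimal POSIX sh sketch. `stamp_lines` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: prefix each line read on stdin with the capture host's local time,
# so the log shows when the Helios64 went silent.
stamp_lines() {
  cr=$(printf '\r')
  # The "|| [ -n ... ]" keeps a final line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%"$cr"}   # serial consoles usually end lines with CR LF
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
  done
}
```

For example: `stty -F /dev/ttyUSB0 1500000 raw -echo && stamp_lines < /dev/ttyUSB0 >> helios64-console.log`. The device name and baud rate here are assumptions. The `-echo` flag stops the capture host from echoing received characters back to the NAS.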
jotapesse Posted July 16, 2021
Hi! Not sure if it's related to the original post, but my Kobol Helios64 seems to have stopped responding after a reboot. I've already tried powering it down and up again, and the issue persists. The hardware starts, the disks are initialised and it responds to ping, but it can no longer be accessed via SSH or the OpenMediaVault web admin page.
My Helios64 is updated with the latest Armbian Buster 5.10.x and the latest OpenMediaVault 5.x. Armbian + OMV are installed on the internal eMMC. Docker apps are on the M.2 SATA drive on port 1, and the other 4 HDDs are in a ZFS setup on SATA ports 2 to 5 for data storage.
A few minutes before the reboot, I noticed kernel errors that syslog was broadcasting to an SSH shell I had open on my Helios64. They kept appearing at random, a few minutes apart, like this:
Message from syslogd@helios64 at Jul 16 18:47:05 ...
kernel:[111630.816643] Internal error: Oops: 96000004 [#8] PREEMPT SMP
Message from syslogd@helios64 at Jul 16 18:47:05 ...
kernel:[111630.843686] Code: 14000011 f9400273 b40001f3 d1002274 (b9402280)
I then rebooted from the OMV web admin page, and it has not responded since. It still answers pings, but SSH connections are refused and the OMV web admin page is dead.
Any idea what this is about and how to sort it out?
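On arm64, the hex value after `Internal error: Oops:` is the Exception Syndrome Register (ESR) for the fault. Below is a minimal sketch that splits it into its architectural fields. It names only the two data-abort exception classes and the translation-fault codes. It narrows down the kind of fault but not its cause; for that you still need the full trace from the serial console. `decode_esr` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: decode the ESR value printed as "Internal error: Oops: <hex>" on arm64.
# Field positions follow the Arm architecture's ESR_ELx layout; only the two
# data-abort classes and the translation-fault status codes are named here.
decode_esr() {
  esr=$(( 0x$1 ))                 # the kernel prints the value without "0x"
  ec=$(( (esr >> 26) & 0x3f ))    # Exception Class, bits [31:26]
  il=$(( (esr >> 25) & 0x1 ))     # Instruction Length, bit [25]
  wnr=$(( (esr >> 6) & 0x1 ))     # Write-not-Read for data aborts, bit [6]
  dfsc=$(( esr & 0x3f ))          # Data Fault Status Code, bits [5:0]
  printf 'EC=0x%02x IL=%d WnR=%d DFSC=0x%02x\n' "$ec" "$il" "$wnr" "$dfsc"
  case $ec in
    36) echo "data abort from a lower exception level (typically user space)" ;;
    37) echo "data abort taken in the same exception level (the kernel itself)" ;;
    *)  echo "not a data abort; WnR and DFSC do not apply"; return 0 ;;
  esac
  if [ "$dfsc" -ge 4 ] && [ "$dfsc" -le 7 ]; then
    echo "translation fault, level $(( dfsc - 4 ))"
  fi
}
```

For the value in the log above, `decode_esr 96000004` reports EC=0x25 with DFSC=0x04. That means a read (WnR=0) by kernel code hit an address with no valid level 0 translation. This fits a kernel bug or corrupted data, rather than the silent lockups others describe in this thread.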
meymarce Posted July 16, 2021
Have you tried the usual fixes of lowering the CPU clock or raising the voltage?
digwer Posted July 17, 2021
jotapesse said:
Message from syslogd@helios64 at Jul 16 18:47:05 ...
kernel:[111630.816643] Internal error: Oops: 96000004 [#8] PREEMPT SMP
Message from syslogd@helios64 at Jul 16 18:47:05 ...
kernel:[111630.843686] Code: 14000011 f9400273 b40001f3 d1002274 (b9402280)
jotapesse, I think that kernel Oops is a different problem from gmerrall's and mine. In our case we don't get any kernel crashes, just random hardware lockups. I would suggest connecting via serial, capturing the whole kernel crash and creating another thread.
meymarce, I haven't tried that. How should I do that?
meymarce Posted July 17, 2021
You can also try setting MAX_SPEED to 408000 (kHz, i.e. 408 MHz) in /etc/default/cpufrequtils. That said, my system has been stable since I raised the voltage in /boot/boot.cmd.
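meymarce's MAX_SPEED suggestion can be applied with a small safeguard. cpufreq lists the frequencies a policy supports (in kHz) in `scaling_available_frequencies`, so this sketch refuses any value not in that list. It assumes GNU sed and the /etc/default/cpufrequtils layout shown elsewhere in this thread. It also checks only cpu0's policy; on the RK3399, the big cores (cpu4-5) have their own list. `set_max_speed` and the override variables are hypothetical names.

```shell
#!/bin/sh
# Sketch: set MAX_SPEED in the cpufrequtils defaults file, refusing values the
# cpufreq driver does not list for cpu0. Paths can be overridden for testing;
# on the real system this needs root.
CPUFREQ_DEFAULTS="${CPUFREQ_DEFAULTS:-/etc/default/cpufrequtils}"
AVAILABLE_FREQS="${AVAILABLE_FREQS:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies}"

set_max_speed() {
  khz="$1"
  # scaling_available_frequencies is a space-separated list in kHz.
  if ! tr ' ' '\n' < "$AVAILABLE_FREQS" | grep -qx "$khz"; then
    echo "refusing: $khz kHz is not listed in $AVAILABLE_FREQS" >&2
    return 1
  fi
  if grep -q '^MAX_SPEED=' "$CPUFREQ_DEFAULTS"; then
    sed -i "s/^MAX_SPEED=.*/MAX_SPEED=${khz}/" "$CPUFREQ_DEFAULTS"   # replace in place
  else
    printf 'MAX_SPEED=%s\n' "$khz" >> "$CPUFREQ_DEFAULTS"            # add if missing
  fi
}
```

Usage: `set_max_speed 408000`. Then restart the cpufrequtils service or reboot so that the new limit is applied.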
jotapesse Posted July 17, 2021
digwer said:
jotapesse, I think that kernel Oops is a different problem from gmerrall's and mine. In our case we don't get any kernel crashes, just random hardware lockups. I would suggest connecting via serial, capturing the whole kernel crash and creating another thread.
Thanks! Yes, I believe that's the case. I guess something got corrupted in my install. Armbian booted fine from an SD card. I tried everything: copying all the files in the /boot directory, a filesystem check, and chrooting in to fully update and upgrade. In the end I gave up and reinstalled everything from scratch. Let's see if it stays stable in the long run...
I'm getting a bit worried by all the reports I read here about instability, hangs, crashes, boot corruption, and custom CPU voltage and frequency modifications. I need it to work reliably as a NAS and app server. Hopefully you'll all get yours sorted as well.
EPZ Posted April 30, 2023
Hi gmerrall and digwer, did you find a solution to this problem? I am facing the same problem and have no idea how to solve it. (Also on a Helios64, Armbian 22.02.1 with Debian and kernel 5.15.93.)
Best regards,
EPZ
digwer Posted May 11, 2023
Hi @EPZ
Sorry for the late response. I think this problem was solved by locking the CPU frequency with cpufrequtils:

```
root@helios64:/etc# cat default/cpufrequtils
ENABLE=true
MIN_SPEED=1008000
MAX_SPEED=1008000
GOVERNOR=performance
```

After locking the CPU frequency, check a few times that every core reports it:

```
root@helios64:/sys# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1008000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:1008000
/sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:1008000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:1008000
/sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq:1008000
/sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:1008000
```
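digwer's repeated check can be turned into a pass/fail test. Below is a minimal sketch that reads the same `scaling_cur_freq` files as the grep in the post. `check_locked_freq` and `CPU_SYSFS` are hypothetical names.

```shell
#!/bin/sh
# Sketch: check that every CPU reports the expected frequency (in kHz).
# Returns 0 if all match, 1 on any mismatch, 2 if no cpufreq files were found.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

check_locked_freq() {
  want="$1"
  found=0
  status=0
  for f in "$CPU_SYSFS"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
    [ -r "$f" ] || continue       # glob did not match, or file unreadable
    found=$((found + 1))
    cur=$(cat "$f")
    if [ "$cur" != "$want" ]; then
      echo "MISMATCH $f: $cur kHz (expected $want)"
      status=1
    fi
  done
  [ "$found" -gt 0 ] || return 2
  return "$status"
}
```

To check a few times as suggested: `for i in 1 2 3 4 5; do check_locked_freq 1008000 && echo ok; sleep 2; done`.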
EPZ Posted May 28, 2023
Hi @digwer
Thanks for your answer. It has helped so far; let's see how it holds up in the long term.
A few questions for my better understanding:
1) What is the underlying issue, i.e. why does this help?
2) What is the impact of this change on performance? (The main activity is file uploads/downloads through Nextcloud and git.)
3) And the impact on power consumption? (My Helios is largely idle; the main activity is a request from Nextcloud every 30 s to check for changes.)
4) What are the limits/options for these settings while still fixing the issue?
Best regards and thanks again for your help,
EPZ