Helios64 randomly dying - help!


gmerrall

My Helios64 NAS has suddenly started freezing at random. It doesn't look like an actual shutdown, as the front panel lights all stay on. I lose all access, including the serial console. The only way to recover is a long press on the power button to force a hard power cycle.

 

I've raised the verbosity to 7 in /boot/armbianEnv.txt in case something is printed on the serial console, but I haven't seen any useful output yet; for obvious reasons it's hard to catch it happening. Because /var/log is cleared on every boot, I can't see whether anything was logged just before the halt.

 

I'm not sure where to start diagnosing this. One option would be to remove the folder2ram mount for /var/log so that logs persist across reboots?
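Independent of where /var/log lives, a heartbeat file on persistent storage can at least narrow down when each freeze happens. The following is a minimal Python sketch, not anything shipped by Armbian or Kobol; the path `/root/heartbeat.log`, the 30-second interval and all function names are assumptions made for illustration.

```python
import os
import time
from datetime import datetime, timezone

# Hypothetical defaults: pick any path that is NOT on the RAM-backed /var/log.
HEARTBEAT_PATH = "/root/heartbeat.log"
INTERVAL_SECONDS = 30


def write_heartbeat(path, now=None):
    """Append one UTC timestamp and force it onto the storage device."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    with open(path, "a", encoding="ascii") as fh:
        fh.write(stamp + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # so the line survives a hard lockup
    return stamp


def last_heartbeat(path):
    """Return the most recent timestamp in the file, or None if there is none."""
    try:
        with open(path, encoding="ascii") as fh:
            stamps = [line.strip() for line in fh if line.strip()]
    except FileNotFoundError:
        return None
    return stamps[-1] if stamps else None


def run(path=HEARTBEAT_PATH, interval=INTERVAL_SECONDS, beats=None):
    """Write a heartbeat every `interval` seconds; beats=None means forever."""
    written = 0
    while beats is None or written < beats:
        write_heartbeat(path)
        written += 1
        time.sleep(interval)
    return written
```

After a forced power cycle, calling `last_heartbeat(HEARTBEAT_PATH)` before starting `run()` again gives the time of the freeze to within one interval, which can then be matched against whatever the serial console captured. The frequent `fsync` adds a little write wear on eMMC or SD storage, so the interval is a trade-off.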

 

Output of "uname -a", in case it helps:

Linux helios64 5.10.43-rockchip64 #21.05.4 SMP PREEMPT Wed Jun 16 08:02:12 UTC 2021 aarch64 GNU/Linux

 

Kobol folks: I'm in Singapore if that helps any.


You can set up logging to go to a flash drive, but capturing the console is going to be the best bet.

Do you have another computer you can leave on with the serial console connected? For example, I have a small NUC where I keep picocom running in a tmux session, that way I don't lose anything. You can use the -g option to have picocom log to a file, as well.
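If you would rather have a wall-clock timestamp on every console line, a small script on the logging machine can do that instead of picocom. This is only a sketch under assumptions: it relies on the third-party pyserial package, and the device name `/dev/ttyUSB0`, the 1500000 baud rate and the log file name are placeholders to check against your own setup.

```python
from datetime import datetime


def log_console(src, dst, clock=datetime.now):
    """Copy lines from a binary stream to a text file, prefixing a timestamp.

    Every line is flushed straight away, so whatever the board printed
    just before it locked up is already in the log.
    """
    count = 0
    for raw in src:
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        dst.write(f"{clock().isoformat(timespec='seconds')} {text}\n")
        dst.flush()
        count += 1
    return count


def capture(device="/dev/ttyUSB0", baud=1500000, logfile="helios64-console.log"):
    """Log the serial console until interrupted (all defaults are assumptions)."""
    import serial  # pyserial, third-party: pip install pyserial

    with serial.Serial(port=device, baudrate=baud, timeout=None) as port, \
            open(logfile, "a", encoding="utf-8") as out:
        log_console(port, out)
```

Running `capture()` inside tmux on the always-on machine keeps it going after you disconnect. Unlike picocom it is read-only, so it is no use if you also need to type on the console.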


I have the same problem as described: the device is powered on and all lights are on (though not blinking), but it is not accessible through serial or SSH.
I will try leaving a Raspberry Pi connected via serial to capture the logs, because nothing appears in the systemd log.

Edited by digwer

Hi! Not sure if it's related to the original post but my Kobol Helios64 seems to have just stopped responding following a reboot. Already tried to power down and power up and the issue persists.

 

The hardware starts, the disks are initialised and it responds to ping, but it can no longer be accessed via SSH or the OpenMediaVault web admin page. My Helios64 is updated to the latest Armbian Buster 5.10.x and the latest OpenMediaVault 5.x. Armbian + OMV are installed on the internal eMMC. Docker apps are installed on the M.2 SATA drive on Port 1. The other 4 HDDs, in a ZFS setup on SATA ports 2 to 5, are for data storage.

 

A few minutes before the reboot I noticed some kernel errors from syslog being printed in a shell connected to my Helios64 over SSH. They kept appearing at random, a few minutes apart, as follows:

 

Message from syslogd@helios64 at Jul 16 18:47:05 ...
 kernel:[111630.816643] Internal error: Oops: 96000004 [#8] PREEMPT SMP

Message from syslogd@helios64 at Jul 16 18:47:05 ...
 kernel:[111630.843686] Code: 14000011 f9400273 b40001f3 d1002274 (b9402280)

 

I then rebooted from the OMV web admin page and it stopped responding from there. Pings are OK, but SSH connections are refused and the OMV web admin page is dead. Any idea what this is about and how to sort it out?
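For anyone trying to read the Oops line: on arm64 the hex number after "Oops:" is, as far as I know, the ESR (Exception Syndrome Register) value, and it can be decoded by hand. The following is a rough Python sketch covering only the common abort cases; the function and table names are made up for illustration, and the bit layout follows the Arm architecture definition of ESR_ELx.

```python
# Subset of ESR_ELx exception classes (EC, bits 31:26) that carry a fault status.
ABORT_CLASSES = {
    0x20: "instruction abort, lower exception level",
    0x21: "instruction abort, same exception level",
    0x24: "data abort, lower exception level",
    0x25: "data abort, same exception level",
}


def fault_status(fsc):
    """Describe the 6-bit fault status code (ISS bits 5:0) of an abort."""
    level = fsc & 0b11
    group = fsc >> 2
    if group == 0b0001:
        return f"translation fault, level {level}"
    if group == 0b0010:
        return f"access flag fault, level {level}"
    if group == 0b0011:
        return f"permission fault, level {level}"
    if fsc == 0x10:
        return "synchronous external abort"
    if fsc == 0x21:
        return "alignment fault"
    return f"other (0x{fsc:02x})"


def decode_esr(esr):
    """Split an ESR value into exception class, syndrome and fault details."""
    ec = (esr >> 26) & 0x3F
    iss = esr & 0x1FFFFFF
    is_abort = ec in ABORT_CLASSES
    is_data_abort = ec in (0x24, 0x25)
    return {
        "ec": ec,
        "ec_name": ABORT_CLASSES.get(ec, f"not decoded here (0x{ec:02x})"),
        "iss": iss,
        "fault": fault_status(iss & 0x3F) if is_abort else None,
        # WnR (ISS bit 6) is only meaningful for data aborts.
        "is_write": bool(iss & 0x40) if is_data_abort else None,
    }
```

For the value in the messages above, `decode_esr(0x96000004)` reports a data abort at the same exception level caused by a level 0 translation fault on a read, meaning the kernel tried to read an unmapped address. That says what kind of fault it was but not which driver triggered it, so the full Oops with its call trace is still needed.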


16 hours ago, jotapesse said:
Message from syslogd@helios64 at Jul 16 18:47:05 ...
 kernel:[111630.816643] Internal error: Oops: 96000004 [#8] PREEMPT SMP

Message from syslogd@helios64 at Jul 16 18:47:05 ...
 kernel:[111630.843686] Code: 14000011 f9400273 b40001f3 d1002274 (b9402280)

 

jotapesse, I think that kernel Oops is a different problem from gmerrall's and mine. In our case we don't have any kernel crashes, just random hardware lockups. I would suggest connecting via serial, capturing the whole kernel crash output and creating another thread.

meymarce, I haven't tried that. How should I do that?


11 hours ago, digwer said:

jotapesse, I think that kernel Oops is a different problem from gmerrall's and mine. In our case we don't have any kernel crashes, just random hardware lockups. I would suggest connecting via serial, capturing the whole kernel crash output and creating another thread.

 

Thanks! Yes, I believe that's the case. I guess something got corrupted in my install. Armbian booted fine from an SD card. I tried everything: copying all files in the /boot directory, a filesystem check, chrooting in and doing a full update and upgrade. In the end I gave up and reinstalled everything from scratch. Let's see if it's stable in the long run... I'm getting a bit worried by all the reports I read here about instability, hangs, crashes, boot corruption, and custom CPU voltage and frequency modifications. I need it to work reliably as a NAS and apps server. Hopefully all of you will get yours sorted as well.
