
Anyone here have a stable Helios64 running OMV6?


TijuanaKez


If so, how'd ya do it?!

 

My personal experience had been a rock-solid device on the original official Debian-based Armbian image (Linux kernel 5.10) and OMV5.

It never went down unless I manually rebooted it, and I never needed any CPU governor changes, voltage changes, or anything else to keep it stable.

 

Reading so much about OMV5 being end-of-life made me pull the trigger on the upgrade to OMV6, along with the kernel upgrade that OMV did automatically; at the time I was unaware of the state of Kobol and Armbian support.

Now I get about 1-2 days on average before it kernel panics (KP) and I have to hit the reset button.

 

I tried CPU underclocking, the performance governor, the VDD boost, and various combinations, but to no avail.

 

So, just reaching out to any other users here: is anyone still rocking a stable unit in 2023?

 

Did you have to revert to Buster / OMV5?

 

Are there any other tweaks to try that I've missed, other than the CPU clock/governor/VDD tweaks?

 

Is there any way to get information on why it might be happening? Or is the only way to have a hardware serial console open 24/7 via a USB cable?

Edited by TijuanaKez

44 minutes ago, TijuanaKez said:

My personal experience had been a rock solid device on the original official Debian image (Linux Kernel: 5.10), and OMV5.


Thank you. It was never Debian

 

44 minutes ago, TijuanaKez said:

original official Debian image

 

but Armbian (with Debian userspace packages) since day one. At some point we had to stop maintaining the hardware interface / kernel, and Kobol went out of business. When that happens, things start to fall apart; if you took our work more seriously, you could have this hardware well supported.


Whoops! Sorry for that mistake. These are the Armbian forums after all; I just don't really understand the inner workings of how Armbian is related to Debian. I was just going off this from the Kobol website.

 

[screenshot from the Kobol website describing the official image]

 

Yes, I'm well aware of how things went with Kobol and that this hardware is no longer officially supported.

This being the unmaintained section, though (and because I know of nowhere else to ask), I was hoping to reach out to any remaining Helios64 *users*.

It's never my intention to bother Armbian developers to help fix unmaintained hardware.

Hope you understand where I'm coming from.

I'm not ready to give up on this board yet, as I still see nothing that compares feature-wise (number of SATA ports), power-consumption-wise, in size and form factor, and, well... I own one.

Surely there are other users still using theirs?

Edited by TijuanaKez

I am on OMV6 with the latest kernel, 5.15.93, and I was fiddling around a lot to get a stable setup. I did a clean install recently and everything I need it for works. Unfortunately it does random reboots: I get around 15 to 20 days and then the Helios64 reboots, and I have not found out what triggers that behavior. I ran it with the ondemand governor over the full CPU range. Maybe I will try throttling it once I am back on the machine. And I might give it a go with an older kernel; some recommended the 5.15.63 build.

 

I once tried to build my own kernels but did not have enough time to do it properly. Maybe I will jump back on that wagon, because in the end I would love to have the machine work the way I want it to.

 

I'll keep you posted, but gathering further experience will take a while. Trial and error.

Edited by bunducafe

Quote

 

I am on OMV6 with the latest kernel, 5.15.93, and I was fiddling around a lot to get a stable setup. I did a clean install recently and everything I need it for works. Unfortunately it does random reboots: I get around 15 to 20 days and then the Helios64 reboots, and I have not found out what triggers that behavior. I ran it with the ondemand governor over the full CPU range. Maybe I will try throttling it once I am back on the machine. And I might give it a go with an older kernel; some recommended the 5.15.63 build.

 

I once tried to build my own kernels but did not have enough time to do it properly. Maybe I will jump back on that wagon, because in the end I would love to have the machine work the way I want it to.

 

 

Thanks for that info!

15-20 days I could live with, especially if by 'reboot' you mean it restarts on its own rather than freezing and needing a manual reset.

 

Update from my own testing:

 

I recently pulled one drive out, so it's back to 4 bays out of 5.

Since then it's been rock solid for over 2 weeks, which is vastly different from before pulling the drive.

 

I currently have the CPU locked to 1400 MHz with the performance governor (but that didn't seem to help before pulling out this drive).

If it continues to stay solid, I will test again with ondemand over the full range.

 

Given that some users report solid operation while others have seen instability from the start (even on the old kernel), it would be interesting to see how many drives everyone has plugged in, and of what type.

From quickly scanning back over some of the thread here, many users complaining about instability seem to be using all 5 bays with 3.5-inch drives.

Add to this the VDD boost for the RK3399 mentioned here, and the instability feels current-draw / PWM related. Maybe people know this already and I missed it.

Just speculating, and I'm definitely out of my depth here, but perhaps 5x 7200 RPM drives drawing current simultaneously, with the ondemand governor switching up to 1800 MHz, is tripping it somehow.

 

Other *possibly* related reading material:

https://forum.odroid.com/viewtopic.php?t=30303

https://github.com/u-boot/u-boot/commit/f210cbc1f3d4f84baa595d83933369b4d6a529ea

https://github.com/u-boot/u-boot/commit/5a6d960b222203e60ab57e19b3eb7b41c24b564b

https://wiki.t-firefly.com/en/Firefly-RK3399/driver_pwm.html

http://patchwork.ozlabs.org/project/uboot/patch/20191128061433.1952869-2-anarsoul@gmail.com/

Edited by TijuanaKez

That's interesting. I have all 5 bays equipped: 1x SSD as a media drive, 3x WD Reds at 5400 rpm as a mergerfs pool, and 1x Toshiba N drive at 7200 rpm…

 

Actually, I sometimes assumed it might have to do with waking up the drives, with the PSU not generating enough power to do so… I might give the voltage-boost workaround you posted a try, as it seems plausible. I was also considering getting a new PSU to see if the shipped one is a bit faulty…


Quote

 

That's interesting. I have all 5 bays equipped: 1x SSD as a media drive, 3x WD Reds at 5400 rpm as a mergerfs pool, and 1x Toshiba N drive at 7200 rpm…

 

Actually, I sometimes assumed it might have to do with waking up the drives, with the PSU not generating enough power to do so… I might give the voltage-boost workaround you posted a try, as it seems plausible. I was also considering getting a new PSU to see if the shipped one is a bit faulty…

 

Of note also are the big electrolytic capacitors soldered onto long wires on the SATA power harness.

Again, just speculation, but this could have been an attempt to add extra power filtering after the boards had been fabricated, if a potential problem was identified late.

I may experiment with upgrading these, and check the stability of the rails with a scope while the drives are in use.

 

For your random *reboot* issue, I would also look at suspend states.

 

As of the last updates, the suspend issues were still unresolved (see here), in which case, unless I'm missing something, suspend should be disabled.

But I think it's enabled by default for this board in Armbian 22 rockchip64.

I think when it tries to enter one or more of those states, it fails and reboots.

You can check with:

sudo systemctl status sleep.target suspend.target hibernate.target hybrid-sleep.target

 

To disable:

sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target

 

After disabling the suspend states, my unit never reset on its own again. But then I inherited the KP freeze instead.

It's possible only particular states need to be disabled and e.g. sleep can stay enabled; I'll keep testing.

Fingers crossed, with fewer power-hungry drives and the offending suspend states disabled, the unit should stay up indefinitely.

 

Edited by TijuanaKez

As I would love to suspend my disks after 5 hours without accessing the NAS anyway, I am kind of reluctant to try the suspend workaround. But what could be interesting is swapping the HDDs for one more SSD, or making 3 (bigger) HDDs out of the 4 in my system, in order to avoid the hiccups.


2 hours ago, bunducafe said:

As I would love to suspend my disks after 5 hours without accessing the NAS anyway, I am kind of reluctant to try the suspend workaround. But what could be interesting is swapping the HDDs for one more SSD, or making 3 (bigger) HDDs out of the 4 in my system, in order to avoid the hiccups.

These sleep/suspend modes are separate from the hard-drive power settings; you can still set power/spin-down in OMV.

It's the suspend-to-RAM state that is mentioned, which probably means most of the other states won't work either.

 

CURRENT branch (Linux kernel 5.10)

Shutdown: OK

Reboot: OK

Suspend to RAM: Not supported. The USB host controller refuses to enter suspend mode.

Wake-on-LAN: Not supported. Suspend still has issues.

 

 

Edited by TijuanaKez

Well, I was kind of busy travelling the last few weeks, but my machine has now worked for 22 days without hiccups, and in the meantime it completed two SnapRAID syncs and scrubs plus some minor scheduled tasks. I did indeed set the governor to 400-1400 MHz ondemand.

 

 

[screenshot]

Edited by bunducafe

22 hours ago, bunducafe said:

Well, I was kind of busy travelling the last few weeks, but my machine has now worked for 22 days without hiccups, and in the meantime it completed two SnapRAID syncs and scrubs plus some minor scheduled tasks. I did indeed set the governor to 400-1400 MHz ondemand.


Hmm, interesting. I've been running some tests with different CPU governor settings but still haven't found stability with the ondemand governor (or schedutil, for that matter).

 

Out of curiosity, what made you limit the ondemand governor to 1400 MHz?

Most of the notes on this issue seem to say it's not the frequency itself, but the constant switching of frequencies, that's the problem.

 

I'll try with your settings and see how long mine stays up.
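
For anyone following along, here is a sketch of one way such limits are usually applied on Armbian, assuming the stock /etc/default/cpufrequtils mechanism that armbian-config edits; values are in kHz and the exact steps come from the device tree, so check what your board offers first:

# Available steps: cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
# /etc/default/cpufrequtils
GOVERNOR=ondemand
MIN_SPEED=408000    # 408 MHz floor
MAX_SPEED=1416000   # ~1.4 GHz ceiling, per bunducafe's settings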

 

I should note that I did take one 7200 rpm drive out, but have since put a small SSD in its place.

 

Another thought that sprang to mind, to keep the stability of the performance governor without wasting too much power, would be to add some echo > scaling_max_freq commands as cron jobs to clock the CPU right down during the part of the day when the Helios is usually not doing much other than staying alive.
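
Something like this hypothetical root crontab is what I have in mind; the policy paths follow the standard cpufreq sysfs layout on the RK3399, and the kHz values are assumptions to verify against scaling_available_frequencies on your unit:

# Clamp both CPU clusters to 408 MHz at 23:00...
0 23 * * * for p in /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq; do echo 408000 > "$p"; done
# ...and restore the ~1.4 GHz cap at 07:00.
0 7 * * * for p in /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq; do echo 1416000 > "$p"; done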

 

 

Edited by TijuanaKez

No scientific research, but since the Cortex-A53 cores only max out at 1400 MHz, I thought that was the way to go... also, in the past some users here on the board reported more stable machines capped at 1400 MHz.

 

But as I am no tester whatsoever, I can't actually measure the differences when the system freezes and reboots; I'm just somewhat happy that the current system works as it should.


Hey there, I also still use a Helios64 with the Kobol NAS enclosure. I'm not using OMV though, just Armbian directly. Since it is still on Debian 10 (Buster), and the Debian team is already at Debian 12 (Bookworm) for Debian Stable, Buster is considered "oldoldstable"; that is the code name Debian uses. Once Debian 13 (Trixie) is released, most of these systems will no longer get updates from anywhere, and will fall severely out of date, accumulating more and more security vulnerabilities.

 

One option I've been looking at is the TerraMaster F5-422 - https://www.terra-master.com/global/products/smallmedium-businesses-nas/f5-422-10g-nas.html

 

If you look at it, it has almost the same dimensions and measurements as the Kobol. There's also the F4-423, with one less drive bay, $100 cheaper. The problem is they run a bad version of Linux called "TOS", which is slow. But you can re-flash it. Since it is an x86 board, it should support regular Debian/OMV or TrueNAS, which is great. The big problem is that you will void the warranty if you install another OS, and this Chinese-based company doesn't seem to care about its customers very much; I've seen on their forums that their attitude about their products and support was awful. You can't even buy replacement parts if the specialized motherboard dies, or the RAID controller boards if they break physically. You are S.O.L. if any of the hardware breaks.

 

Another option is the JONSBO cases; I've been looking at the N1 and N2.

JONSBO N1 - https://www.newegg.com/jonsbo-nas-case-mini-itx/p/2AM-006A-00074

JONSBO N2 - https://www.newegg.com/p/2AM-006A-000B7?Item=9SIAY3SJNH8050&cm_sp=SP-_-1582697-_-0-_-2-_-9SIAY3SJNH8050-_-JONSBO N2-_-jonsbo|n2-_-1

 

You can also see reviews of these on YouTube, or on other sites.

 

There are pros and cons to them as well. Their dimensions are not identical, so if exact space is a concern, maybe don't get them. They also use a regular Mini-ITX motherboard, which is great, but may not be as extremely power-efficient. Still, that's more appealing than the TerraMaster right away, since it gives you more modularity, "repairability", and more options, like putting a low-profile Intel Gigabit Ethernet NIC or something into the PCIe slot.

 

But yes, the company is out of business, and Armbian no longer has the resources to support the board, nor wants to for that obvious reason. Most of us are just users who want a good open-source NAS, not experienced developers, hence why nobody seems to be an Armbian maintainer for it anymore.

 

There were attempts to get vanilla Debian to run on the Kobol, but kernel patches are needed for the SATA stuff - https://wiki.debian.org/InstallingDebianOn/Kobol/Helios64

 

It looks like the page hasn't been updated since 2021. If Debian 12 happens to ship the drivers necessary for SATA to work, that would be cool. I would assume the LEDs might not work right, or there will be other paper cuts, but it would be great to just have Debian 12 on this thing. However, I don't think that will be a reality. It's time to replace it at this point. Don't wait until the next Debian release to do it.

Edited by mrjpaxton

@mrjpaxton Well, at least once you set up the Helios64 with the oldest official kernel, you can indeed upgrade to Debian Bullseye (11), which still gives you a bit of time, as long-term support runs until approximately 2026. I am fairly at ease with the machine as it runs flawlessly. If I'm not able to build my own kernel by then, I might just leave it as it is. Exposed only to the local network, I don't see too much of a security risk either... but let's see how everything turns out. Maybe I will shift to a different NAS enclosure that "just runs" and has decent support. On the other hand, I quite liked Kobol's approach with the Helios64.


@mrjpaxton Thanks for chiming in. The TerraMaster does indeed have a lot in common with the Helios64, and it has the bonus of 10 GbE.

At over A$1000 (probably A$1100 with postage) it's more than double what I paid for the full Helios bundle, and the Helios was already a bit of a philosophical stretch for what a NAS adds to my life. Add to that the caveats you mentioned...

Likewise, the 'build a small PC' type of NAS always seemed like entering the power-consumption equivalent of just leaving my main PC on 24/7 (and that already has plenty of SATA and 10 GbE).

When the Pi 3 and then the Pi 4 came around, they seemed like the perfect thing for a NAS: the right amount of CPU/RAM, small form factor, low power consumption, low price, easy to get OMV on, etc. Just sadly no way to get real SATA.

With a Pi 4 4GB costing me about A$100, it seemed that something similar *with* real SATA at a modest price increase shouldn't be too much to ask. Why doesn't someone make it, I thought?

Well, they did: it was the Helios board. And it was awesome. And even though I don't care much about the case for something that'll probably be out of sight (I can 3D print), the hot pink got me :)

Fast forward to now, when such things should be getting cheaper, and TerraMaster thinks A$900 on top of a Raspberry Pi 4-equivalent SBC is appropriate for 5 SATA ports, 10 GbE, and a plastic case. Not feeling any purchase urges at all.

 

 @bunducafe

On a positive note, I've been running with your governor settings (ondemand, 400-1400 MHz) and it does seem pretty stable. I've had to reset it once in 10 days, but I think I can live with that.

 

 


4 hours ago, TijuanaKez said:

On a positive note, I've been running with your governor settings (ondemand, 400-1400 MHz) and it does seem pretty stable. I've had to reset it once in 10 days, but I think I can live with that.

 

But it does reboot every 10 days by itself? Here I don't have any difficulties with the said settings, so maybe a different setting or hardware issue causes the reboots. Mine has now been stable for one month (touch wood). I will definitely use the machine as long as it runs...

 

Hard disks:

- SSD with the media library; it also stores all the Docker containers

- 4x 4TB HDDs (3 are merged with mergerfs and 1 HDD is solely the parity disk for SnapRAID)

Not too many fancy things beyond that. Docker runs Jellyfin and Navidrome. VPN and media stacks only run when needed; otherwise I stop those containers.

 

[screenshot]


Quote

But it does reboot every 10 days by itself? 

Mine never reboots; it only 'freezes' sporadically on the default CPU freq/governor settings and requires me to physically press the reset button. TBH a reboot would be better, as I wouldn't need to be in the house to fix it.

Did you try disabling the suspend modes I mentioned a few posts back?

 


Very important: I want to mention here that the Armbian community has supported Debian Bookworm since June 30th - https://www.armbian.com/helios64/

 

For those of you still on Buster (or Bullseye): make two full rsync backups of your rootfs/configs, download the CLI version (not minimal, because it has a particular tool we will need), and follow the eMMC install instructions - https://wiki.kobol.io/helios64/install/emmc/ - I would personally delete the partition and wipe the partition table (it requires MS-DOS, not GPT) before you "unxz" the archive and reflash the image. This is what Armbian recommends doing. The full distribution upgrade is too risky without a backup anyway.
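
Roughly, the reflash steps look like this. A hedged sketch: the image filename is illustrative, and the eMMC device node varies (confirm with lsblk before writing anything, because dd to the wrong device destroys data):

lsblk                              # identify the eMMC first (often mmcblk1 or mmcblk2 when booted from SD)
sudo wipefs --all /dev/mmcblk2     # wipe the old partition table and filesystem signatures
xz -dc Armbian_23.8.1_Helios64_bookworm_current.img.xz | sudo dd of=/dev/mmcblk2 bs=1M conv=fsync status=progress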

 

You can then use picocom, after attaching the USB-C serial connection, with this command:

 

sudo picocom -b 1500000 /dev/ttyUSB0

 

This will help with the initial setup.

 

I want to take back some of what I said about the security-vulnerability implications, because I have backed up my old Armbian Buster install, flashed this new image, and it's been running very successfully for me. I've been taking some time to meticulously restore the configs and scripts that I use.

 

Apparently, according to these instructions - https://docs.openmediavault.org/en/latest/installation/on_debian.html - it is possible to install OMV on Armbian with `armbian-config`. I haven't done this yet, but someone should try it out.

 

Run armbian-config and go to: Software (System and 3rd party software install) > Softy (3rd party applications installer), then check OMV (OpenMediaVault NAS solution).

 

Please try this out on a fresh Armbian Bookworm install, and let us all know how it goes! Best of luck. I'm curious which version of OMV this installs, though.

Edited by mrjpaxton

@mrjpaxton Thanks for the info. If I get some time I may try that out.

Really, though, I'd be happy to freeze the kernel now if it would remain stable.

The main purpose of the unit for me is running Nextcloud so whatever it takes to keep that running and the unit staying up.

Btw, is your unit currently stable?

If so can you share your CPU governor settings and any other tweaks you may have made?

Thanks!


@TijuanaKez No problem. I'm sure it's possible to keep running Nextcloud on either Armbian or OMV, though with OMV I'm guessing it's containerized.

 

I haven't changed my CPU settings from the defaults, both back when I was running Buster and right now on a fresh Armbian Bookworm, which has run with no problems for a couple of days now.

 

So I have to ask: are you using the backup battery? I am still using it on mine. Maybe that could affect stability, I don't know. I should probably start checking mine with a multimeter at this point...

 

When you were building the NAS (I don't think it was possible to order these pre-built), did you make absolutely sure all the cables were seated properly on the board? And that none of the SATA cords or the capacitors on them look damaged? Did you also do the "2.5 Gbps hardware fix"? If you don't know what this means, then don't worry about it. Most Kobol owners would know about it if they've done it.

 

EDIT: Oh yeah, I forgot to mention that I'm using the eMMC install. I know it's possible to try an SD card install. If anyone gets problems on the SD card install, try the eMMC install, or vice versa; that may also help in figuring out some of those problems.

 

Since Armbian Bookworm has a new kernel now, "6.1.36-rockchip64", you should give it a shot; maybe it could help solve your problem if it's a software issue.

 

I just don't like that Armbian makes it slightly more difficult than Debian to keep persistent logs for checking software (or some hardware-related) reboot problems, because you have to disable ZRAM logging in `/etc/default/armbian-ramlog`, and THEN you have to enable persistent logs with systemd. So a two-step process. Haha.
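
For reference, the two steps look something like this. A sketch: I'm assuming the stock armbian-ramlog file uses an ENABLED switch, so double-check the variable name on your image.

# Step 1: stop armbian-ramlog from keeping logs only in ZRAM
sudo sed -i 's/^ENABLED=.*/ENABLED=false/' /etc/default/armbian-ramlog
# Step 2: make systemd-journald persist logs across reboots
sudo mkdir -p /var/log/journal
sudo sed -i 's/^#\?Storage=.*/Storage=persistent/' /etc/systemd/journald.conf
sudo systemctl restart systemd-journald
# After the next crash, read the previous boot's kernel log:
# journalctl -b -1 -k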

Edited by mrjpaxton

Hi,

I have also been testing Bookworm for a week.

I started with a clean install and use SnapRAID.

Before Bookworm I used kernel 5.15 with Bullseye. To get a stable system I had to limit my CPU to 600-1200 MHz ondemand.

Now with Bookworm and kernel 6.1.36 these settings are not enough. Every time I start a SnapRAID sync I get a kernel oops.

I'm now back to kernel 5.15 with Debian Bookworm and it looks more stable (1 full sync successful).

 

Edit:

The system is still unstable.

I found a good, fast test:

for i in $(seq 1 100);do python3 -c "import pkg_resources" || break;done

 

It gives me a free(): invalid pointer error.

I found in this thread that there can be a problem with memory-speed detection:

https://github.com/armbian/build/issues/4761

 

Edit 2:

For me, it looks like the new U-Boot version from https://github.com/armbian/build/issues/4761 solved all my problems. snapraid sync --force-full completed without any problems (it ran 6 hours), and the python3 test (6 runs) also finished without errors.

And best of all, the Helios is now running at full speed, 400-1800 MHz ondemand. This was not possible before the U-Boot update. If I understand correctly, the DDR memory is now detected with the correct speed settings.

 

Edit 3:

Hahaha, rejoiced too soon.

 

kernel:[47341.023705] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP

 

Message from syslogd@helios64 at Aug 22 10:59:53 ...

 kernel:[47341.023705] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP

 

Message from syslogd@helios64 at Aug 22 10:59:53 ...

 kernel:[47341.045273] Code: aa1c03e0 93407c62 2a0803e1 9400819e (a9408261) 

 

;(

 

 

Edited by snakekick

@mrjpaxton I definitely built the unit correctly. As I said in the post, I had a 100% stable unit on OMV5 with the image listed in the wiki, for years.

Paying attention to this forum, though, it's clear stability issues have been long-running for many users, and from what I can gather the root problem has to do with CPU and power management. Quite a complex subject, something PC overclockers would know more about than me.

@snakekick thanks for chiming in too. Good info.

Collecting info as I find it, it does seem like preventing the CPU from hitting the high end is the best way to go (contrary to other posts here).

Still running ondemand 400-1400 MHz now on @bunducafe's info, and it's more or less stable. As it's only happened once so far since changing to that setting, I can't even confirm it wasn't some other insult to the system.

 

Edited by TijuanaKez

I really wish I knew why mine is more stable. I don't know whether it's because I don't use OMV, SnapRAID, or whatever, but I may use more stuff in the future.

 

I wish I could figure out how to extract some logs, such as which U-Boot version I'm using, since I don't know whether the installed "linux-u-boot-helios64-current" package corresponds to the U-Boot actually used to boot the system.

 

I also tested with this:

 

for i in $(seq 1 100);do python3 -c "import pkg_resources" || break; done

 

I am getting the "free(): invalid pointer", and sometimes even "Segmentation fault", but my system does not freeze up at all afterwards, and nothing that I think is relevant shows up in the dmesg or journalctl logs.

 

Let me know what I can share.


Interesting that yours doesn't freeze up on a segmentation fault, etc.

I also suspect the number and kind of drives installed matters, as this affects power draw and possibly the integrity of the 12V and 5V rails.

Grounds for the suspicion:

  • The suggestion to raise VDD to 0.95V to improve stability, as done with other Rockchip RK3399 boards
  • Preventing the higher CPU frequencies makes it more stable; high CPU frequencies are more demanding on power.
  • The fact that they shipped the unit with big electrolytic capacitors soldered to the ends of wires on the SATA loom; seems like a way to help smooth the power rails without having to redesign the board.
  • Various other hints here on the subjects of VRMs, CPU governor settings, ramp time, latency, etc. in relation to the RK3399.

https://forum.odroid.com/viewtopic.php?t=30303

https://github.com/u-boot/u-boot/commit/f210cbc1f3d4f84baa595d83933369b4d6a529ea

https://github.com/u-boot/u-boot/commit/5a6d960b222203e60ab57e19b3eb7b41c24b564b

https://wiki.t-firefly.com/en/Firefly-RK3399/driver_pwm.html

http://patchwork.ozlabs.org/project/uboot/patch/20191128061433.1952869-2-anarsoul@gmail.com/

 

So, @mrjpaxton, I'm interested to know how many drives you have installed, and what type (5400 rpm, 7200 rpm, or SSD)?

Another possibility: running fewer tasks on the unit means less CPU demand, so the governor may rarely, if ever, try to switch into the higher, problematic frequencies.

 

Update: my unit still hasn't frozen again since my last post with 400-1400 MHz ondemand, so things are looking pretty stable. So far so good, anyway.

Edited by TijuanaKez

I just wanted to add some information here in case it helps. I'm running OMV6 stable on my Helios64; config below, plus one caveat from the last kernel update:

  • Kernel 6.1.50-current-rockchip64 (NOTE: staying on 6.1.36 might be advised; network issues, see below)
  • Armbian 23.8.1 Bullseye (clean install of Bullseye, SD card then migrated to eMMC)
  • OMV 6.9.0-1 (Shaitan)
  • Governor set to 408 MHz to 1800 MHz, schedutil (I don't remember why I selected this vs. ondemand; somehow I had the impression it was better for ARM systems)
  • 4 drives, WD Red 5400 rpm, 3x 12TB, 1x 4TB
  • I run three drives in a MergerFS pool w/ SnapRAID on a regular run, the fourth on a separate share. Offsite backups to BorgBase, run weekly. The NAS gets moderate usage: it holds media files, Proxmox VM backups, and 2 TimeMachine shares.
  • No other random software or services, just OMV.

My system is stable: no unexpected crashes or issues when running the python stress test above or my SnapRAID syncs.

 

NOTE on the network issue with this kernel: the most recent update made the 2.5 GbE network interface (eth1, r8152 driver) unstable. It will run for a while, then randomly disappear. I only use it as a secondary path to my other server, so I was able to work around it fairly easily, but this would be a blocking issue if you relied on it. I had to build a kernel with an earlier version of the r8152 driver on another x86 machine with recent 6.1.x kernels, so I'm thinking it is the same or a similar issue, but I haven't rolled back or done any serious troubleshooting yet. If you want to move to the 6.1.x branch, I'd recommend a good backup and sticking with 6.1.36 for the moment.
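
If you want to check which driver and version eth1 is bound to before and after updates, ethtool reports it (the interface name is assumed from above):

sudo ethtool -i eth1           # prints the driver (r8152) and its version
sudo dmesg | grep -i r8152     # watch for disconnect/reset messages when the NIC drops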

 

EDIT: The issue seems to be the same as this user's, related to Bookworm 23.08.1. I'll try their recommended solution of moving the r8152 driver forward and see how it goes. I will also freeze updates for the moment if it works (armbian-config -> System -> Freeze).

 

Edited by phidauex
added more info on possible solution

Update: I appear to have jinxed myself. After poking around in the system and running other updates, I appear to be having instability as well. Somehow in the last few months I've gone from uptimes of 30-50 days (basically rebooting only after updates) to daily crashes. I reinstalled my OS, rolled back to 5.15.93-rockchip64, and reduced my CPU a bit, to 408-1400 MHz with the schedutil governor.

 

I also updated my fancontrol settings (/etc/fancontrol) to let the fans run a little more: lowered MAXTEMP to 90 and raised MINPWM to 40.

# Helios64 PWM Fan Control Configuration
# Temp source : /dev/thermal-cpu
INTERVAL=10
FCTEMPS=/dev/fan-p6/pwm1=/dev/thermal-cpu/temp1_input /dev/fan-p7/pwm1=/dev/thermal-cpu/temp1_input
MINTEMP=/dev/fan-p6/pwm1=40 /dev/fan-p7/pwm1=40
MAXTEMP=/dev/fan-p6/pwm1=90 /dev/fan-p7/pwm1=90
MINSTART=/dev/fan-p6/pwm1=60 /dev/fan-p7/pwm1=60
MINSTOP=/dev/fan-p6/pwm1=40 /dev/fan-p7/pwm1=40
MINPWM=40
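
After editing /etc/fancontrol, restart the service so the new thresholds take effect; a quick sanity check using the sensor/PWM paths from the config above:

sudo systemctl restart fancontrol
cat /dev/thermal-cpu/temp1_input         # CPU temperature in millidegrees C
cat /dev/fan-p6/pwm1 /dev/fan-p7/pwm1    # current fan PWM duty (0-255)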

 

Another possibility: about a month ago I added a fourth drive, and that looks like when my uptimes started to get worse. That would lend some support to the power-supply theory.

 

EDIT: It just crashed on a SnapRAID scrub, with the following kernel errors:

 

Message from syslogd@helios64 at Sep 26 11:36:37 ...
 kernel:[ 6347.577469] Internal error: Oops: 96000044 [#1] PREEMPT SMP

Message from syslogd@helios64 at Sep 26 11:36:37 ...
 kernel:[ 6347.604243] Code: 12800015 17fffff2 94390139 d503233f (a9be7bfd)

 

I might try disconnecting my 4th drive and see how things go.

Edited by phidauex

Thanks for adding that info.

As for updates, I was actually thinking I was in the clear. The unit had stayed up without a hitch since my last post over a month ago and seemed impervious to being put through its paces.

That was with 5.15.93-rockchip64 and the 400-1400 MHz ondemand governor.

 

Sorry to say, though, that I ran the auto-update in the OMV6 backend, which contained some Helios64-specific updates.

 

At the time I thought: great, someone, somewhere has made some tweaks!

But now my unit is completely hosed. I can't even boot it to a state where I can SSH in and see what's up.

TBH, I don't really understand where these updates come from.

Are they coming from the community?

Or are they being autogenerated by the Armbian build process, perhaps from old and outdated code that contains issues because the Helios code is not maintained?

 

If so, has anyone got tips on rolling back the kernel / updates? TBH, I'm not even sure whether the Helios64-specific updates were to do with the kernel or something else.


Ouch, we are clearly suffering from the project having been abandoned for a while... Armbian updates are still coming, but without the manufacturer's support there isn't the kind of checking that would normally be needed for reliability.

 

For restoring, I burned the newest Bullseye image from the archive (https://archive.armbian.com/helios64/archive) to an SD card and booted into that environment. Using a USB-C cable to another computer with a serial console open helps a lot. I then used armbian-config to roll back the kernel to 5.15.63-rockchip64 (System -> Other), then froze Armbian updates.
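
If you'd rather do the freeze from the shell, apt-mark can hold the kernel and U-Boot packages. The names below follow the usual Armbian scheme; verify yours first with dpkg -l | grep -E 'linux-|armbian':

sudo apt-mark hold linux-image-current-rockchip64 linux-dtb-current-rockchip64 linux-u-boot-helios64-current
sudo apt-mark showhold    # confirm the holds took effect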

 

From there, mount the eMMC drive and pull off as much configuration backup as you can. Backing up OMV is vexingly difficult, but save out (see the sketch after the list):

  • /srv/salt/
  • /srv/pillar/
  • your samba conf (/etc/samba/smb.conf, I think)
  • Your NFS config (/etc/exports)
  • user files in /home/ and /root/
  • your /etc/fstab file

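As a concrete (illustrative) sketch, with the eMMC root mounted at /mnt/emmc and the backup going to your home directory on the SD card:

mkdir -p ~/omv-backup
# -R keeps the paths after the /./ marker, so the backup mirrors the original layout
rsync -aR /mnt/emmc/./srv/salt /mnt/emmc/./srv/pillar ~/omv-backup/
rsync -aR /mnt/emmc/./etc/samba/smb.conf /mnt/emmc/./etc/exports /mnt/emmc/./etc/fstab ~/omv-backup/
rsync -aR /mnt/emmc/./home /mnt/emmc/./root ~/omv-backup/
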
Once you've captured as much as you can, use armbian-config to install to the eMMC again, and reboot into the new clean environment. Reinstall OMV following their instructions for a bare Debian system.

 

In most cases you don't want to just copy/paste the backed-up files into the new environment, since OMV prefers to generate these dynamically, but they're helpful as a reference. For instance, I kept the old smb.conf open while setting up my SMB shares in the web UI, to make sure I was getting the settings right.

 

Then back up your whole drive; the omv-backup plugin with fsarchiver is a good choice.

