
Helios64 - Armbian 23.08 Bookworm issues (solved)


Go to solution Solved by ebin-dev,


Posted

@ebin-dev  Thanks for the explanation (I failed to notice that you were setting io_is_busy to a different value).  I know that you are still working to stabilize this board.  But at the end of this process, it should be possible to submit a PR to incorporate your settings into the base builds.  Thus I'm looking for things that will make that difficult.  I just added a comment to PR6507 stating that the way io_is_busy is now set doesn't allow it to be overridden if needed.

Posted (edited)
On 5/6/2024 at 10:52 PM, SteeMan said:

@ebin-dev  Thanks for the explanation (I failed to notice that you were setting io_is_busy to a different value).  I know that you are still working to stabilize this board.  But at the end of this process, it should be possible to submit a PR to incorporate your settings into the base builds.  Thus I'm looking for things that will make that difficult.  I just added a comment to PR6507 stating that the way io_is_busy is now set doesn't allow it to be overridden if needed.

 

Thanks for the initiative with the PR. 

 

Here is what happens:

Currently a generic value of 200000 is set for the sampling_rate of each cpufreq policy (this is how often the governor's worker routine should run, in microseconds). The sampling_rate should instead be set to the same nominal value as the cpuinfo_transition_latency (49000 and 40000 for the clusters of rk3399). Because the latency is given in nanoseconds and the sampling_rate in microseconds, the sampling interval is then effectively about 1000 times as large as the transition latency.

For rk3399 boards, the cluster sampling_rates of 49000 (big) and 40000 (little) are very responsive and not too demanding for the CPUs. In that case no problems occur if io_is_busy is set to 1!
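
The unit relationship described above can be checked with a little arithmetic. This is only an illustrative sketch: `rate_to_latency_ratio` is a made-up helper name, and the numbers are the ones quoted in this post.

```shell
#!/bin/sh
# sampling_rate is expressed in microseconds, cpuinfo_transition_latency in
# nanoseconds, so writing the same number into both yields a factor of 1000.

# rate_to_latency_ratio SAMPLING_RATE_US LATENCY_NS (hypothetical helper)
rate_to_latency_ratio() {
    # convert the sampling rate to nanoseconds, then divide by the latency
    echo $(( $1 * 1000 / $2 ))
}

rate_to_latency_ratio 49000 49000     # prints 1000
rate_to_latency_ratio 200000 40000    # prints 5000 (generic 200000 vs. a 40000 ns latency)
```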

 

I added a comment to PR6507 and requested that the sampling_rate become a parameter (per CPU cluster) that is set to the value of the cpuinfo_transition_latency of that cluster.

 

The fine-tuned values for Helios64 are these (io_is_busy set to 1 again, sampling_rate set to more responsive values):

 

for cpufreqpolicy in 0 4 ; do
    echo 1 > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/io_is_busy
    echo 25 > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/up_threshold
    echo 10 > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/sampling_down_factor
    echo $(cat /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/cpuinfo_transition_latency) > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/sampling_rate
done
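
A slightly more defensive variant of the loop above could look like the following sketch. It assumes the same policy numbers (0 and 4) and the same tunables; `apply_ondemand_tuning` is a hypothetical helper name, and the optional directory argument exists only so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# apply_ondemand_tuning [ROOT] (hypothetical helper)
# ROOT defaults to the real cpufreq sysfs directory; a different directory
# can be passed for a dry run.
apply_ondemand_tuning() {
    root="${1:-/sys/devices/system/cpu/cpufreq}"
    for policy in 0 4; do
        dir="$root/policy$policy"
        # skip clusters where the ondemand tunables are not present
        if [ ! -d "$dir/ondemand" ]; then
            echo "policy$policy: ondemand tunables not found, skipping" >&2
            continue
        fi
        echo 1  > "$dir/ondemand/io_is_busy"
        echo 25 > "$dir/ondemand/up_threshold"
        echo 10 > "$dir/ondemand/sampling_down_factor"
        # same number as the transition latency, as described above
        cat "$dir/cpuinfo_transition_latency" > "$dir/ondemand/sampling_rate"
        echo "policy$policy: sampling_rate=$(cat "$dir/ondemand/sampling_rate")"
    done
}

# To apply to the running system (as root):
#   apply_ondemand_tuning
```

The check for the ondemand directory matters because those files only exist while the ondemand governor is active for that policy.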

 

Edit: With these changes the timeout issues of the 2.5G interface are resolved for 6.6.29 and 6.6.30 too.

Helios64 is one of the best arm based NAS systems I have seen so far. I am not going to replace it anytime soon.

Edited by ebin-dev
Posted (edited)

Hi,

I use my own from-scratch build made with the official Armbian framework (Armbian 24.05, kernel 6.6.30) and the DTB file posted here by ebin-dev, with the ondemand governor at 408-1800 MHz. For the first time since I have had it, the board is stable and usable for my use cases, and it passes my stability test pattern!

Many thanks to ebin-dev for the great work.

 

Edited by BipBip1981
  • Solution
Posted (edited)

There are new Armbian 24.05 images available on the Helios64 download page: both images Bookworm minimal and Jammy Desktop are based on linux 6.6.30 (download them !).

 

Again, the rtl_nic firmware (in /lib/firmware/rtl_nic) should be replaced by the version downloaded from git.kernel.org, such that the 2.5G LAN interface works correctly.

 

I would also recommend copying the dtb attached below to /boot/dtb/rockchip/rk3399-kobol-helios64.dtb (execute 'update-initramfs -u' after that). It includes the 75mV bump of the opp states for the fast cores as suggested by @prahal, it enables the L2 cache info and it enables hs400 speed on eMMC again. In particular the 75mV bump has a very positive effect on stability.
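
The copy step can be sketched as below, keeping a backup of the existing dtb so it can be restored if the board does not boot. `install_dtb` is a hypothetical helper and the `.orig` suffix is an arbitrary choice; the file name and target path in the usage comment are the ones from this post.

```shell
#!/bin/sh
# install_dtb SRC DST (hypothetical helper): keep a copy of the existing
# dtb next to it, then put the new file in place.
install_dtb() {
    src="$1"
    dst="$2"
    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    # back up the current dtb once so it can be restored from a rescue system
    if [ -f "$dst" ] && [ ! -f "$dst.orig" ]; then
        cp -p "$dst" "$dst.orig"
    fi
    cp "$src" "$dst"
}

# Example (as root), using the attachment name and target path from this post:
#   install_dtb rk3399-kobol-helios64.dtb-6.6.30-L2-hs400-opp \
#       /boot/dtb/rockchip/rk3399-kobol-helios64.dtb
#   update-initramfs -u
```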

 

The bootloader that comes with it would appear to contain the Rockchip DDR blob. It should be fine. If you have an issue with u-boot, just flash linux-u-boot-edge-helios64_22.02.1_arm64 as recommended before.

 

The cpufreq ondemand governor is still the best choice. Good settings are

# cat /etc/default/cpufrequtils 
ENABLE=true
MIN_SPEED=600000
MAX_SPEED=1800000
GOVERNOR=ondemand
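
To confirm that these settings are actually in effect after a reboot, the values can be read back from sysfs. A minimal sketch, assuming the standard cpufreq sysfs layout; `show_cpufreq` is a hypothetical helper name.

```shell
#!/bin/sh
# show_cpufreq [ROOT] (hypothetical helper): print governor and frequency
# limits per cluster; the kernel reports frequencies in kHz.
show_cpufreq() {
    root="${1:-/sys/devices/system/cpu/cpufreq}"
    for dir in "$root"/policy*; do
        [ -d "$dir" ] || continue
        gov=$(cat "$dir/scaling_governor")
        min=$(( $(cat "$dir/scaling_min_freq") / 1000 ))
        max=$(( $(cat "$dir/scaling_max_freq") / 1000 ))
        echo "${dir##*/}: $gov ${min}-${max} MHz"
    done
}

show_cpufreq
```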

 

Enjoy.

 

P.S.: If you would like a system that is more responsive to server tasks, or if you push the 2.5G interface to its limits, some fine tuning is helpful.

 

rk3399-kobol-helios64.dtb-6.6.30-L2-hs400-opp

Edited by ebin-dev
Posted (edited)

Thank you @ebin-dev for the dtb file & @prahal for figuring out what changes to make.
I just reinstalled my NAS with the 6.6.30 kernel and the Bookworm image. The dtb file is the only change I've made to the configuration.
Haven't had any issues yet - so far recompiling ZFS KMOD 2.2.4 to run with the new kernel hasn't crashed. MAX CPU freq. for the big cores is 1.8 GHz.

NFS and SMB are working fine. sbc-bench is stable as well.

Edited by SIGSEGV
Posted (edited)

Thank you @ebin-dev, same here: 6.6.29 is (much more) stable with your dtb.

I know we are all the community and can make changes, but... is it possible for you to open a ticket/change request (or whatever it is called) to add your changes to the main Armbian image?

So that we have a well-running Helios out of the box? ;)

That would be fantastic! 

Edited by snakekick
Posted

Actually the changes to the dtb were figured out and were proposed by @prahal and therefore it would be up to Prahal to submit a PR.

 

From my perspective there is nothing against a PR right now as the changes are sufficiently tested and have a very positive effect on stability.

Posted

Thanks guys, 

Last time I used Armbian_23.5.4 with the 5.15 kernel, and I had about 60 days of uptime. Today I switched to Armbian 24.05 with the patched DTB.

I'm using ZFS, so the only thing I do is stay on older packages and put future updates on hold.

 

I have SMB and NFS configured, I ran some tests and everything looks very good. 

Even my power consumption is down (I have 2x WD RED 4TB + Exos 18TB + WD RED 14TB + 1 SSD) and now it's about 23 W.

 

Currently I still have the 'previous' CPU frequency values (for comparison):

 

root@helios64:~# cat /etc/default/cpufrequtils
ENABLE=true
MIN_SPEED=408000
MAX_SPEED=1200000
GOVERNOR=ondemand

 

image.png.4e84734efa79a989f92671f26054d308.png

Posted

@ebin-dev - one quick question.
Should applying this script at boot change the sampling rate to the desired values?
 

for cpufreqpolicy in 0 4 ; do
    echo 1 > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/io_is_busy
    echo 25 > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/up_threshold
    echo 10 > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/sampling_down_factor
    echo $(cat /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/cpuinfo_transition_latency) > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/sampling_rate
done

 

Posted (edited)

Due to its current status I have disabled armbian-hardware-optimization (some 'volunteers' are needed to fix it) and run the following code in /etc/rc.local instead. 

Alternatively, you could look for the line 'echo 200000' in armbian-hardware-optimization and change the lines around it accordingly.

@SIGSEGV Yes - thereby you set the sampling_rates per cpu cluster to 40000 and 51000 on the current kernel (actually by the line starting with 'echo $(cat ...' ).

 

#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

cd /sys/devices/system/cpu/cpufreq

for cpufreqpolicy in 0 4 ; do
    echo 1 > policy${cpufreqpolicy}/ondemand/io_is_busy
    echo 25 > policy${cpufreqpolicy}/ondemand/up_threshold
    echo 10 > policy${cpufreqpolicy}/ondemand/sampling_down_factor
    echo $(cat policy${cpufreqpolicy}/cpuinfo_transition_latency) > policy${cpufreqpolicy}/ondemand/sampling_rate
done

for i in $(awk -F":" "/ahci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
        echo 30 > /proc/irq/$i/smp_affinity
done
for i in $(awk -F":" "/xhci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
        echo 20 > /proc/irq/$i/smp_affinity
done

exit 0
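
The values 30 and 20 written to smp_affinity above are hexadecimal CPU bitmasks. The sketch below decodes such a mask into CPU numbers; `mask_to_cpus` is a hypothetical helper, and the comments assume the usual rk3399 numbering in which cpu4 and cpu5 are the big cores.

```shell
#!/bin/sh
# mask_to_cpus HEXMASK (hypothetical helper): list the CPU numbers whose bit
# is set in an smp_affinity mask. The kernel reads the mask as hexadecimal.
mask_to_cpus() {
    mask=$(( 0x$1 ))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$cpu"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$out"
}

mask_to_cpus 30   # prints "4 5": ahci interrupts may run on cpu4 and cpu5
mask_to_cpus 20   # prints "5": xhci interrupts are pinned to cpu5
```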
Edited by ebin-dev
Posted (edited)
18 hours ago, grek said:

Currently I still have the 'previous' CPU frequency values (for comparison):

 

root@helios64:~# cat /etc/default/cpufrequtils
ENABLE=true
MIN_SPEED=408000
MAX_SPEED=1200000
GOVERNOR=ondemand

 

Just a remark: setting MIN_SPEED to 600 MHz would have NO effect on power consumption, but less frequency switching would be required: switching between 408 and 600 MHz would be avoided and the system could transition directly from 600 MHz to the highest frequency (essentially switching only between two states for each CPU).
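
One way to compare how much switching takes place with different MIN_SPEED values is the transition counter that cpufreq exposes. This is a sketch under the assumption that the kernel provides the cpufreq stats directory; `show_transitions` is a hypothetical helper name.

```shell
#!/bin/sh
# show_transitions [ROOT] (hypothetical helper): print the number of frequency
# changes per cluster. The stats directory is only present when the kernel
# was built with cpufreq statistics, which is an assumption here.
show_transitions() {
    root="${1:-/sys/devices/system/cpu/cpufreq}"
    for policy in 0 4; do
        f="$root/policy$policy/stats/total_trans"
        if [ -r "$f" ]; then
            echo "policy$policy: $(cat "$f") transitions"
        fi
    done
    return 0
}

show_transitions
```

Running it before and after a comparable workload with each setting gives a rough count of the switches that were saved.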

 

The huge power savings are impressive!

Edited by ebin-dev
Posted (edited)

Sorry - I am probably the only one who is interested in network performance of the 2.5G interface (theoretical max 2.35Gbit/s).

(iperf 3.17.1 measurements attached)

 

Spoiler
# ./iperf3 -c 192.168.xx.30 -p 5201 (client -> helios64)
Connecting to host 192.168.xx.30, port 5201
[  5] local 192.168.xx.54 port 54011 connected to 192.168.xx.30 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec   284 MBytes  2.37 Gbits/sec                  
[  5]   1.01-2.01   sec   280 MBytes  2.35 Gbits/sec                  
[  5]   2.01-3.01   sec   281 MBytes  2.36 Gbits/sec                  
[  5]   3.01-4.00   sec   278 MBytes  2.34 Gbits/sec                  
[  5]   4.00-5.01   sec   280 MBytes  2.34 Gbits/sec                  
[  5]   5.01-6.01   sec   281 MBytes  2.36 Gbits/sec                  
[  5]   6.01-7.01   sec   280 MBytes  2.35 Gbits/sec                  
[  5]   7.01-8.01   sec   280 MBytes  2.35 Gbits/sec                  
[  5]   8.01-9.01   sec   281 MBytes  2.36 Gbits/sec                  
[  5]   9.01-10.01  sec   281 MBytes  2.35 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec  2.74 GBytes  2.35 Gbits/sec                  sender
[  5]   0.00-10.01  sec  2.74 GBytes  2.35 Gbits/sec                  receiver

iperf Done.

# ./iperf3 -c 192.168.xx.30 -p 5201 -R (helios64 -> client)
Connecting to host 192.168.xx.30, port 5201
Reverse mode, remote host 192.168.xx.30 is sending
[  5] local 192.168.xx.54 port 54013 connected to 192.168.xx.30 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec   276 MBytes  2.30 Gbits/sec                  
[  5]   1.01-2.01   sec   280 MBytes  2.35 Gbits/sec                  
[  5]   2.01-3.00   sec   280 MBytes  2.35 Gbits/sec                  
[  5]   3.00-4.01   sec   280 MBytes  2.35 Gbits/sec                  
[  5]   4.01-5.01   sec   280 MBytes  2.35 Gbits/sec                  
[  5]   5.01-6.00   sec   280 MBytes  2.35 Gbits/sec                  
[  5]   6.00-7.01   sec   281 MBytes  2.35 Gbits/sec                  
[  5]   7.01-8.00   sec   280 MBytes  2.35 Gbits/sec                  
[  5]   8.00-9.01   sec   281 MBytes  2.35 Gbits/sec                  
[  5]   9.01-10.01  sec   280 MBytes  2.35 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec  2.74 GBytes  2.35 Gbits/sec   16             sender
[  5]   0.00-10.01  sec  2.73 GBytes  2.35 Gbits/sec                  receiver

iperf Done

# ./iperf3 -c 192.168.xx.30 -p 5201 --bidir (bidirectional)
Connecting to host 192.168.xx.30, port 5201
[  5] local 192.168.xx.54 port 49681 connected to 192.168.xx.30 port 5201
[  7] local 192.168.xx.54 port 49682 connected to 192.168.xx.30 port 5201
[ ID][Role] Interval           Transfer     Bitrate
[  5][TX-C]   0.00-1.01   sec   252 MBytes  2.11 Gbits/sec                  
[  7][RX-C]   0.00-1.01   sec   206 MBytes  1.72 Gbits/sec                  
[  5][TX-C]   1.01-2.01   sec   252 MBytes  2.11 Gbits/sec                  
[  7][RX-C]   1.01-2.01   sec   207 MBytes  1.74 Gbits/sec                  
[  5][TX-C]   2.01-3.01   sec   244 MBytes  2.05 Gbits/sec                  
[  7][RX-C]   2.01-3.01   sec   213 MBytes  1.78 Gbits/sec                  
[  5][TX-C]   3.01-4.01   sec   196 MBytes  1.65 Gbits/sec                  
[  7][RX-C]   3.01-4.01   sec   223 MBytes  1.87 Gbits/sec                  
[  5][TX-C]   4.01-5.01   sec   221 MBytes  1.86 Gbits/sec                  
[  7][RX-C]   4.01-5.01   sec   217 MBytes  1.82 Gbits/sec                  
[  5][TX-C]   5.01-6.01   sec   206 MBytes  1.73 Gbits/sec                  
[  7][RX-C]   5.01-6.01   sec   223 MBytes  1.87 Gbits/sec                  
[  5][TX-C]   6.01-7.01   sec   214 MBytes  1.80 Gbits/sec                  
[  7][RX-C]   6.01-7.01   sec   220 MBytes  1.84 Gbits/sec                  
[  5][TX-C]   7.01-8.01   sec   215 MBytes  1.80 Gbits/sec                  
[  7][RX-C]   7.01-8.01   sec   217 MBytes  1.82 Gbits/sec                  
[  5][TX-C]   8.01-9.01   sec   187 MBytes  1.57 Gbits/sec                  
[  7][RX-C]   8.01-9.01   sec   224 MBytes  1.88 Gbits/sec                  
[  5][TX-C]   9.01-10.01  sec   206 MBytes  1.73 Gbits/sec                  
[  7][RX-C]   9.01-10.01  sec   218 MBytes  1.83 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-10.01  sec  2.14 GBytes  1.84 Gbits/sec                  sender
[  5][TX-C]   0.00-10.01  sec  2.14 GBytes  1.84 Gbits/sec                  receiver
[  7][RX-C]   0.00-10.01  sec  2.12 GBytes  1.82 Gbits/sec   13             sender
[  7][RX-C]   0.00-10.01  sec  2.12 GBytes  1.82 Gbits/sec                  receiver
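
The theoretical maximum of about 2.35 Gbit/s quoted above can be reproduced from the per-frame overhead. The sketch below is a back-of-the-envelope calculation, assuming a 1500-byte MTU, IPv4 and TCP with the timestamp option; `tcp_goodput_mbit` is a hypothetical helper name.

```shell
#!/bin/sh
# tcp_goodput_mbit LINK_MBIT MTU (hypothetical helper)
# Assumes IPv4 (20 bytes), TCP (20 bytes) with the timestamp option (12 bytes),
# and 38 bytes of Ethernet overhead per frame on the wire
# (14 header + 4 FCS + 8 preamble + 12 inter-frame gap).
tcp_goodput_mbit() {
    awk -v link="$1" -v mtu="$2" 'BEGIN {
        payload = mtu - 20 - 20 - 12
        wire    = mtu + 38
        printf "%.0f\n", link * payload / wire
    }'
}

tcp_goodput_mbit 2500 1500   # prints 2354, i.e. about 2.35 Gbit/s
```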

 

 

 

 

Edited by TRS-80
move long content inside spoiler
Posted

I found that I have some ATA errors in the logs...

Could you please check yours?

Uptime is 14 days, and it is 'stable'...

 

Linux helios64 6.6.30-current-rockchip64 #1 SMP PREEMPT Thu May  2 14:32:50 UTC 2024 aarch64 GNU/Linux

Zrzut ekranu 2024-06-3 o 12.56.53.png
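
For anyone who wants to check their own logs for such messages, a rough filter might look like the sketch below. `count_ata_errors` is a hypothetical helper, and the keyword list is an assumption rather than a complete list of libata messages.

```shell
#!/bin/sh
# count_ata_errors (hypothetical helper): read kernel log text on stdin and
# count lines that mention an ataN port together with a typical error keyword.
# The keyword list is an assumption, not an exhaustive set of libata messages.
count_ata_errors() {
    grep -E 'ata[0-9]+(\.[0-9]+)?:' | grep -ciE 'error|exception|failed|reset' || true
}

# Current boot (needs permission to read the kernel log):
#   dmesg | count_ata_errors
# Previous boot, if the journal is kept on disk:
#   journalctl -k -b -1 | count_ata_errors
```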

Posted

Hi,

 

I had this problem in the past. You may try to unrack and re-rack the hard disk, or move it to another slot if you are not using all of them.

The problem on my side was a bad contact in the SATA connector.

 

Bye

Posted

Checking that note reveals:

 

Note: at the current status, helios64 board patches have been disabled because the base device tree does not apply on kernel 6.9.

 

So helios64 is cancelled as of linux 6.9.x until someone takes care of the issue. Thank you very much.

Posted (edited)

Started work on syncing the helios64 dts to upstream for 6.9: https://github.com/prahal/build/tree/helios64-6.9 .

 

I removed the overclock-disabling patch, as it disables an overclock that is, at least nowadays, not in the included rk3399-opp.dtsi (i.e. cluster0 has no opp6 and cluster1 no opp8). It was not a high priority beforehand, but as the helios64 dts starts to change, the patch creates unnecessary work.

 

The patchset applies. The helios64 dts compiles fine to dtb.

Kernel built and booted. (see below for network connection, ie ethernet MAC fixup)

 

There are not many functional changes on my side (there are in the upstream dts). There are a few differences from the upstream dts that I did not bring in, as I don't know whether they are leftovers from the initial patch set or new fixups. But this could already have been an issue before 6.9 for most of these changes. There is at least one new upstream change that I brought forward: 93b36e1d3748c352a70c69aa378715e6572e51d1 "arm64: dts: rockchip: Fix USB interface compatible string on kobol-helios64".

I also brought in "enable-active-high;" and "gpio = <&gpio0 RK_PA1 GPIO_ACTIVE_HIGH>;" for the vcc3v0_sd node.

 

 

Beware.

Ethernet MAC change (now working as designed). Because I kept the aliases from upstream, the MAC address for eth0 (later renamed to end0 by armbian-hardware-optimization), which u-boot reads from OTP via SPI, is now applied. Thus, if you bound the MAC address that was generated by the kernel, rather than the one from the hardware, to a host and IP, you will have to find the new IP assigned by the DHCP server to connect to the helios64.

I believe this is fine for edge even if not for current (so should be good for 6.9?).
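
To see which kind of address an interface currently uses, the "locally administered" bit of the MAC can be inspected. This sketch rests on the assumption that the kernel-generated address was a random one (those carry that bit) and that the address from OTP does not; the helper names and both sample addresses are made up.

```shell
#!/bin/sh
# is_locally_administered MAC (hypothetical helper): succeeds when bit 1 of
# the first octet is set, which is how randomly generated addresses are marked.
is_locally_administered() {
    first=${1%%:*}
    [ $(( 0x$first & 2 )) -ne 0 ]
}

# describe_mac MAC (hypothetical helper)
describe_mac() {
    if is_locally_administered "$1"; then
        echo "$1 looks generated (locally administered)"
    else
        echo "$1 looks like a burned-in address"
    fi
}

# On the board, assuming the interface is named end0 as mentioned above:
#   describe_mac "$(cat /sys/class/net/end0/address)"
describe_mac 02:11:22:33:44:55   # made-up sample, reported as generated
describe_mac 00:11:22:33:44:55   # made-up sample, reported as burned-in
```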

Edited by prahal
warn about MAC ethernet change
Posted (edited)

@prahal Current kernel 6.9.6 works just fine on Helios64 (headless) thanks to your PR. I am still using linux-u-boot-edge-helios64_22.02.1_arm64.

 

Level2 cache-info is now included:

# lscpu -C
NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL SETS PHY-LINE COHERENCY-SIZE
L1d       32K     192K    4 Data            1  128                      64
L1i       32K     224K    2 Instruction     1  256                      64
L2       512K     1.5M   16 Unified         2  512                      64

 

The modified dtb (hs400,  modified opp-table-1) is attached below (copy to /boot/dtb/rockchip/rk3399-kobol-helios64.dtb ; execute 'update-initramfs -u').

 

P.S.: sata performance needs to be addressed and helios64-heartbeat-led.service is not working

rk3399-kobol-helios64.dtb-6.9.6-L2-hs400-opp

Edited by ebin-dev
Posted (edited)
On 7/25/2024 at 9:06 PM, rjgould said:

I thought I was one of the only one out here still making use of one of these.  Amazing to see it back in the supported section, thank you @prahal 👏

 

I don't even own one of these, but I am fascinated by how you guys finally got it stable, it sounds like.  After all these years.  And the fact that Kobol (our once partner) went out of business years ago (I just looked and we are about a month shy of 3 years now).

 

Such triumph of the human will (and this project!), it could almost warm an old cynic's heart.  :)

 

EDIT:

 

I updated (removed) the "instability" comments about Helios64 in my NAS article: So, you want to run a file server (aka NAS)?

 

Edited by TRS-80
linked article update
Posted

So I don't know about everyone else, but for myself I've had my helios64 running on Armbian 22.02 with the 5.15.93-rockchip64 kernel for some time.

 

Mine is an on-LAN ZFS replication target for my main NAS, i.e. the overnight, low-power, local part of my backup strategy.  I always knew it was not performing to its full potential, but it doesn't fall over.

 

The biggest challenge I had was getting OpenMediaVault 6 on it with the ZFS plugins from the extras repository.  The upgrades around those broke so often I ended up learning Ansible so I could script the steps I was doing by hand.

 

Also, because of the quirks and the "if it breaks, it's going to be painful" situation of it all, I stuck to running from the SD card, which I have an image of at a stable point on my main NAS.

 

Every once in a while I try to move it all forward to newer versions; when it doesn't go so well, I flash the SD card from the image and start reading the forums, which is what brought me back here 😅

Posted

Oh wow, been a while since I checked this thread. Apparently I'm still on a 2023 build.

 

If I fully upgrade (via eMMC reflash) to Armbian 24.5.3 with the 6.6.36 kernel, will I still need the modified DTB file for improved stability, or has the DTB already been patched?

Posted (edited)
On 8/2/2024 at 7:06 AM, mrjpaxton said:

If I fully upgrade (via eMMC reflash) to Armbian 24.5.3 with the 6.6.36 kernel, will I still need the modified DTB file for improved stability, or has the DTB already been patched?

 

It seems that the patches are still missing. Until this is fixed you can use my dtb attached for the kernel 6.6.x branch (just copy it to /boot/dtb/rockchip/rk3399-kobol-helios64.dtb).

 

The linux files (linux-image, linux-header, linux-dtb) can be downloaded from beta.armbian.com (to be installed with 'dpkg -i linux*'). Copy the attached 6.6 dtb to its location, run 'update-initramfs -u' and reboot. That's it.

 

 I am on 6.10.2 right now - I also include the dtb for the 6.10 branch below. 

 

rk3399-kobol-helios64.dtb-6.6.34-L2-hs400-opp

rk3399-kobol-helios64.dtb-6.10.2-L2-hs400-opp

Edited by ebin-dev
Posted

Mind the voltage changes: I am uneasy about pushing them to main until I either get no more crashes for a long, long time or the issue is nailed down to its cause.

I might revisit this choice.

I plan to have the emmc hs400 fix in Armbian and vanilla linux. Probably not for the august Armbian release but I expect for the next one.

Posted (edited)
Quote

Mind the voltage changes: I am uneasy about pushing them to main until I either get no more crashes for a long, long time or the issue is nailed down to its cause.

I might revisit this choice.

I plan to have the emmc hs400 fix in Armbian and vanilla linux. Probably not for the august Armbian release but I expect for the next one.

 

Yes, I have also experienced a crash without the patch. It crashed when I tried to attach an HDD through a SATA-to-USB3 adapter for backup (I made sure the adapter's AC power supply was used), so it is more "stable" and responsive than before, but not completely yet. Would it be helpful to disable Armbian's ramlog and get journald to write any dumps to storage so that I can check the logs from the last boot, or are you guys already aware of *all* of the issues that cause these random crashes/freezes?
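
For reference, switching to a persistent journal could be sketched as follows. Both file locations and both keys are assumptions about a stock Armbian install and should be checked on the actual system; `set_key` is a hypothetical helper.

```shell
#!/bin/sh
# set_key FILE KEY VALUE (hypothetical helper): set KEY=VALUE in a simple
# key=value file, replacing an existing (possibly commented) line or
# appending one.
set_key() {
    file="$1"; key="$2"; value="$3"
    if grep -qE "^#?${key}=" "$file"; then
        sed -i -E "s|^#?${key}=.*|${key}=${value}|" "$file"
    else
        echo "${key}=${value}" >> "$file"
    fi
}

# As root; both file locations are assumptions about a stock Armbian install:
#   set_key /etc/default/armbian-ramlog ENABLED false
#   set_key /etc/systemd/journald.conf Storage persistent
#   reboot
# After the next crash, the previous boot's log can be read with:
#   journalctl -b -1
```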

Edited by mrjpaxton
