Helios64 - Armbian 23.08 Bookworm issues (solved)


Solved by ebin-dev

Recommended Posts

Posted
On 1/30/2025 at 12:15 PM, kantantana said:

But still, iperf3 reports barely 200 Mbit/s, and this is on the 1 Gbit interface. Another laptop towards the same server, on the same LAN and the same switch, gets 950 Mbit/s.

 

Since only one of your laptops is affected by the reduced transfer speed, I would rather look for issues there (update the NIC driver, check the NIC configuration, exclude cable/connection issues, etc.).

Posted

I did the tests against the Helios64 from two different laptops and one desktop. Between the desktops and laptops I always get 950 Mbit/s; it is only the Helios64 that stays close to 200 Mbit/s to any of them. They are all on the same switch and the same LAN. Even worse, with iperf3 -u (UDP) against the Helios64 I get barely 1 Mbit/s.

 

I did another test today. This time I checked cpufreq-info: the Helios64 was sitting at 480 MHz, so maybe it was too slow to ramp up the frequency? So I ran dd if=/dev/zero of=/dev/null bs=1M and let it run; it was burning one CPU core and the Helios64 went to 1.2 GHz. I repeated the iperf3 test with the same result, barely 200 Mbit/s. ethtool does report 1000 Mbit/s full duplex on that interface. I also tried toggling offloads with ethtool -K (tso/gso/gro on) and so on, with no difference.
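For reference, a sketch of that test sequence, assuming the 1G interface is end0 (the name used later in this thread) and an iperf3 server at 192.168.1.10 (both are placeholders):

# check the current CPU frequency scaling state
cpufreq-info

# in a second shell: generate a single-core load so the governor ramps up
dd if=/dev/zero of=/dev/null bs=1M

# TCP throughput test against the iperf3 server on the LAN
iperf3 -c 192.168.1.10

# UDP variant; note that iperf3 limits UDP to 1 Mbit/s by default unless -b is given (-b 0 removes the limit)
iperf3 -c 192.168.1.10 -u -b 0

# confirm the negotiated link speed and the current offload settings
ethtool end0
ethtool -k end0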

 

The 2.5 Gbit tests I did with a cable from the Helios64 directly to an ODROID-H4+, which also has a 2.5 Gbit port, with no switch in between at all; that test failed on stability and the link kept resetting on the Helios64 side.

 

By the way, this is the sha256sum of the dtb I'm using, taken from this thread:
a538d69ca94b31fdf91d047ec7c54970465f7d008081738baaaa5fc09be97e94  /lib/linux-image-6.12.11-current-rockchip64/rockchip/rk3399-kobol-helios64.dtb

 

Posted (edited)
On 1/31/2025 at 3:26 PM, kantantana said:

I did another test today. This time I checked cpufreq-info: the Helios64 was sitting at 480 MHz, so maybe it was too slow to ramp up the frequency? So I ran dd if=/dev/zero of=/dev/null bs=1M and let it run; it was burning one CPU core and the Helios64 went to 1.2 GHz. I repeated the iperf3 test with the same result, barely 200 Mbit/s. ethtool does report 1000 Mbit/s full duplex on that interface. I also tried toggling offloads with ethtool -K (tso/gso/gro on) and so on, with no difference.

 

I have only checked the performance of the 2.5G interface with Linux 6.12.10, not the 1G interface. Could you check whether the issue remains if you use Linux 6.6.74? (Install with 'dpkg -i linux*', replace the dtb with the one attached, execute 'update-initramfs -u', and reboot.)
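A minimal sketch of that sequence, assuming the 6.6.74 packages and the attached dtb have been downloaded to the current directory; the dtb destination shown only follows the path pattern quoted earlier in this thread and must be adjusted to the installed kernel version:

# install the downloaded 6.6.74 kernel packages
dpkg -i linux*

# replace the board dtb with the one attached to this post (destination path is illustrative)
cp rk3399-kobol-helios64.dtb /lib/linux-image-6.6.74-current-rockchip64/rockchip/

# rebuild the initramfs and reboot into the new kernel
update-initramfs -u
reboot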

 

Edit: The 2.5G interface needs the hardware fix in order to operate reliably in 2500baseT/Full mode too: during Ethernet autonegotiation, 1G speeds can be established transiently. I think this might have happened during the test with the ODROID-H4+.

 

Edit: You mentioned that the Helios64 went to 1.2 GHz during the transfer. If the load is handled by a little core, that may be the reason for the reduced transfer speed; a big core should be assigned to it. The following code in /etc/rc.local is therefore very helpful in this context (it also configures the ondemand governor and makes it more responsive) [armbian-hardware-optimization is disabled on my system]:

 

cd /sys/devices/system/cpu/cpufreq

# tune the ondemand governor on both clusters
# (policy0 = little cores 0-3, policy4 = big cores 4-5 on the RK3399)
for cpufreqpolicy in 0 4 ; do
    echo 1 > policy${cpufreqpolicy}/ondemand/io_is_busy
    echo 25 > policy${cpufreqpolicy}/ondemand/up_threshold
    echo 10 > policy${cpufreqpolicy}/ondemand/sampling_down_factor
    echo $(cat policy${cpufreqpolicy}/cpuinfo_transition_latency) > policy${cpufreqpolicy}/ondemand/sampling_rate
done

# route the SATA (ahci) interrupts to big core cpu4 (affinity mask 0x10)
for i in $(awk -F":" "/ahci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
        echo 10 > /proc/irq/$i/smp_affinity
done
# route the USB (xhci) interrupts, which carry the USB-attached 2.5G NIC, to big core cpu5 (affinity mask 0x20)
for i in $(awk -F":" "/xhci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
        echo 20 > /proc/irq/$i/smp_affinity
done

exit 0
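A quick read-back to verify those settings after boot might look like this (nothing here changes state; the IRQ name matching mirrors the loops above):

# list which CPU each ahci/xhci interrupt is allowed to run on
for i in $(awk -F":" '/ahci|xhci/ {print $1}' < /proc/interrupts | tr -d ' '); do
        echo -n "IRQ $i -> CPUs "
        cat /proc/irq/$i/smp_affinity_list
done

# show the ondemand tunables for both clusters
grep . /sys/devices/system/cpu/cpufreq/policy*/ondemand/io_is_busy
grep . /sys/devices/system/cpu/cpufreq/policy*/ondemand/up_threshold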

 

While transferring data to/from the Helios64, some processes are involved, such as iperf3 (or netatalk in my case). Those processes should also be executed on a big core. If iperf3 tests are performed, or if the Helios64 is mounted as a remote drive, I call the following script, which assigns a big core and a preferred I/O scheduling class to those processes (to be adapted to your needs). This helps to make your NAS processes faster (kudos to @tkaiser).

 

# cat /usr/local/bin/make_nas_processes_faster.sh
#!/bin/sh -e
# give the NAS-related processes the realtime I/O class and pin them to big core cpu4
for i in `pgrep "afpd|cnid|iperf3|netatalk"` ; do ionice -c1 -p $i ; taskset -c -p 4 $i; done #  >/dev/null 2>&1
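The script just has to be executable and is run after the NAS services (or iperf3) have been started, e.g.:

chmod +x /usr/local/bin/make_nas_processes_faster.sh
/usr/local/bin/make_nas_processes_faster.sh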

 

rk3399-kobol-helios64.dtb-6.6.xx-L2-hs400-opp

Edited by ebin-dev
Posted

I checked the previous kernel as instructed; there is no difference, so it's not a regression. Setting affinity does raise it to 380 Mbit/s.

However, I did play with running dd if=/dev/zero of=/dev/null simultaneously with iperf3, and dd hogging one CPU makes it lose half of the throughput, pointing to the problem being with CPU scheduling rather than with the interface/driver code?

 

Anyway, I found it's good enough for now. I can achieve 800 Mbit/s, but only sometimes, and only if I run iperf3 under taskset with cores 1,2,3; and not even every time with 1,2,3, sometimes it has to be 4,5,6. Even the 1,2,3 runs show retries (so it looks like 600 Mbit/s, then 250, then 600 again!?)

 

And so, after many more tests, I noticed when I get 800 Mbit/s and when I get 120 Mbit/s instead... With the performance governor set, I can reproduce 800 Mbit/s more often with iperf3 -A 5 or -A 4, but not with -A 3 or -A 2... so only the fast CPUs give good results; the others, even with the performance governor at 1.42 GHz, are still stuck at 260 Mbit/s.

Also, only with iperf3 -A 5 (CPU affinity) do I get 0 packet retries. With the other options, I guess retries happen when the process is moved from one CPU to another, or, with the ondemand governor on the slow CPUs, when it switches frequency?
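For reference, a sketch of the affinity variants compared here; the server address is a placeholder, -A pins the iperf3 process to a single core, and taskset -c restricts it to a set of cores:

# pin iperf3 to big core 5 (the case with 0 retries above)
iperf3 -A 5 -c 192.168.1.10

# or restrict it to a chosen set of cores via taskset
taskset -c 4,5 iperf3 -c 192.168.1.10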

 

 

 

Posted (edited)
11 hours ago, kantantana said:

I checked the previous kernel as instructed; there is no difference, so it's not a regression. Setting affinity does raise it to 380 Mbit/s.

However, I did play with running dd if=/dev/zero of=/dev/null simultaneously with iperf3, and dd hogging one CPU makes it lose half of the throughput, pointing to the problem being with CPU scheduling rather than with the interface/driver code?

 

Thanks for testing. I think that the load distribution is indeed essential to get good transfer speeds and it seems that the scheduler needs some help to do it right. 

 

With the settings posted before, the load on cpus 0-3 is below 20% and the load on cpus 4-5 is between 60% and 80% while downloading a 2 GB file from the Helios64 to a client at effectively 280 MByte/s (on Linux 6.6.74, ondemand governor, ext4 FS, transfer with netatalk 4.1.1). But the Helios64 only has to access a single 4 TB SSD formatted to ext4 in this case.

 

You could try to move your SATA load to cpu5 ('echo 20' for the ahci interrupts) and the network load to cpu4 ('echo 10' for the xhci interrupts) and see if that helps (cpu5 is usually less busy than cpu4).
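In terms of the rc.local snippet posted above, that swap would look like this (same loops, only the masks exchanged: 0x20 selects cpu5, 0x10 selects cpu4):

# ahci (SATA) interrupts -> cpu5
for i in $(awk -F":" "/ahci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
        echo 20 > /proc/irq/$i/smp_affinity
done
# xhci (USB/network) interrupts -> cpu4
for i in $(awk -F":" "/xhci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
        echo 10 > /proc/irq/$i/smp_affinity
done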

 

# cat /etc/default/cpufrequtils 
ENABLE=true
MIN_SPEED=600000
MAX_SPEED=1800000
GOVERNOR=ondemand
Edited by ebin-dev
Posted

More observations and progress: I get 50 MB/s when doing dd if=fileFromNfs of=/dev/null bs=1M on a non-Helios machine, with the Helios64 being the one exporting the file over NFS (from ZFS). But with the nfsd processes pinned to core 4 and some other sysctl changes, I get 70 MB/s, though not every time.

 

Other observations: with the performance governor it is more stable and reaches higher speeds; the ondemand governor introduces retries and transmission errors. Overall, even a steady 2 GB read is slower than with the performance governor.

 

The other change, suggested by R1 after I asked it to compare this dtb's settings concerning the RTL8211E, was a slight adjustment of the rx_delay and tx_delay values to 0x23, which can be done with an overlay. I could be imagining things after running so many tests, but the retry counts with iperf3 are not as high, and when pinning iperf3 to core 4 or 5, even with ondemand it is more stable and gets close to 900 Mbit/s.
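A minimal sketch of such an overlay, assuming the GMAC node is labelled &gmac as in the mainline rk3399 device tree and that Armbian's armbian-add-overlay helper is available; check the label against your dtb before applying, and note both delays are set to 0x23 here purely as described above:

cat > rtl8211e-delays.dts <<'EOF'
/dts-v1/;
/plugin/;

/ {
        compatible = "rockchip,rk3399";

        fragment@0 {
                target = <&gmac>;
                __overlay__ {
                        tx_delay = <0x23>;
                        rx_delay = <0x23>;
                };
        };
};
EOF

# compile and register the user overlay, then reboot for it to take effect
armbian-add-overlay rtl8211e-delays.dts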

 

This is what I have added to my rc.local now:

# enable receive flow steering (RFS) for the 1G interface end0
echo 32768 > /sys/class/net/end0/queues/rx-0/rps_flow_cnt
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
# give the NFS server threads the realtime I/O class and pin them to big core cpu4
for i in `pgrep "nfsd"` ; do ionice -c1 -p $i ; taskset -c -p 4 $i; done
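And a quick read-back to confirm those settings after boot (interface name end0 as above):

cat /sys/class/net/end0/queues/rx-0/rps_flow_cnt
cat /proc/sys/net/core/rps_sock_flow_entries
for i in `pgrep "nfsd"` ; do taskset -c -p $i ; done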

 
