Helios64 - Armbian 23.08 Bookworm issues (solved)


Solved by ebin-dev

Recommended Posts

Posted
On 1/30/2025 at 12:15 PM, kantantana said:

But still, iperf3 reports barely 200 Mbit/s, and this is on the 1 Gbit interface. Another laptop towards the same server, on the same LAN and the same switch, gets 950 Mbit/s.

 

Since only one of your laptops is affected by the reduced transfer speed, I would rather look for issues there (update the NIC driver, check the NIC configuration, exclude cable/connection issues, etc.).

Posted

I did the tests against the Helios64 from two different laptops and one desktop. Between the desktops and laptops I always get 950 Mbit/s; it is only the Helios64 that stays close to 200 Mbit/s to any of them. They are all on the same switch and the same LAN. Even worse, with iperf3 -u (UDP) against the Helios64 I get barely 1 Mbit/s.

 

I did another test today. This time I checked cpufreq-info: the Helios64 was sitting at 480 MHz, so maybe it was too slow to ramp up the frequency? So I ran dd if=/dev/zero of=/dev/null bs=1M and let it run; it was burning one CPU core and the Helios64 went to 1.2 GHz. I repeated the iperf3 test with the same result, barely 200 Mbit/s. ethtool does report 1000 Mbit/s full duplex on that interface. I also tried toggling offloads with ethtool -K (tso/gso/gro on) and so on, with no difference.
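For reference, a sketch of that test sequence, assuming the 1G interface is end0 (the name used later in this thread) and an iperf3 server at 192.168.1.10 (both are placeholders):

# check the current CPU frequency scaling state
cpufreq-info

# in a second shell: generate a single-core load so the governor ramps up
dd if=/dev/zero of=/dev/null bs=1M

# TCP throughput test against the iperf3 server on the LAN
iperf3 -c 192.168.1.10

# UDP variant; note that iperf3 limits UDP to 1 Mbit/s by default unless -b is given (-b 0 removes the limit)
iperf3 -c 192.168.1.10 -u -b 0

# confirm the negotiated link speed and the current offload settings
ethtool end0
ethtool -k end0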

 

The 2.5 Gbit tests I did with a cable from the Helios64 directly to an ODROID-H4+, which also has a 2.5 Gbit port, with no switch in between at all; that test failed on stability and the link kept resetting on the Helios64 side.

 

By the way, this is the sha256sum of the dtb I'm using, taken from this thread:
a538d69ca94b31fdf91d047ec7c54970465f7d008081738baaaa5fc09be97e94  /lib/linux-image-6.12.11-current-rockchip64/rockchip/rk3399-kobol-helios64.dtb

 

Posted (edited)
On 1/31/2025 at 3:26 PM, kantantana said:

I did another test today. This time I checked cpufreq-info: the Helios64 was sitting at 480 MHz, so maybe it was too slow to ramp up the frequency? So I ran dd if=/dev/zero of=/dev/null bs=1M and let it run; it was burning one CPU core and the Helios64 went to 1.2 GHz. I repeated the iperf3 test with the same result, barely 200 Mbit/s. ethtool does report 1000 Mbit/s full duplex on that interface. I also tried toggling offloads with ethtool -K (tso/gso/gro on) and so on, with no difference.

 

I have only checked the performance of the 2.5G interface with Linux 6.12.10, not the 1G interface. Could you check whether the issue remains if you use Linux 6.6.74? (Install with 'dpkg -i linux*', replace the dtb with the one attached, execute 'update-initramfs -u', and reboot.)
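A minimal sketch of that sequence, assuming the 6.6.74 packages and the attached dtb have been downloaded to the current directory; the dtb destination shown only follows the path pattern quoted earlier in this thread and must be adjusted to the installed kernel version:

# install the downloaded 6.6.74 kernel packages
dpkg -i linux*

# replace the board dtb with the one attached to this post (destination path is illustrative)
cp rk3399-kobol-helios64.dtb /lib/linux-image-6.6.74-current-rockchip64/rockchip/

# rebuild the initramfs and reboot into the new kernel
update-initramfs -u
reboot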

 

Edit: The 2.5G interface needs the hardware fix in order to operate reliably in 2500baseT/Full mode too: during Ethernet autonegotiation, 1G speeds can be established transiently. I think this might have happened during the test with the ODROID-H4+.

 

Edit: You mentioned that the Helios64 went to 1.2 GHz during the transfer. If the load is handled by a little core, that may be the reason for the reduced transfer speed; a big core should be assigned to it. The following code in /etc/rc.local is therefore very helpful in this context (it also configures the ondemand governor and makes it more responsive) [armbian-hardware-optimization is disabled on my system]:

 

cd /sys/devices/system/cpu/cpufreq

# tune the ondemand governor on both clusters
# (policy0 = little cores 0-3, policy4 = big cores 4-5 on the RK3399)
for cpufreqpolicy in 0 4 ; do
    echo 1 > policy${cpufreqpolicy}/ondemand/io_is_busy
    echo 25 > policy${cpufreqpolicy}/ondemand/up_threshold
    echo 10 > policy${cpufreqpolicy}/ondemand/sampling_down_factor
    echo $(cat policy${cpufreqpolicy}/cpuinfo_transition_latency) > policy${cpufreqpolicy}/ondemand/sampling_rate
done

# route the SATA (ahci) interrupts to big core cpu4 (affinity mask 0x10)
for i in $(awk -F":" "/ahci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
        echo 10 > /proc/irq/$i/smp_affinity
done
# route the USB (xhci) interrupts, which carry the USB-attached 2.5G NIC, to big core cpu5 (affinity mask 0x20)
for i in $(awk -F":" "/xhci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
        echo 20 > /proc/irq/$i/smp_affinity
done

exit 0
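A quick read-back to verify those settings after boot might look like this (nothing here changes state; the IRQ name matching mirrors the loops above):

# list which CPU each ahci/xhci interrupt is allowed to run on
for i in $(awk -F":" '/ahci|xhci/ {print $1}' < /proc/interrupts | tr -d ' '); do
        echo -n "IRQ $i -> CPUs "
        cat /proc/irq/$i/smp_affinity_list
done

# show the ondemand tunables for both clusters
grep . /sys/devices/system/cpu/cpufreq/policy*/ondemand/io_is_busy
grep . /sys/devices/system/cpu/cpufreq/policy*/ondemand/up_threshold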

 

While transferring data to/from the Helios64, some processes are involved, such as iperf3 (or netatalk in my case). Those processes should also be executed on a big core. If iperf3 tests are performed, or if the Helios64 is mounted as a remote drive, I call the following script, which assigns a big core and a preferred I/O scheduling class to those processes (to be adapted to your needs). This helps to make your NAS processes faster (kudos to @tkaiser).

 

# cat /usr/local/bin/make_nas_processes_faster.sh
#!/bin/sh -e
# give the NAS-related processes the realtime I/O class and pin them to big core cpu4
for i in `pgrep "afpd|cnid|iperf3|netatalk"` ; do ionice -c1 -p $i ; taskset -c -p 4 $i; done #  >/dev/null 2>&1
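The script just has to be executable and is run after the NAS services (or iperf3) have been started, e.g.:

chmod +x /usr/local/bin/make_nas_processes_faster.sh
/usr/local/bin/make_nas_processes_faster.sh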

 

rk3399-kobol-helios64.dtb-6.6.xx-L2-hs400-opp

Edited by ebin-dev
Posted

I checked the previous kernel as instructed; there is no difference, so it's not a regression. Setting affinity does raise it to 380 Mbit/s.

However, I did play with running dd if=/dev/zero of=/dev/null simultaneously with iperf3, and dd hogging one CPU makes it lose half of the throughput, pointing to the problem being with CPU scheduling rather than with the interface/driver code?

 

Anyway, I found it's good enough for now. I can achieve 800 Mbit/s, but only sometimes, and only if I run iperf3 under taskset with cores 1,2,3; and not even every time with 1,2,3, sometimes it has to be 4,5,6. Even the 1,2,3 runs show retries (so it looks like 600 Mbit/s, then 250, then 600 again!?)

 

And so, after many more tests, I noticed when I get 800 Mbit/s and when I get 120 Mbit/s instead... With the performance governor set, I can reproduce 800 Mbit/s more often with iperf3 -A 5 or -A 4, but not with -A 3 or -A 2... so only the fast CPUs give good results; the others, even with the performance governor at 1.42 GHz, are still stuck at 260 Mbit/s.

Also, only with iperf3 -A 5 (CPU affinity) do I get 0 packet retries. With the other options, I guess retries happen when the process is moved from one CPU to another, or, with the ondemand governor on the slow CPUs, when it switches frequency?
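For reference, a sketch of the affinity variants compared here; the server address is a placeholder, -A pins the iperf3 process to a single core, and taskset -c restricts it to a set of cores:

# pin iperf3 to big core 5 (the case with 0 retries above)
iperf3 -A 5 -c 192.168.1.10

# or restrict it to a chosen set of cores via taskset
taskset -c 4,5 iperf3 -c 192.168.1.10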

 

 

 

Posted (edited)
11 hours ago, kantantana said:

I checked the previous kernel as instructed; there is no difference, so it's not a regression. Setting affinity does raise it to 380 Mbit/s.

However, I did play with running dd if=/dev/zero of=/dev/null simultaneously with iperf3, and dd hogging one CPU makes it lose half of the throughput, pointing to the problem being with CPU scheduling rather than with the interface/driver code?

 

Thanks for testing. I think that the load distribution is indeed essential to get good transfer speeds and it seems that the scheduler needs some help to do it right. 

 

With the settings posted before, the load on cpus 0-3 is below 20% and the load on cpus 4-5 is between 60% and 80% while downloading a 2 GB file from the Helios64 to a client at effectively 280 MByte/s (on Linux 6.6.74, ondemand governor, ext4 FS, transfer with netatalk 4.1.1). But the Helios64 only has to access a single 4 TB SSD formatted to ext4 in this case.

 

You could try to move your SATA load to cpu5 ('echo 20' for the ahci interrupts) and the network load to cpu4 ('echo 10' for the xhci interrupts) and see if that helps (cpu5 is usually less busy than cpu4).
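In terms of the rc.local snippet posted above, that swap would look like this (same loops, only the masks exchanged: 0x20 selects cpu5, 0x10 selects cpu4):

# ahci (SATA) interrupts -> cpu5
for i in $(awk -F":" "/ahci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
        echo 20 > /proc/irq/$i/smp_affinity
done
# xhci (USB/network) interrupts -> cpu4
for i in $(awk -F":" "/xhci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
        echo 10 > /proc/irq/$i/smp_affinity
done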

 

# cat /etc/default/cpufrequtils 
ENABLE=true
MIN_SPEED=600000
MAX_SPEED=1800000
GOVERNOR=ondemand
Edited by ebin-dev
Posted

More observations and progress: I get 50 MB/s when doing dd if=fileFromNfs of=/dev/null bs=1M on a non-Helios machine, with the Helios64 being the one exporting the file over NFS (from ZFS). But with the nfsd processes pinned to core 4 and some other sysctl changes, I get 70 MB/s, though not every time.

 

Other observations: with the performance governor it is more stable and reaches higher speeds; the ondemand governor introduces retries and transmission errors. Overall, even a steady 2 GB read is slower than with the performance governor.

 

The other change, suggested by R1 after I asked it to compare this dtb's settings concerning the RTL8211E, was a slight adjustment of the rx_delay and tx_delay values to 0x23, which can be done with an overlay. I could be imagining things after running so many tests, but the retry counts with iperf3 are not as high, and when pinning iperf3 to core 4 or 5, even with ondemand it is more stable and gets close to 900 Mbit/s.
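A minimal sketch of such an overlay, assuming the GMAC node is labelled &gmac as in the mainline rk3399 device tree and that Armbian's armbian-add-overlay helper is available; check the label against your dtb before applying, and note both delays are set to 0x23 here purely as described above:

cat > rtl8211e-delays.dts <<'EOF'
/dts-v1/;
/plugin/;

/ {
        compatible = "rockchip,rk3399";

        fragment@0 {
                target = <&gmac>;
                __overlay__ {
                        tx_delay = <0x23>;
                        rx_delay = <0x23>;
                };
        };
};
EOF

# compile and register the user overlay, then reboot for it to take effect
armbian-add-overlay rtl8211e-delays.dts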

 

This is what I have added to my rc.local now:

# enable receive flow steering (RFS) for the 1G interface end0
echo 32768 > /sys/class/net/end0/queues/rx-0/rps_flow_cnt
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
# give the NFS server threads the realtime I/O class and pin them to big core cpu4
for i in `pgrep "nfsd"` ; do ionice -c1 -p $i ; taskset -c -p 4 $i; done
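And a quick read-back to confirm those settings after boot (interface name end0 as above):

cat /sys/class/net/end0/queues/rx-0/rps_flow_cnt
cat /proc/sys/net/core/rps_sock_flow_entries
for i in `pgrep "nfsd"` ; do taskset -c -p $i ; done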

 
