ebin-dev (Author) Posted January 31

On 1/30/2025 at 12:15 PM, kantantana said:
"But still, iperf3 says barely 200mbit/s, this is on the 1gbit interface. Another laptop towards the same server, on the same LAN and the same switch, gets 950mbit/s."

Since only one of your laptops is affected by the reduced transfer speed, I would rather look for the cause there (update the NIC driver, check the NIC configuration, rule out cable/connection issues, etc.).
kantantana Posted January 31

I ran the tests against the helios64 from two different laptops and one desktop. Between the desktops and laptops I always get 950mbit/s; it is only the helios64 that drops to roughly 200mbit/s to any of them. They are all on the same switch and the same LAN. Even worse, with iperf3 -u (UDP) the helios64 gets barely 1mbit/s.

I did another test today. This time I checked cpufreq-info: the helios64 was sitting at 480MHz, so maybe it was too slow to ramp the frequency up? So I ran dd if=/dev/zero of=/dev/null bs=1M and let it keep one CPU core busy; the helios64 went to 1.2GHz, but the iperf3 test gave the same result, barely 200Mbit/s. ethtool does report 1000 full duplex on that interface. I also tried toggling the offloads with ethtool (tso, gso, gro on and so on), with no difference.

The 2.5gbit tests I did with a cable from the helios64 directly to an Odroid H4+, which also has a 2.5gbit port, with no switch in between at all; that test failed on stability and the link kept resetting on the helios64 side.

By the way, the dtb I am using, taken from this thread, is /lib/linux-image-6.12.11-current-rockchip64/rockchip/rk3399-kobol-helios64.dtb with sha256sum a538d69ca94b31fdf91d047ec7c54970465f7d008081738baaaa5fc09be97e94.
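For reference, a minimal sketch of the kind of tests described above. The host name and the interface name end0 are placeholders (end0 follows the later posts in this thread); adjust to your setup:

# On the helios64: start the iperf3 server
iperf3 -s

# On a client on the same switch: TCP test, then UDP test
iperf3 -c helios64.local -t 30          # TCP throughput
iperf3 -c helios64.local -u -b 1G       # UDP at a 1 Gbit/s target rate

# Check link speed/duplex and toggle offloads on the helios64 (end0 assumed)
ethtool end0
ethtool -K end0 tso on gso on gro on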
ebin-dev (Author) Posted January 31 (edited)

On 1/31/2025 at 3:26 PM, kantantana said:
"I did another test today. This time I checked cpufreq-info: the helios64 was sitting at 480MHz ... the iperf3 test gave the same result, barely 200Mbit/s. ethtool does report 1000 full duplex on that interface. I also tried toggling the offloads with ethtool, with no difference."

I have only checked the performance of the 2.5G interface with linux 6.12.10, not the 1G interface. Could you check whether the issue remains if you use linux 6.6.74? (Install with 'dpkg -i linux*', replace the dtb with the one attached, execute 'update-initramfs -u', then reboot.)

Edit: The 2.5G interface needs the hardware fix in order to operate reliably in 2500baseT/Full mode too: during ethernet handshaking, 1G speeds can be established transiently. I think this might have happened during the test with the Odroid H4+.

Edit: You mentioned that the helios64 went to 1.2GHz during the transfer. If the load is handled by a little core, that may be the reason for the reduced transfer speed; a big core should be assigned to it. Therefore, the following code in /etc/rc.local is very helpful in this context (it also configures the ondemand governor and makes it responsive) [armbian-hardware-optimization is disabled on my system]:

cd /sys/devices/system/cpu/cpufreq
for cpufreqpolicy in 0 4 ; do
    echo 1 > policy${cpufreqpolicy}/ondemand/io_is_busy
    echo 25 > policy${cpufreqpolicy}/ondemand/up_threshold
    echo 10 > policy${cpufreqpolicy}/ondemand/sampling_down_factor
    echo $(cat policy${cpufreqpolicy}/cpuinfo_transition_latency) > policy${cpufreqpolicy}/ondemand/sampling_rate
done

for i in $(awk -F":" "/ahci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
    echo 10 > /proc/irq/$i/smp_affinity
done

for i in $(awk -F":" "/xhci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
    echo 20 > /proc/irq/$i/smp_affinity
done

exit 0

While transferring data to/from the Helios64, some processes are involved, such as iperf3 (or netatalk in my case). Those processes should also be executed on a big core. If iperf3 tests are performed, or if the helios64 is mounted as a remote drive, I call the following script, which assigns a big core and a preferred I/O scheduling class to those processes (adapt it to your needs). This helps to make your NAS processes faster (kudos to @tkaiser).

# cat /usr/local/bin/make_nas_processes_faster.sh
#!/bin/sh -e
for i in `pgrep "afpd|cnid|iperf3|netatalk"` ; do ionice -c1 -p $i ; taskset -c -p 4 $i; done # >/dev/null 2>&1

(attachment: rk3399-kobol-helios64.dtb-6.6.xx-L2-hs400-opp)

Edited February 5 by ebin-dev
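A quick way to sanity-check that these settings actually took effect after boot (a sketch using only standard tools; IRQ numbers and PIDs will differ on your system; affinity masks are hex, 10 = cpu4 and 20 = cpu5):

# ondemand tuning on the big-core policy (policy4)
cat /sys/devices/system/cpu/cpufreq/policy4/ondemand/up_threshold

# which CPUs handle the ahci/xhci interrupts
for i in $(awk -F":" '/ahci|xhci/ {print $1}' /proc/interrupts); do
    echo "IRQ $i mask: $(cat /proc/irq/$i/smp_affinity)"
done

# confirm iperf3/netatalk processes are pinned to cpu4
for p in $(pgrep "afpd|cnid|iperf3|netatalk"); do taskset -c -p $p; done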
prahal Posted February 1

The 2.5Gbps hardware issue only affects operation in 1Gbps mode (2.5Gbps was always fine hardware-wise): https://blog.kobol.io/2020/11/13/helios64-2-5g-ethernet-issue/
kantantana Posted February 2

Checked the previous kernel as instructed; there is no difference, so it's not a regression. Setting the affinity does raise it to 380Mbit/s. However, I did play with running dd if=/dev/zero of=/dev/null simultaneously with iperf3, and dd hogging one CPU makes it lose half of its throughput, pointing to the problem being CPU scheduling (or whatever) rather than the interface/driver code?

Anyway, I found it is good enough for now: I can reach 800Mbit/s, but only sometimes, and only if I run taskset 1,2,3 iperf3, and not even every time with 1,2,3; sometimes it has to be 4,5,6. But even the 1,2,3 runs only succeed on retries (so it looks like 600Mbit, then 250, then 600 again!?).

And so, after many more tests, I looked at the times when I get 800Mbit/s versus the times I get 120Mbit/s. With the performance governor I can reproduce 800Mbit/s more often, with iperf3 -A 5 or 4, but not 3 or 2. So it is only the fast CPUs that give good results; the others, even with the performance governor at 1.42GHz, still sit at 260Mbit/s.

Also, only with iperf3 -A 5 (affinity) do I get zero packet retries. With the other options, I guess retries happen when the process is moved from cpu0 to cpu1, or, with the ondemand governor on the slow CPUs, when it switches frequency?
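A sketch of the two approaches described above (the server address and timings are examples; on the RK3399 the big cores are cpu4 and cpu5, and cpufrequtils is assumed to be installed since cpufreq-info is used earlier in the thread):

# Temporarily switch the big cluster to the performance governor
cpufreq-set -c 4 -g performance

# Pin the iperf3 client to a big core, either via taskset ...
taskset -c 4 iperf3 -c 192.168.1.10 -t 30

# ... or with iperf3's own affinity option
iperf3 -c 192.168.1.10 -t 30 -A 4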
ebin-dev (Author) Posted February 3 (edited)

11 hours ago, kantantana said:
"Checked the previous kernel as instructed; there is no difference, so it's not a regression. Setting the affinity does raise it to 380Mbit/s. ..."

Thanks for testing. I think the load distribution is indeed essential to get good transfer speeds, and it seems the scheduler needs some help to do it right. With the settings posted before, the load on cpu0-3 stays below 20% and the load on cpu4-5 is between 60% and 80% while downloading a 2GB file from the Helios64 to a client at effectively 280MByte/s (on linux 6.6.74, ondemand governor, ext4 FS, transfer with netatalk 4.1.1). But the Helios64 only has to access a single 4TB SSD formatted to ext4 in this case.

You could try to move your SATA load to cpu5 ('echo 20' for the ahci interrupt) and the network load to cpu4 ('echo 10' for the xhci interrupt) and see if that helps (cpu5 is usually less busy than cpu4).

# cat /etc/default/cpufrequtils
ENABLE=true
MIN_SPEED=600000
MAX_SPEED=1800000
GOVERNOR=ondemand

Edited February 3 by ebin-dev
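Applied to the rc.local snippet from the earlier post, that suggestion would look roughly like this (a sketch mirroring those loops with the masks swapped; hex masks: 10 = cpu4, 20 = cpu5):

# SATA (ahci) interrupts -> cpu5
for i in $(awk -F":" '/ahci/ {print $1}' /proc/interrupts); do
    echo 20 > /proc/irq/$i/smp_affinity
done

# USB (xhci) interrupts, i.e. the 2.5G NIC -> cpu4
for i in $(awk -F":" '/xhci/ {print $1}' /proc/interrupts); do
    echo 10 > /proc/irq/$i/smp_affinity
done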
kantantana Posted February 3

More observations and progress: I get 50MB/s when doing dd if=fileFromNfs of=/dev/null bs=1M on a non-helios machine, with the helios64 exporting the file over NFS (from ZFS). But after pinning the nfsd processes to core 4 and making some other sysctl changes, I get 70MB/s, though not every time.

Other observations: with the performance governor it is more stable and the speeds are higher; the ondemand governor introduces retries and transmission errors, and overall even a steady read of 2GB is still slower than the same run with the performance governor.

The other change suggested by R1, after I asked it to compare this dtb's settings concerning the RTL8211E, was a slight adjustment of the rx_delay and tx_delay values to 0x23, which can be done with an overlay. I may be imagining things after running so many tests, but the iperf3 retry counts are not happening as often, and when pinning iperf3 to core 4 or 5, even with ondemand, it is more stable and gets close to 900Mbit/s.

This is added to my rc.local now:

echo 32768 > /sys/class/net/end0/queues/rx-0/rps_flow_cnt
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
for i in `pgrep "nfsd"` ; do ionice -c1 -p $i ; taskset -c -p 4 $i; done
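For anyone wanting to try the same rx_delay/tx_delay change, a rough sketch of how such a user overlay could be created on Armbian. The fragment target (&gmac) and the 0x23 values follow the description above; the file name is arbitrary, and you should verify the property names against your board's dts before using it:

# Hypothetical sketch: write a user overlay that sets rx_delay/tx_delay to 0x23
cat > gmac-delays.dts <<'EOF'
/dts-v1/;
/plugin/;

/ {
    compatible = "rockchip,rk3399";

    fragment@0 {
        target = <&gmac>;
        __overlay__ {
            tx_delay = <0x23>;
            rx_delay = <0x23>;
        };
    };
};
EOF

# armbian-add-overlay compiles the .dts and registers it in /boot/armbianEnv.txt;
# reboot afterwards for the new delays to take effect
armbian-add-overlay gmac-delays.dts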
snakekick Posted Wednesday at 06:37 PM

Hello @prahal, sorry if I'm being annoying. 25.2 is out; did you manage to get your fixes in? Many thanks.