Helios 64: Network Issues with Armbian 24.5.3 on Kernel 6.6.39 - Frequent Disconnects


Solved by ebin-dev

Posted (edited)

Hello everyone,

 

I'm experiencing frequent network disconnections on my system running Armbian 24.5.3, both with Bookworm and Noble bases, which are both using Kernel 6.6.39. This issue becomes particularly frustrating when syncing files with Unison, as the interruptions disrupt the process frequently.

 

To provide some background, I didn't encounter these network issues when I was using the last Focal version, which was running on Kernel 5.x. The current versions of Samba I'm using are 4.17.12-Debian on Bookworm and 4.19.5-Ubuntu on Noble. Unison is at version 2.52.1 on Bookworm and 2.53.3 on Noble. Both systems also have the latest updates for MDADM and OpenSSH from the standard package repositories.

 

In the logs (journalctl -xe), I've noticed the following messages repeatedly:

kernel: r8152 2-1.4:1.0 enx646266d00567: Tx timeout
kernel: r8152 2-1.4:1.0 enx646266d00567: Tx status -2


These errors are accompanied by the following, occurring approximately 15 seconds earlier:

kernel: xhci-hcd xhci-hcd.0.auto: WARN Event TRB for slot 3 ep 3 with no TDs queued?
kernel: xhci-hcd xhci-hcd.0.auto: WARN Event TRB for slot 3 ep 3 with no TDs queued?
kernel: xhci-hcd xhci-hcd.0.auto: WARN Event TRB for slot 3 ep 3 with no TDs queued?
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e050 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e060 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e070 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e080 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e090 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e0a0 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e0b0 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e0c0 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e110 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e120 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
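To judge whether a change actually helps, it can be useful to count how often the r8152 transmit timeout occurs per boot. The following is only a minimal sketch: the helper name `count_tx_timeouts` is made up for illustration, and the pattern is derived solely from the log lines quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count r8152 "Tx timeout" lines in kernel log text on stdin.
# The pattern is based only on the messages quoted in this thread.
count_tx_timeouts() {
    grep -c 'r8152 .*Tx timeout'
}

# Usage on the affected machine (kernel messages of the current boot):
#   journalctl -k -b | count_tx_timeouts
```

Comparing the count before and after a change (firmware update, offload settings) over a similar uptime gives a rough indication of whether the dropouts became less frequent.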


I've come across advice suggesting the installation of the correct driver, but that hasn't been very helpful so far.

 

Has anyone else faced similar issues with network stability, particularly under heavy load (e.g., large file transfers)? If so, what steps did you take to resolve it? I would greatly appreciate any insights or recommendations on how to diagnose or fix this problem.

Thank you in advance for your assistance!

 

Best regards,

mtlmarko189

 

Edit: Apologies if I overlooked an existing thread on this; I could not find one that covers this problem on the Helios64.

Edited by mtlmarko189
  • SteeMan changed the title to Helios 64: Network Issues with Armbian 24.5.3 on Kernel 6.6.39 - Frequent Disconnects
Posted (edited)
4 hours ago, mtlmarko189 said:

In the logs (journalctl -xe), I've noticed the following messages repeatedly:

kernel: r8152 2-1.4:1.0 enx646266d00567: Tx timeout
kernel: r8152 2-1.4:1.0 enx646266d00567: Tx status -2

 

Did you update the r8152 firmware (link)? 

# dmesg | grep r8152
[    5.245881] usbcore: registered new device driver r8152-cfgselector
[    5.460956] r8152-cfgselector 2-1.4: reset SuperSpeed USB device number 3 using xhci-hcd
[    5.741177] r8152 2-1.4:1.0: load rtl8156a-2 v2 04/27/23 successfully
[    5.795083] r8152 2-1.4:1.0 eth0: v1.12.13

 

Edited by ebin-dev
Posted

Thank you for the quick help. Apparently I overlooked that. I'll take a look at it!

 

No, I did not explicitly update anything, because I assumed that the current driver was already included in the image.

 

I'll try it tomorrow.

Posted

Is it correct that I only need to copy the contents of the "rtl_nic" folder from the archive into the folder "/lib/firmware/rtl_nic"?

 

Unfortunately, that doesn't fix my problem either. I still get the "Tx status -2" messages, and the device is still unreachable at times.

 

However, I noticed something. I bought a series on AkibapassTV and put the files on the Helios64. I play them with VLC from my PC. If I seek through a video too quickly, the terminal where I also have "ping" running shows "Destination Host Unreachable" for a while, at least until the network adapter has been reset and is active again.

 

However, I can no longer find the DMA errors in the log, so the firmware update seems to have resolved those.

 

Still, I get output like this:

# dmesg | grep r8152
[   21.167322] usbcore: registered new device driver r8152-cfgselector
[   21.380437] r8152-cfgselector 2-1.4: reset SuperSpeed USB device number 3 using xhci-hcd
[   21.476788] r8152 2-1.4:1.0: load rtl8156a-2 v2 04/27/23 successfully
[   21.531806] r8152 2-1.4:1.0 eth0: v1.12.13
[   21.532016] usbcore: registered new interface driver r8152
[   21.605123] r8152 2-1.4:1.0 enx646266d00567: renamed from eth0
[   25.324976] r8152 2-1.4:1.0 enx646266d00567: Stop submitting intr, status -108
[   25.751987] r8152-cfgselector 2-1.4: reset SuperSpeed USB device number 3 using xhci-hcd
[   25.837596] r8152 2-1.4:1.0: load rtl8156a-2 v2 04/27/23 successfully
[   25.906230] r8152 2-1.4:1.0 eth0: v1.12.13
[   26.114614] r8152 2-1.4:1.0 enx646266d00567: renamed from eth0
[   28.391924] r8152 2-1.4:1.0 enx646266d00567: carrier on
[24871.270580] r8152 2-1.4:1.0 enx646266d00567: carrier on
[42214.756880] NETDEV WATCHDOG: enx646266d00567 (r8152): transmit queue 0 timed out 7312 ms
[42214.756940] Modules linked in: lz4hc lz4 zram binfmt_misc leds_pwm hantro_vpu rockchip_vdec(C) v4l2_vp9 pwm_fan rockchip_rga videobuf2_dma_contig snd_soc_hdmi_codec v4l2_h264 videobuf2_dma_sg v4l2_mem2mem snd_soc_rockchip_i2s videobuf2_memops videobuf2_v4l2 snd_soc_core rk_crypto panfrost snd_compress gpu_sched snd_pcm_dmaengine rng_core drm_shmem_helper snd_pcm snd_timer videodev snd videobuf2_common mc soundcore cpufreq_dt gpio_beeper sg cfg80211 rfkill ledtrig_netdev lm75 dm_mod nfnetlink ip_tables x_tables autofs4 raid10 raid1 raid0 multipath linear cdc_ncm cdc_ether usbnet raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod r8152 realtek dwmac_rk fusb302 gpio_charger stmmac_platform tcpm typec stmmac pcs_xpcs adc_keys
[42214.757477] r8152 2-1.4:1.0 enx646266d00567: Tx timeout
[42214.761162] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[42214.761240] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[42214.761392] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[42214.761533] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[42216.867928] r8152-cfgselector 2-1.4: reset SuperSpeed USB device number 3 using xhci-hcd
[75856.736623] r8152 2-1.4:1.0 enx646266d00567: Tx timeout
[75856.741073] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[75856.741130] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[75856.741241] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[75856.741387] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[75858.851803] r8152-cfgselector 2-1.4: reset SuperSpeed USB device number 3 using xhci-hcd


Do you have any other ideas? Or is copying just the "rtl_nic" folder from the archive not enough?

  • Solution
Posted (edited)

These timeout issues are actually Linux kernel related (see e.g. here).

To get rid of them, disable scatter/gather offloading: execute 'ethtool -K <iface> sg off', replacing <iface> with your interface name (e.g. eth0). Note that transmission speed will suffer!
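As a hedged sketch of the workaround above: the following checks the scatter-gather state before and after disabling it. The interface name is the one from the logs earlier in this thread; the function names `sg_state` and `disable_sg` are made up for illustration.

```shell
#!/bin/sh
# Interface name taken from the logs in this thread; adjust to your system.
IFACE="${IFACE:-enx646266d00567}"

# Hypothetical helper: extract the scatter-gather state ("on"/"off")
# from `ethtool -k` output read on stdin.
sg_state() {
    awk '/^scatter-gather:/ {print $2}'
}

# Hypothetical helper: disable scatter/gather on an interface and
# report the state before and after. Requires root and ethtool.
disable_sg() {
    iface="$1"
    echo "before: $(ethtool -k "$iface" | sg_state)"
    ethtool -K "$iface" sg off
    echo "after:  $(ethtool -k "$iface" | sg_state)"
}

# Usage (as root):
#   disable_sg "$IFACE"
```

The setting made with `ethtool -K` does not survive a reboot, so it would have to be re-applied at boot, for example from /etc/rc.local.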

 

With some optimizations the timeout issues also disappear. Dealing with 2.5G traffic is quite a burden on the CPUs! Use jumbo frames.
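A minimal sketch of enabling jumbo frames, under several assumptions: the MTU of 9000 is a commonly used value and is not stated in this thread, the interface name is the one from the logs above, and every device on the path (switch or router, and the peer) must accept the same MTU. The function names are hypothetical.

```shell
#!/bin/sh
# Assumptions: interface name from the logs in this thread; MTU 9000 is a
# common jumbo-frame size, not a value given by the poster.
IFACE="${IFACE:-enx646266d00567}"
MTU="${MTU:-9000}"

# Hypothetical helper: set the MTU on an interface and show the result.
# Requires root and iproute2.
set_jumbo() {
    ip link set dev "$1" mtu "$2"
    ip link show dev "$1"
}

# Hypothetical helper: largest ICMP payload that fits into one IPv4 packet
# of the given MTU (20 bytes IPv4 header + 8 bytes ICMP header).
icmp_payload() {
    echo $(( $1 - 28 ))
}

# Usage (as root), then test the path without fragmentation:
#   set_jumbo "$IFACE" "$MTU"
#   ping -M do -s "$(icmp_payload "$MTU")" -c 3 <peer-address>
```

If the ping with the "don't fragment" option fails while a normal ping works, some device on the path is probably not passing jumbo frames.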

 

How is your PC connected to your Helios64? If you are using a 2.5G USB-C adapter, it should contain the latest version of the rtl8156 chip (31.04) (e.g. like this one).

I am using the following content in /etc/rc.local to pin the interrupts involved to the big cores: SATA (ahci) to CPU 4 and the 2.5G network traffic (xhci) to CPU 5.

 

#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

cd /sys/devices/system/cpu/cpufreq

# Tune the ondemand governor for both clusters
# (policy0: little cores cpu0-3, policy4: big cores cpu4-5)
for cpufreqpolicy in 0 4 ; do
    echo 1 > policy${cpufreqpolicy}/ondemand/io_is_busy
    echo 25 > policy${cpufreqpolicy}/ondemand/up_threshold
    echo 10 > policy${cpufreqpolicy}/ondemand/sampling_down_factor
    echo $(cat policy${cpufreqpolicy}/cpuinfo_transition_latency) > policy${cpufreqpolicy}/ondemand/sampling_rate
done

# Pin SATA (ahci) interrupts to cpu 4 (hex affinity mask 10)
for i in $(awk -F":" "/ahci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
        echo 10 > /proc/irq/$i/smp_affinity
done
# Pin USB (xhci) interrupts, which carry the 2.5G network traffic, to cpu 5 (hex affinity mask 20)
for i in $(awk -F":" "/xhci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
        echo 20 > /proc/irq/$i/smp_affinity
done

exit 0
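The values 10 and 20 written to smp_affinity in the script above are hexadecimal CPU bitmasks (bit n set means CPU n). As a small illustration, the helper below (the name `cpu_mask` is made up) derives the mask for a single CPU index:

```shell
#!/bin/sh
# Hypothetical helper: print the hex smp_affinity mask for one CPU index.
# Bit n of the mask corresponds to CPU n.
cpu_mask() {
    printf '%x\n' "$((1 << $1))"
}

cpu_mask 4   # prints 10 (SATA interrupts in the script above)
cpu_mask 5   # prints 20 (xhci interrupts in the script above)

# To check what is currently set for a given IRQ number N:
#   cat /proc/irq/N/smp_affinity
```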

 

rk3399-kobol-helios64.dtb-6.10.2-L2-hs400-opp rk3399-kobol-helios64.dtb-6.6.34-L2-hs400-opp

Edited by ebin-dev
Posted

I have now disabled scatter/gather offloading, adopted your rc.local content, and restarted the device to check that rc.local was executed correctly.

 

I also tried to provoke the problem by seeking back and forth in one of the videos, and the network connection remained active the whole time.

 

So far it looks good.

 

Regarding the two attached DTBs: do they contain important changes?

 

As for the other question:
Why should I connect such an adapter? The Helios64 has a 1 Gbit/s and a 2.5 Gbit/s LAN port at the back. The NAS is connected to the router via the 2.5 Gbit/s port, and of course to the router's 2.5 Gbit/s port as well. My PC, from which I access the NAS via OpenSSH, is connected to a 1 Gbit/s port on the same router. Apart from the microSD card holding the Armbian image and the hard drives that form the RAID, I have not added any modules to the Helios64.

 

I'll keep watching the situation for a few more days. If it stays this way, I will mark your last post as the solution and add a short summary to my first post.

Posted
1 hour ago, mtlmarko189 said:

Regarding the two attached DTBs: do they contain important changes?

 

Well, the dtbs contain fixes for the 6.6 and 6.10 branches respectively (see e.g. here).

 

1 hour ago, mtlmarko189 said:

As for the other question:
Why should I connect such an adapter?

 

If the 2.5G port of your Helios64 is connected (via a 2.5G switch) to a device that uses a 2.5G USB-C adapter instead of a standard 1G Ethernet port, the transmission rate between the two devices rises to 2.5 Gbit/s full duplex.

Posted

I guess that, because this is a kernel issue, it cannot really be solved for now. But with your suggestions the dropouts have definitely become much less frequent, and that is probably the best that can be achieved in the current situation. I have now marked your post as the solution.

 

Thank you for your help; I can work with the NAS again.
