mtlmarko189 Posted August 16 (edited)

Hello everyone,

I'm experiencing frequent network disconnections on my system running Armbian 24.5.3, with both the Bookworm and the Noble base, which both use kernel 6.6.39. This is particularly frustrating when syncing files with Unison, because the interruptions frequently disrupt the process.

For background: I didn't encounter these network issues with the last Focal version, which ran on kernel 5.x. The Samba versions I'm using are 4.17.12-Debian on Bookworm and 4.19.5-Ubuntu on Noble. Unison is at version 2.52.1 on Bookworm and 2.53.3 on Noble. Both systems also have the latest updates for MDADM and OpenSSH from the standard package repositories.

In the logs (journalctl -xe), I've noticed the following messages repeatedly:

kernel: r8152 2-1.4:1.0 enx646266d00567: Tx timeout
kernel: r8152 2-1.4:1.0 enx646266d00567: Tx status -2

These errors are preceded, approximately 15 seconds earlier, by the following:

kernel: xhci-hcd xhci-hcd.0.auto: WARN Event TRB for slot 3 ep 3 with no TDs queued?
kernel: xhci-hcd xhci-hcd.0.auto: WARN Event TRB for slot 3 ep 3 with no TDs queued?
kernel: xhci-hcd xhci-hcd.0.auto: WARN Event TRB for slot 3 ep 3 with no TDs queued?
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e050 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e060 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e070 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e080 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e090 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e0a0 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e0b0 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e0c0 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e110 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0
kernel: xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
kernel: xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000000908e120 trb-start 000000000908e020 trb-end 000000000908e020 seg-start 000000000908e000 seg-end 000000000908eff0

I've come across advice suggesting the installation of the correct driver, but that hasn't been very helpful so far.

Has anyone else faced similar issues with network stability, particularly under heavy load (e.g., large file transfers)? If so, what steps did you take to resolve it? I would greatly appreciate any insights or recommendations on how to diagnose or fix this problem.

Thank you in advance for your assistance!

Best regards,
mtlmarko189

edit: Please forgive me if I have overlooked an existing thread on this, but unfortunately I have not found what I was looking for regarding the Helios64.

ebin-dev Posted August 16 (edited)

4 hours ago, mtlmarko189 said:
In the logs (journalctl -xe), I've noticed the following messages repeatedly:
kernel: r8152 2-1.4:1.0 enx646266d00567: Tx timeout
kernel: r8152 2-1.4:1.0 enx646266d00567: Tx status -2

Did you update the r8152 firmware (link)?

# dmesg | grep r8152
[ 5.245881] usbcore: registered new device driver r8152-cfgselector
[ 5.460956] r8152-cfgselector 2-1.4: reset SuperSpeed USB device number 3 using xhci-hcd
[ 5.741177] r8152 2-1.4:1.0: load rtl8156a-2 v2 04/27/23 successfully
[ 5.795083] r8152 2-1.4:1.0 eth0: v1.12.13

mtlmarko189 (Author) Posted August 16

Thank you for the quick help. Apparently I overlooked that, thank you very much! I'll look into it.

No, I didn't explicitly update anything, because I assumed that the current driver was already included in the image. I'll try it tomorrow.

mtlmarko189 (Author) Posted August 20

Is it correct that I only need to copy the contents of the "rtl_nic" folder from the archive to "/lib/firmware/rtl_nic"?

Unfortunately, that doesn't fix my problem either. I still get the message with status -2, and the device is unreachable at times. However, I noticed something. I bought a series on AkibapassTV and put the files on the Helios64. I play them with VLC from my PC. If I seek through a video too quickly, I see in the terminal where I also have "ping" running that "Destination Host Unreachable" is reported at times, at least until the network adapter has been reset and is active again.

However, I can no longer find the DMA messages in the log, so that part seems to be resolved. Still, I get something like this:

# dmesg | grep r8152
[ 21.167322] usbcore: registered new device driver r8152-cfgselector
[ 21.380437] r8152-cfgselector 2-1.4: reset SuperSpeed USB device number 3 using xhci-hcd
[ 21.476788] r8152 2-1.4:1.0: load rtl8156a-2 v2 04/27/23 successfully
[ 21.531806] r8152 2-1.4:1.0 eth0: v1.12.13
[ 21.532016] usbcore: registered new interface driver r8152
[ 21.605123] r8152 2-1.4:1.0 enx646266d00567: renamed from eth0
[ 25.324976] r8152 2-1.4:1.0 enx646266d00567: Stop submitting intr, status -108
[ 25.751987] r8152-cfgselector 2-1.4: reset SuperSpeed USB device number 3 using xhci-hcd
[ 25.837596] r8152 2-1.4:1.0: load rtl8156a-2 v2 04/27/23 successfully
[ 25.906230] r8152 2-1.4:1.0 eth0: v1.12.13
[ 26.114614] r8152 2-1.4:1.0 enx646266d00567: renamed from eth0
[ 28.391924] r8152 2-1.4:1.0 enx646266d00567: carrier on
[24871.270580] r8152 2-1.4:1.0 enx646266d00567: carrier on
[42214.756880] NETDEV WATCHDOG: enx646266d00567 (r8152): transmit queue 0 timed out 7312 ms
[42214.756940] Modules linked in: lz4hc lz4 zram binfmt_misc leds_pwm hantro_vpu rockchip_vdec(C) v4l2_vp9 pwm_fan rockchip_rga videobuf2_dma_contig snd_soc_hdmi_codec v4l2_h264 videobuf2_dma_sg v4l2_mem2mem snd_soc_rockchip_i2s videobuf2_memops videobuf2_v4l2 snd_soc_core rk_crypto panfrost snd_compress gpu_sched snd_pcm_dmaengine rng_core drm_shmem_helper snd_pcm snd_timer videodev snd videobuf2_common mc soundcore cpufreq_dt gpio_beeper sg cfg80211 rfkill ledtrig_netdev lm75 dm_mod nfnetlink ip_tables x_tables autofs4 raid10 raid1 raid0 multipath linear cdc_ncm cdc_ether usbnet raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod r8152 realtek dwmac_rk fusb302 gpio_charger stmmac_platform tcpm typec stmmac pcs_xpcs adc_keys
[42214.757477] r8152 2-1.4:1.0 enx646266d00567: Tx timeout
[42214.761162] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[42214.761240] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[42214.761392] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[42214.761533] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[42216.867928] r8152-cfgselector 2-1.4: reset SuperSpeed USB device number 3 using xhci-hcd
[75856.736623] r8152 2-1.4:1.0 enx646266d00567: Tx timeout
[75856.741073] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[75856.741130] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[75856.741241] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[75856.741387] r8152 2-1.4:1.0 enx646266d00567: Tx status -2
[75858.851803] r8152-cfgselector 2-1.4: reset SuperSpeed USB device number 3 using xhci-hcd

Do you have any more ideas? Or is copying just the "rtl_nic" folder from the archive not enough?
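
To compare before and after a change, it can help to count how often the driver reports a transmit timeout instead of scanning the log by eye. The following is only a minimal sketch: count_tx_timeouts is a made-up helper name, and it assumes the kernel lines keep the "r8152 ... <iface>: Tx timeout" shape shown in the logs in this thread.

```shell
#!/bin/sh
# Hypothetical helper: count r8152 "Tx timeout" events per interface.
# Reads kernel log text on stdin and prints "<interface> <count>" lines.
count_tx_timeouts() {
    awk '/r8152/ && /Tx timeout/ {
        for (i = 2; i < NF; i++) {
            if ($i == "Tx" && $(i + 1) == "timeout") {
                name = $(i - 1)        # interface name precedes "Tx"
                sub(/:$/, "", name)    # drop the trailing colon
                n[name]++
            }
        }
    }
    END { for (k in n) print k, n[k] }'
}
```

Possible usage: dmesg | count_tx_timeouts, or journalctl -k | count_tx_timeouts for the current boot. The "Tx status -2" lines that follow each timeout are deliberately not counted, so each reset of the adapter shows up once.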

ebin-dev Posted August 21 (edited) Solution

These timeout issues are actually Linux kernel related (see e.g. here).

To get rid of them, disable scatter/gather offloading (just execute 'ethtool -K <iface> sg off'; replace <iface> with your interface name, e.g. eth0). But transmission speed will suffer!

With some optimizations the timeout issues also disappear. Dealing with 2.5G traffic is quite a burden on the CPUs! Use jumbo frames. I am using the content listed below in /etc/rc.local to bind the tasks involved to the big cores (SATA on CPU 4 and 2.5G network traffic on CPU 5).

How is your PC connected to your Helios64? If you are using a 2.5G USB-C adapter, it should contain the latest version of the rtl8156 chip (31.04) (e.g. like this one).

#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

# Tune the ondemand governor for both cpufreq policies (0 and 4).
cd /sys/devices/system/cpu/cpufreq
for cpufreqpolicy in 0 4 ; do
    echo 1 > policy${cpufreqpolicy}/ondemand/io_is_busy
    echo 25 > policy${cpufreqpolicy}/ondemand/up_threshold
    echo 10 > policy${cpufreqpolicy}/ondemand/sampling_down_factor
    echo $(cat policy${cpufreqpolicy}/cpuinfo_transition_latency) > policy${cpufreqpolicy}/ondemand/sampling_rate
done

# Bind the SATA (ahci) interrupts to CPU 4 (hex mask 10).
for i in $(awk -F":" "/ahci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
    echo 10 > /proc/irq/$i/smp_affinity
done

# Bind the USB (xhci) interrupts, which carry the 2.5G traffic, to CPU 5 (hex mask 20).
for i in $(awk -F":" "/xhci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
    echo 20 > /proc/irq/$i/smp_affinity
done

exit 0

Attached files:
rk3399-kobol-helios64.dtb-6.10.2-L2-hs400-opp
rk3399-kobol-helios64.dtb-6.6.34-L2-hs400-opp
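
The values 10 and 20 written to smp_affinity in the rc.local above are hexadecimal CPU bitmasks: bit N set means CPU N, so 10 selects CPU 4 and 20 selects CPU 5. The sketch below shows one way to derive such a mask and apply it; cpu_mask and pin_irqs are illustrative names that do not appear in the original script, and pin_irqs needs root because it writes to /proc/irq.

```shell
#!/bin/sh
# Hypothetical helper: turn a zero-based CPU number into the hexadecimal
# bitmask expected by /proc/irq/<n>/smp_affinity (bit N selects CPU N).
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Hypothetical helper: pin every IRQ whose /proc/interrupts line matches
# a pattern to a single CPU. Usage: pin_irqs <pattern> <cpu>
pin_irqs() {
    pattern=$1
    mask=$(cpu_mask "$2")
    for irq in $(awk -F: -v pat="$pattern" \
            '$0 ~ pat { gsub(/ /, "", $1); print $1 }' /proc/interrupts); do
        echo "$mask" > "/proc/irq/$irq/smp_affinity"
    done
}
```

With these definitions, pin_irqs ahci 4 and pin_irqs xhci 5 would correspond to the two loops in the rc.local. Deriving the mask from the CPU number mainly guards against writing a decimal value where a hexadecimal one is expected.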

mtlmarko189 (Author) Posted August 22

I have now disabled scatter/gather offloading, adopted your rc.local as well, and restarted the device to check that the content of rc.local is executed correctly. I also tried to provoke the problem by jumping back and forth in one of the videos, and the network connection remained active the whole time. So far it looks good.

Regarding the two attached DTB files: do they contain important changes?

Regarding the other question: why should I connect such an adapter? The Helios64 has a 1 Gbit/s and a 2.5 Gbit/s LAN port on the back. The NAS is connected to the router through its 2.5 Gbit/s port, and of course to the router's 2.5 Gbit/s port as well. My PC, from which I access the NAS via OpenSSH, is connected to a 1 Gbit/s port on the same router. Apart from the microSD card holding the Armbian image and the hard drives that form the RAID, I have not installed any additional modules in the Helios64.

I'll keep watching the situation for a few more days, but if it stays this way, I can probably mark your last post as the solution and amend my first post with a short summary.
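
Since a setting made with 'ethtool -K' does not survive a reboot by itself, it is worth confirming after a restart that scatter/gather is really off. This is a rough sketch only: sg_state is a made-up helper, and it assumes 'ethtool -k <iface>' prints an unindented line of the form "scatter-gather: off", which may differ between ethtool versions.

```shell
#!/bin/sh
# Hypothetical helper: read the output of `ethtool -k <iface>` on stdin and
# print the state ("on" or "off") of the top-level scatter-gather feature.
# Indented sub-features such as tx-scatter-gather are ignored on purpose.
sg_state() {
    awk -F': ' '$1 == "scatter-gather" {
        split($2, word, " ")   # drop a trailing marker such as "[fixed]"
        print word[1]
    }'
}
```

Possible usage: ethtool -k enx646266d00567 | sg_state. If this prints "on" after a reboot, one option would be to add the 'ethtool -K <iface> sg off' call to /etc/rc.local next to the affinity settings. The affinity itself can be checked by reading /proc/irq/<n>/smp_affinity for the ahci and xhci interrupt numbers listed in /proc/interrupts.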

ebin-dev Posted August 22

1 hour ago, mtlmarko189 said:
Regarding the two attached DTB files: do they contain important changes?

Well, the dtbs contain fixes for the 6.6 and 6.10 branches respectively (see e.g. here).

1 hour ago, mtlmarko189 said:
Regarding the other question: why should I connect such an adapter?

If the 2.5G port of your Helios64 is connected (via a 2.5G switch) to a device that uses a 2.5G USB-C adapter instead of a standard 1G Ethernet port, the transmission rate between the two devices rises to 2.5 Gbit/s full duplex.

mtlmarko189 (Author) Posted August 23

I guess that, because this is a kernel issue, the problem cannot really be solved on my side. But with your suggestions the dropouts have definitely become considerably rarer, and this is probably the best that can be achieved in the current situation. I have now marked your post as the "solution". Thank you for your help, I can work with the system again.