OK, so I had to reboot soon after the above post because the server had stopped responding. It was still alive, with occasional disk activity, and the heartbeat LED was happily flashing at the correct cadence. It looks like the r8152 timed out again and could not recover. There's not a lot of info to go on, but I don't think the server was busy at the time.
Are these DMA errors what you would normally see with an overloaded core?
2024-02-05T03:08:32.752569+00:00 helios64 kernel: [40379.321499] xhci-hcd xhci-hcd.0.auto: WARN Event TRB for slot 3 ep 3 with no TDs queued?
2024-02-05T03:08:32.752783+00:00 helios64 kernel: [40379.321554] xhci-hcd xhci-hcd.0.auto: WARN Event TRB for slot 3 ep 3 with no TDs queued?
2024-02-05T03:08:32.765068+00:00 helios64 kernel: [40379.337314] xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
2024-02-05T03:08:32.765234+00:00 helios64 kernel: [40379.337362] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000f5119f70 trb-start 00000000f5119f50 trb-end 00000000f5119f50 seg-start 00000000f5119000 seg-end 00000000f5119ff0
2024-02-05T03:08:32.765245+00:00 helios64 kernel: [40379.337421] xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
2024-02-05T03:08:32.765299+00:00 helios64 kernel: [40379.337445] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000f5119f80 trb-start 00000000f5119f50 trb-end 00000000f5119f50 seg-start 00000000f5119000 seg-end 00000000f5119ff0
2024-02-05T03:08:32.765396+00:00 helios64 kernel: [40379.337485] xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
2024-02-05T03:08:32.765466+00:00 helios64 kernel: [40379.337510] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000f5119f90 trb-start 00000000f5119f50 trb-end 00000000f5119f50 seg-start 00000000f5119000 seg-end 00000000f5119ff0
2024-02-05T03:08:32.765473+00:00 helios64 kernel: [40379.337549] xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
2024-02-05T03:08:32.765476+00:00 helios64 kernel: [40379.337572] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000f5119fa0 trb-start 00000000f5119f50 trb-end 00000000f5119f50 seg-start 00000000f5119000 seg-end 00000000f5119ff0
2024-02-05T03:08:32.765480+00:00 helios64 kernel: [40379.337606] xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
2024-02-05T03:08:32.765528+00:00 helios64 kernel: [40379.337629] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000f5119fb0 trb-start 00000000f5119f50 trb-end 00000000f5119f50 seg-start 00000000f5119000 seg-end 00000000f5119ff0
2024-02-05T03:08:32.765536+00:00 helios64 kernel: [40379.337663] xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
2024-02-05T03:08:32.766028+00:00 helios64 kernel: [40379.337686] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000f5119fc0 trb-start 00000000f5119f50 trb-end 00000000f5119f50 seg-start 00000000f5119000 seg-end 00000000f5119ff0
2024-02-05T03:08:32.766039+00:00 helios64 kernel: [40379.337719] xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
2024-02-05T03:08:32.766043+00:00 helios64 kernel: [40379.337742] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000f5119fd0 trb-start 00000000f5119f50 trb-end 00000000f5119f50 seg-start 00000000f5119000 seg-end 00000000f5119ff0
2024-02-05T03:08:32.766048+00:00 helios64 kernel: [40379.337781] xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
2024-02-05T03:08:32.766052+00:00 helios64 kernel: [40379.337805] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000f5119fe0 trb-start 00000000f5119f50 trb-end 00000000f5119f50 seg-start 00000000f5119000 seg-end 00000000f5119ff0
2024-02-05T03:08:32.766057+00:00 helios64 kernel: [40379.337839] xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
2024-02-05T03:08:32.766061+00:00 helios64 kernel: [40379.337862] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000f5119ff0 trb-start 00000000f5119f50 trb-end 00000000f5119f50 seg-start 00000000f5119000 seg-end 00000000f5119ff0
2024-02-05T03:08:32.766065+00:00 helios64 kernel: [40379.337895] xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 3 comp_code 1
2024-02-05T03:08:32.766069+00:00 helios64 kernel: [40379.337918] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000f511a000 trb-start 00000000f5119f50 trb-end 00000000f5119f50 seg-start 00000000f5119000 seg-end 00000000f5119ff0
2024-02-05T03:08:40.392810+00:00 helios64 kernel: [40386.965499] ------------[ cut here ]------------
2024-02-05T03:08:40.392875+00:00 helios64 kernel: [40386.965584] NETDEV WATCHDOG: enx646266d00c4f (r8152): transmit queue 0 timed out 7628 ms
2024-02-05T03:08:40.392883+00:00 helios64 kernel: [40386.965775] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x29c/0x2b4
2024-02-05T03:08:40.395293+00:00 helios64 kernel: [40386.965831] Modules linked in: wireguard libchacha20poly1305 poly1305_neon libcurve25519_generic ip6_udp_tunnel udp_tunnel tls xt_MASQUERADE xt_nat xt_tcpudp iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter zram binfmt_misc snd_soc_hdmi_codec snd_soc_rockchip_i2s hantro_vpu rockchip_vdec(C) snd_soc_core v4l2_vp9 snd_compress videobuf2_dma_contig snd_pcm_dmaengine rockchip_rga v4l2_h264 snd_pcm leds_pwm videobuf2_dma_sg v4l2_mem2mem pwm_fan panfrost videobuf2_memops gpio_charger gpu_sched videobuf2_v4l2 videodev drm_shmem_helper videobuf2_common snd_timer snd mc rk_crypto rng_core soundcore gpio_beeper cpufreq_dt sg lm75 ledtrig_netdev nfsd drivetemp auth_rpcgss nfs_acl lockd grace dm_mod sunrpc ip_tables x_tables autofs4 xfs efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid0 multipath linear cdc_ncm cdc_ether usbnet raid1 r8152 md_mod realtek fusb302 tcpm typec dwmac_rk stmmac_platform stmmac pcs_xpcs adc_keys
2024-02-05T03:08:40.396875+00:00 helios64 kernel: [40386.966750] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C 6.7.2-edge-rockchip64 #3
2024-02-05T03:08:40.396885+00:00 helios64 kernel: [40386.966786] Hardware name: Helios64 (DT)
2024-02-05T03:08:40.396891+00:00 helios64 kernel: [40386.966797] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
2024-02-05T03:08:40.396897+00:00 helios64 kernel: [40386.966820] pc : dev_watchdog+0x29c/0x2b4
2024-02-05T03:08:40.396903+00:00 helios64 kernel: [40386.966852] lr : dev_watchdog+0x29c/0x2b4
2024-02-05T03:08:40.396908+00:00 helios64 kernel: [40386.966879] sp : ffff800080003dc0
2024-02-05T03:08:40.396914+00:00 helios64 kernel: [40386.966893] x29: ffff800080003dc0 x28: ffff800080ec96fc x27: ffff800080003ec0
2024-02-05T03:08:40.396920+00:00 helios64 kernel: [40386.966940] x26: ffff800081849008 x25: 0000000000001dcc x24: ffff800081ba7000
2024-02-05T03:08:40.396927+00:00 helios64 kernel: [40386.966986] x23: 0000000000000000 x22: ffff0000f570b41c x21: ffff0000f570b000
2024-02-05T03:08:40.396933+00:00 helios64 kernel: [40386.967030] x20: ffff0000f568aa00 x19: ffff0000f570b4c8 x18: ffffffffffffffff
2024-02-05T03:08:40.396938+00:00 helios64 kernel: [40386.967076] x17: 64656d6974203020 x16: 6575657571207469 x15: 6d736e617274203a
2024-02-05T03:08:40.396944+00:00 helios64 kernel: [40386.967121] x14: 2932353138722820 x13: 0000000000000377 x12: 00000000ffffffea
2024-02-05T03:08:40.397021+00:00 helios64 kernel: [40386.967166] x11: 00000000ffffefff x10: 00000000ffffefff x9 : ffff800081c26668
2024-02-05T03:08:40.397105+00:00 helios64 kernel: [40386.967211] x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000005000000
2024-02-05T03:08:40.397115+00:00 helios64 kernel: [40386.967255] x5 : 0000000000000001 x4 : 0000000000000040 x3 : 0000000000000001
2024-02-05T03:08:40.397121+00:00 helios64 kernel: [40386.967298] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff800081bb4500
2024-02-05T03:08:40.397126+00:00 helios64 kernel: [40386.967344] Call trace:
2024-02-05T03:08:40.397131+00:00 helios64 kernel: [40386.967356] dev_watchdog+0x29c/0x2b4
2024-02-05T03:08:40.397149+00:00 helios64 kernel: [40386.967387] call_timer_fn+0x34/0x1c0
2024-02-05T03:08:40.397157+00:00 helios64 kernel: [40386.967420] __run_timers.part.0+0x228/0x2f4
2024-02-05T03:08:40.397163+00:00 helios64 kernel: [40386.967451] run_timer_softirq+0x48/0x84
2024-02-05T03:08:40.397168+00:00 helios64 kernel: [40386.967481] __do_softirq+0x150/0x3e4
2024-02-05T03:08:40.397173+00:00 helios64 kernel: [40386.967505] ____do_softirq+0x10/0x1c
2024-02-05T03:08:40.397178+00:00 helios64 kernel: [40386.967533] call_on_irq_stack+0x24/0x4c
2024-02-05T03:08:40.397183+00:00 helios64 kernel: [40386.967558] do_softirq_own_stack+0x1c/0x2c
2024-02-05T03:08:40.397188+00:00 helios64 kernel: [40386.967584] irq_exit_rcu+0x9c/0xcc
2024-02-05T03:08:40.397193+00:00 helios64 kernel: [40386.967616] el1_interrupt+0x38/0x68
2024-02-05T03:08:40.397198+00:00 helios64 kernel: [40386.967644] el1h_64_irq_handler+0x18/0x24
2024-02-05T03:08:40.397265+00:00 helios64 kernel: [40386.967671] el1h_64_irq+0x64/0x68
2024-02-05T03:08:40.397277+00:00 helios64 kernel: [40386.967693] cpuidle_enter_state+0xc4/0x4bc
2024-02-05T03:08:40.397283+00:00 helios64 kernel: [40386.967722] cpuidle_enter+0x38/0x50
2024-02-05T03:08:40.397287+00:00 helios64 kernel: [40386.967749] do_idle+0x1fc/0x270
2024-02-05T03:08:40.397292+00:00 helios64 kernel: [40386.967779] cpu_startup_entry+0x34/0x3c
2024-02-05T03:08:40.397297+00:00 helios64 kernel: [40386.967807] kernel_init+0x0/0x1e0
2024-02-05T03:08:40.397302+00:00 helios64 kernel: [40386.967835] arch_post_acpi_subsys_init+0x0/0x8
2024-02-05T03:08:40.397360+00:00 helios64 kernel: [40386.967865] start_kernel+0x6c4/0x8fc
2024-02-05T03:08:40.397372+00:00 helios64 kernel: [40386.967891] __primary_switched+0xb4/0xbc
2024-02-05T03:08:40.397377+00:00 helios64 kernel: [40386.967925] ---[ end trace 0000000000000000 ]---
2024-02-05T03:08:40.397383+00:00 helios64 kernel: [40386.967975] r8152 2-1.4:1.0 enx646266d00c4f: Tx timeout
2024-02-05T03:08:45.516397+00:00 helios64 kernel: [40392.085520] xhci-hcd xhci-hcd.0.auto: xHCI host not responding to stop endpoint command
2024-02-05T03:08:45.516458+00:00 helios64 kernel: [40392.085569] xhci-hcd xhci-hcd.0.auto: xHCI host controller not responding, assume dead
2024-02-05T03:08:45.516467+00:00 helios64 kernel: [40392.085685] xhci-hcd xhci-hcd.0.auto: HC died; cleaning up
2024-02-05T03:08:45.516472+00:00 helios64 kernel: [40392.085776] usb 1-1: USB disconnect, device number 2
2024-02-05T03:08:45.516477+00:00 helios64 kernel: [40392.087845] usb 2-1: USB disconnect, device number 2
2024-02-05T03:08:45.516482+00:00 helios64 kernel: [40392.087878] r8152-cfgselector 2-1.4: USB disconnect, device number 3
2024-02-05T03:08:45.516487+00:00 helios64 kernel: [40392.088121] r8152 2-1.4:1.0 enx646266d00c4f: Tx status -108
2024-02-05T03:08:45.516494+00:00 helios64 kernel: [40392.088156] r8152 2-1.4:1.0 enx646266d00c4f: Tx status -108
2024-02-05T03:08:45.516501+00:00 helios64 kernel: [40392.088182] r8152 2-1.4:1.0 enx646266d00c4f: Tx status -2
2024-02-05T03:08:45.516508+00:00 helios64 kernel: [40392.088229] r8152 2-1.4:1.0 enx646266d00c4f: Tx status -108
2024-02-05T03:08:45.516513+00:00 helios64 kernel: [40392.088416] r8152 2-1.4:1.0 enx646266d00c4f: Get ether addr fail
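
To make the "Looking for event-dma" lines easier to read, here is a rough Python sketch that pulls the addresses out of each line and checks where the event pointer sits relative to the current TD and the ring segment. The function names (`parse_event_dma`, `summarize`) are my own and not part of any tool, and treating `trb-end` and `seg-end` as inclusive bounds is an assumption on my part. It only restates what is in the log; it does not diagnose the cause.

```python
import re

# Matches the address fields in xhci-hcd "Looking for event-dma ..." lines.
PATTERN = re.compile(
    r"event-dma (?P<event>[0-9a-f]+) "
    r"trb-start (?P<trb_start>[0-9a-f]+) "
    r"trb-end (?P<trb_end>[0-9a-f]+) "
    r"seg-start (?P<seg_start>[0-9a-f]+) "
    r"seg-end (?P<seg_end>[0-9a-f]+)"
)


def parse_event_dma(line):
    """Return the five addresses as ints, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {name: int(value, 16) for name, value in match.groupdict().items()}


def summarize(lines):
    """For each matching line, report where event-dma sits.

    Assumption: trb-end and seg-end are treated as inclusive bounds.
    """
    rows = []
    for line in lines:
        fields = parse_event_dma(line)
        if fields is None:
            continue  # skip unrelated log lines
        event = fields["event"]
        rows.append({
            "event": event,
            "offset_from_td": event - fields["trb_start"],
            "in_td": fields["trb_start"] <= event <= fields["trb_end"],
            "in_segment": fields["seg_start"] <= event <= fields["seg_end"],
        })
    return rows


# Two lines copied from the log above (message part only).
SAMPLE = [
    "Looking for event-dma 00000000f5119f70 trb-start 00000000f5119f50 "
    "trb-end 00000000f5119f50 seg-start 00000000f5119000 seg-end 00000000f5119ff0",
    "Looking for event-dma 00000000f511a000 trb-start 00000000f5119f50 "
    "trb-end 00000000f5119f50 seg-start 00000000f5119000 seg-end 00000000f5119ff0",
]

for row in summarize(SAMPLE):
    print(
        f"event={row['event']:#x} "
        f"offset_from_td={row['offset_from_td']:#x} "
        f"in_td={row['in_td']} in_segment={row['in_segment']}"
    )
```

Run against the full log, this shows the event pointer stepping forward by 0x10 on every message, from f5119f70 up to f511a000, while trb-start and trb-end stay fixed at f5119f50. The last value is one step beyond the reported seg-end.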