
eth1 (2.5) vanished: Upgrade to 20.11.3 / 5.9.14-rockchip64



Posted

Well I did a fresh install of Armbian_20.11.6_Helios64_buster_current_5.9.14.img (didn't try to update) and hammered the disks and 2.5g network for about a day now and no tx resets or timeouts happened at all. No disconnects or hang ups.

 

 

Posted

On the off chance that there's something different between the image @clostro mentioned and mine, I tried the same with a separate SD card. No change: eth1 at 2.5Gbps broke down within 10 seconds of the Helios64 sending my desktop machine data via iperf3, just as described before. The only thing I changed after setting the root passwd was setting up a static IP address for eth1 via nmcli. So back to 1Gbps it is.

Posted

I guess your tx offload is off as well since it is set that way by default but could you try again with 'ethtool -K eth1 tx off'? Asking just in case. Got the tx resets and hang ups again when I switched it back on and initiated a large transfer to see what happens with the new build.

Posted

I've had similar problems with the 2.5GbE NIC, even with no load.

Pinging from my PC to the NAS shows gaps during which the NIC is unreachable.

Even this morning, with the very latest Armbian image (Armbian_20.11.6_Helios64_buster_current_5.9.14.img), right after the first boot, nothing installed yet, I couldn't even get "apt update" to complete.

Four times in a row, apt complained it couldn't download some lists (never twice the same) or couldn't locate a server (my Pi-Hole logs show every DNS request was fulfilled).

I switched back to the 1GbE NIC and everything works like a charm.

It was the same with the two previous images. Ping reports from my PC consistently show between 8% and 20% dropped packets when using the 2.5GbE NIC.

Posted

Yes, it's off. "ethtool -K eth1 tx off" produced no output and it gave up just a few seconds after the transfer started. Forcing tx offload on and off again (should we really trust the driver to report the current setting?) didn't improve things. Yesterday I fiddled around with disabling autosuspend on the usb ports and devices in /sys in case it's some weird issue like that - no improvement.

 

However, I've found that changing the advertised speeds for autonegotiation on my desktop's side (removing the bit for 2.5Gbps, for example) causes the link to go down for a few seconds and then back up - that's picked up by the Helios64's eth1 and it goes back to a stable 1Gbps connection. So at least eth1 can be resurrected without a full NAS reboot in a few seconds' time.

Posted

Yeah, I'm back to the 1Gbps port too... Pushed over 3 TB of data onto the device from multiple sources simultaneously the other day and haven't had a single error.

With the previous stable builds it used to throw errors quite often during transfers, so when I didn't see any errors with the new one I figured the problem must have been fixed.

Today I'm losing eth1 completely even without exerting much of a load. Maybe it's because I'm pulling data from the device today instead of pushing. They are tx errors after all and not rx.

 

 

Posted

@gprovost Hi, my Helios64 is connected to a Zyxel XGS1010-12 switch on one of its 2.5Gbps ports via a CAT5E 26AWG metal jacket cable. On the PC side I have the Sabrent 2.5G USB network adapter, connected to the other 2.5Gbps port. I think the Sabrent USB adapter has the same controller as the Helios64.

 


PS C:> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
Ethernet 2                Realtek USB 2.5GbE Family Controller         16 Up           00-24-27-88-0B-5A       2.5 Gbps
ZeroTier One [93afae59... ZeroTier Virtual Port                        15 Up           2A-AE-7A-A6-B8-55       100 Mbps
ZeroTier One [af78bf94... ZeroTier Virtual Port #2                     53 Up           62-D9-06-F3-75-44       100 Mbps
VirtualBox Host-Only N... VirtualBox Host-Only Ethernet Adapter        12 Up           0A-00-27-00-00-0C         1 Gbps
Wi-Fi                     Intel(R) Wi-Fi 6 AX200 160MHz                11 Disconnected 50-E0-85-F5-02-3A          0 bps
Ethernet                  Intel(R) I211 Gigabit Network Connec...       4 Disconnected B4-2E-99-8E-1D-2E          0 bps


PS C:> Get-NetAdapterAdvancedProperty "Ethernet 2"

Name                      DisplayName                    DisplayValue                   RegistryKeyword RegistryValue
----                      -----------                    ------------                   --------------- -------------
Ethernet 2                Energy-Efficient Ethernet      Disabled                       *EEE            {0}
Ethernet 2                Flow Control                   Rx & Tx Enabled                *FlowControl    {3}
Ethernet 2                IPv4 Checksum Offload          Rx & Tx Enabled                *IPChecksumO... {3}
Ethernet 2                Jumbo Frame                    Disabled                       *JumboPacket    {1514}
Ethernet 2                Large Send Offload v2 (IPv4)   Enabled                        *LsoV2IPv4      {1}
Ethernet 2                Large Send Offload v2 (IPv6)   Enabled                        *LsoV2IPv6      {1}
Ethernet 2                ARP Offload                    Enabled                        *PMARPOffload   {1}
Ethernet 2                NS Offload                     Enabled                        *PMNSOffload    {1}
Ethernet 2                Priority & VLAN                Priority & VLAN Enabled        *PriorityVLA... {3}
Ethernet 2                Selective suspend              Enabled                        *SelectiveSu... {1}
Ethernet 2                Speed & Duplex                 Auto Negotiation               *SpeedDuplex    {0}
Ethernet 2                SS idle timeout                5                              *SSIdleTimeout  {5}
Ethernet 2                SS idle timeout(Screen off)    3                              *SSIdleTimeo... {3}
Ethernet 2                TCP Checksum Offload (IPv4)    Rx & Tx Enabled                *TCPChecksum... {3}
Ethernet 2                TCP Checksum Offload (IPv6)    Rx & Tx Enabled                *TCPChecksum... {3}
Ethernet 2                UDP Checksum Offload (IPv4)    Rx & Tx Enabled                *UDPChecksum... {3}
Ethernet 2                UDP Checksum Offload (IPv6)    Rx & Tx Enabled                *UDPChecksum... {3}
Ethernet 2                Wake on Magic Packet           Enabled                        *WakeOnMagic... {1}
Ethernet 2                Wake on pattern match          Enabled                        *WakeOnPattern  {1}
Ethernet 2                Advanced EEE                   Disabled                       AdvancedEEE     {0}
Ethernet 2                Green Ethernet                 Enabled                        EnableGreenE... {1}
Ethernet 2                Idle Power Saving              Enabled                        EnableU2P3      {1}
Ethernet 2                Gigabit Lite                   Enabled                        GigaLite        {1}
Ethernet 2                Network Address                                               NetworkAddress  {--}
Ethernet 2                VLAN ID                        0                              VlanID          {0}
Ethernet 2                Wake on link change            Enabled                        WakeOnLinkCh... {1}
Ethernet 2                WOL & Shutdown Link Speed      10 Mbps First                  WolShutdownL... {0}

 

 

Posted

@gprovost

My network setup regarding the Helios64:

1: Helios64 eth0 <--> noname Gigabit switch <--> my internal network, router etc <--> my desktop machine, onboard 1Gbps ethernet port (enp7s0), Intel i211, kernel module igb. Used for ssh connection to trigger iperf3 to go through eth1. MTU 1500.

2: Helios64 eth1 <--> 5m Cat 6 ethernet cable <--> my desktop machine, 2.5Gbps ethernet port (enp6s0), Realtek RTL8125, vanilla kernel module r8169. Nothing but static IP address assignments from the same subnet, 192.168.64.0/24 (different subnet compared to 1). MTU 1500.

 

My desktop PC has both its ethernet ports on the mainboard itself, no USB devices are involved on that side. The mainboard is an ASUS "ROG STRIX X570-E GAMING" with the latest non-beta BIOS as of 2020-01-11.

 

To toggle the 2.5Gbps speed between H64 eth1 and my desktop's enp6s0, I run the following commands on my desktop:

ethtool -s enp6s0 advertise 0x80000000002f # 2.5Gbps
ethtool -s enp6s0 advertise 0x3f # 1Gbps
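
For reference, the two advertise masks above can be decoded bit by bit. This is a minimal Python sketch, assuming the link-mode bit positions listed in the ethtool(8) man page; the names LINK_MODE_BITS and decode_advertise are made up for illustration, and only the modes relevant here are included.

```python
# Link-mode bit positions as documented for `ethtool -s ... advertise`
# (subset only; anything outside this table is ignored by the sketch).
LINK_MODE_BITS = {
    0: "10baseT/Half",
    1: "10baseT/Full",
    2: "100baseT/Half",
    3: "100baseT/Full",
    4: "1000baseT/Half",
    5: "1000baseT/Full",
    47: "2500baseT/Full",
}


def decode_advertise(mask):
    """Return the names of the known link modes set in an advertise mask."""
    return [name for bit, name in sorted(LINK_MODE_BITS.items())
            if mask & (1 << bit)]


for mask in (0x80000000002F, 0x3F):
    print(hex(mask), decode_advertise(mask))
```

Decoded this way, 0x80000000002f is 0x2f plus bit 47 (2500baseT/Full), while 0x3f drops bit 47 and additionally sets bit 4 (1000baseT/Half).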
 

Increasing the MTUs for the direct 2.5Gbps connection to 9124 significantly increases the effective H64 -> desktop throughput from ~1.5Gbps to ~2Gbps, but it also "kills" the H64's eth1 much quicker, I feel. Then again, because the stability seems completely random to begin with, that may just be my imagination.

Posted

We are working on this issue.
The same Realtek driver works fine on an Ubuntu laptop (amd64), Helios4 (armhf), and Helios64 LK 4.4.

We are looking into the USB host controller driver; it seems there are some quirks that are not implemented in the mainline kernel.

Posted

According to your latest blog post:

Quote

we managed to fix a recurrent issue with the USB driver that impacted the stability of the 2.5GbE interface on Linux Kernel 5.x version.

That's awesome to hear, thank you very much for addressing this issue! Is there any way to test this change upfront? I'd be eager to give this a spin with a self-built kernel/image if necessary, but couldn't find anything concrete in the public git repositories so far.

Posted

Nvm, I found PR 2567. With those changes applied, I can cheerfully report 2.35Gbps net TCP throughput via iperf3 in either direction with an MTU of 1500 and tx offloading enabled. Again, thanks so much!

 

One way remains to reproducibly kill the H64's eth1: set the MTU to 9124 on both ends and run iperf in both directions - each with their own server and the other acting as the client. But as the throughput difference is negligible (2.35Gbps vs 2.39Gbps, both really close to the 2.5Gbps maximum), I can safely exclude that use case for myself. I can't recall whether that even worked on kernel 4.4 or whether I even tested it back then.
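
The 2.35Gbps figure is close to what TCP can deliver on a 2.5Gbps link at MTU 1500. Here is a rough back-of-the-envelope sketch in Python, assuming IPv4, TCP timestamps enabled (12 bytes of options), no VLAN tag, and 38 bytes of per-frame Ethernet overhead; the function name is hypothetical.

```python
# Per-frame Ethernet overhead on the wire, in bytes:
# 8 preamble/SFD + 14 header + 4 FCS + 12 inter-frame gap.
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12

# IPv4 header (20) + TCP header (20) + TCP timestamp option (12, assumed on).
IP_TCP_OVERHEAD = 20 + 20 + 12


def max_tcp_goodput_gbps(mtu, link_gbps=2.5):
    """Upper bound on TCP payload throughput for full-sized frames."""
    payload = mtu - IP_TCP_OVERHEAD
    on_wire = mtu + ETHERNET_OVERHEAD
    return link_gbps * payload / on_wire


for mtu in (1500, 9124):
    print(f"MTU {mtu}: at most {max_tcp_goodput_gbps(mtu):.2f} Gbps")
```

Under these assumptions the ceiling is about 2.35 Gbps at MTU 1500 and about 2.48 Gbps at MTU 9124, so the measured 2.35Gbps is already at the limit for standard frames.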

 

For whatever it's worth, here's the 5.9 log output from when said crash happens:

Jan 21 22:51:55 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
Jan 21 22:51:55 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
Jan 21 22:51:55 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
Jan 21 22:51:55 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2

 

Posted

This issue reoccurred after upgrading from Armbian 21.02.3 to 21.05.1 with today's updated packages :(

On 21.02.3 with kernel 5.10.21-rockchip64, everything worked flawlessly for months.

On Armbian 21.05.1 Buster with kernel 5.10.35-rockchip64, whenever I try and stress the 2.5gbps port, it crashes immediately and results in the following dmesg output:

May 11 08:52:14 helios64 kernel: ------------[ cut here ]------------
May 11 08:52:14 helios64 kernel: NETDEV WATCHDOG: eth1 (r8152): transmit queue 0 timed out
May 11 08:52:14 helios64 kernel: WARNING: CPU: 4 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x398/0x3a0
May 11 08:52:14 helios64 kernel: Modules linked in: rfkill governor_performance r8152 snd_soc_hdmi_codec snd_soc_rockchip_i2s rockchipdrm snd_soc_core hantro_vpu(C) rockchip_vdec(C) dw_mipi_dsi snd_pcm_dmaengine rockchip_rga dw_hdmi v4l2_h264 snd_pcm analogix_dp videobuf2_dma_contig panfrost v4l2_mem2mem videobuf2_dma_sg videobuf2_vmalloc snd_timer drm_kms_helper videobuf2_memops snd gpu_sched videobuf2_v4l2 pwm_fan cec leds_pwm gpio_charger sg videobuf2_common soundcore rc_core fusb302 videodev drm tcpm mc drm_panel_orientation_quirks typec gpio_beeper cpufreq_dt zram lm75 ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid0 multipath linear raid1 md_mod realtek dwmac_rk stmmac_platform stmmac pcs_xpcs adc_keys
May 11 08:52:14 helios64 kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G         C        5.10.35-rockchip64 #21.05.1
May 11 08:52:14 helios64 kernel: Hardware name: Helios64 (DT)
May 11 08:52:14 helios64 kernel: pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
May 11 08:52:14 helios64 kernel: pc : dev_watchdog+0x398/0x3a0
May 11 08:52:14 helios64 kernel: lr : dev_watchdog+0x398/0x3a0
May 11 08:52:14 helios64 kernel: sp : ffff800011c0bd50
May 11 08:52:14 helios64 kernel: x29: ffff800011c0bd50 x28: ffff000004a30480 
May 11 08:52:14 helios64 kernel: x27: 0000000000000004 x26: 0000000000000140 
May 11 08:52:14 helios64 kernel: x25: 00000000ffffffff x24: 0000000000000004 
May 11 08:52:14 helios64 kernel: x23: ffff8000118b7000 x22: ffff000004eef41c 
May 11 08:52:14 helios64 kernel: x21: ffff000004eef000 x20: ffff000004eef4c0 
May 11 08:52:14 helios64 kernel: x19: 0000000000000000 x18: ffff8000118ded48 
May 11 08:52:14 helios64 kernel: x17: 0000000000000000 x16: 0000000000000000 
May 11 08:52:14 helios64 kernel: x15: 0000000000000277 x14: ffff800011c0ba10 
May 11 08:52:14 helios64 kernel: x13: 00000000ffffffea x12: ffff80001194ed80 
May 11 08:52:14 helios64 kernel: x11: 0000000000000003 x10: ffff800011936d40 
May 11 08:52:14 helios64 kernel: x9 : ffff800011936d98 x8 : 0000000000017fe8 
May 11 08:52:14 helios64 kernel: x7 : c0000000ffffefff x6 : 0000000000000003 
May 11 08:52:14 helios64 kernel: x5 : 0000000000000000 x4 : 0000000000000000 
May 11 08:52:14 helios64 kernel: x3 : 0000000000000103 x2 : 0000000000000102 
May 11 08:52:14 helios64 kernel: x1 : c2c1fd0267433900 x0 : 0000000000000000 
May 11 08:52:14 helios64 kernel: Call trace:
May 11 08:52:14 helios64 kernel:  dev_watchdog+0x398/0x3a0
May 11 08:52:14 helios64 kernel:  call_timer_fn+0x30/0x1f8
May 11 08:52:14 helios64 kernel:  run_timer_softirq+0x290/0x540
May 11 08:52:14 helios64 kernel:  efi_header_end+0x160/0x41c
May 11 08:52:14 helios64 kernel:  irq_exit+0xb8/0xd8
May 11 08:52:14 helios64 kernel:  __handle_domain_irq+0x98/0x108
May 11 08:52:14 helios64 kernel:  gic_handle_irq+0xc0/0x140
May 11 08:52:14 helios64 kernel:  el1_irq+0xc0/0x180
May 11 08:52:14 helios64 kernel:  arch_cpu_idle+0x18/0x28
May 11 08:52:14 helios64 kernel:  default_idle_call+0x44/0x1bc
May 11 08:52:14 helios64 kernel:  do_idle+0x204/0x278
May 11 08:52:14 helios64 kernel:  cpu_startup_entry+0x24/0x60
May 11 08:52:14 helios64 kernel:  secondary_start_kernel+0x170/0x180
May 11 08:52:14 helios64 kernel: ---[ end trace 2eddd7bf907ed7b6 ]---
May 11 08:52:14 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx timeout
May 11 08:52:14 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
May 11 08:52:14 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
May 11 08:52:14 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
May 11 08:52:14 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
May 11 08:52:16 helios64 kernel: usb 4-1.4: reset SuperSpeed Gen 1 USB device number 3 using xhci-hcd

 

Once more, the only fix seems to be not advertising 2.5Gbps speeds on the other end.
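
To check whether another box is hitting the same failure, the signature lines from the log above can be counted with a short script. This is only a sketch: the regular expressions are written against the lines quoted in this thread, and the names PATTERNS and summarize are hypothetical.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread.
PATTERNS = {
    "watchdog": re.compile(
        r"NETDEV WATCHDOG: \S+ \(r8152\): transmit queue \d+ timed out"),
    "tx_timeout": re.compile(r"r8152 \S+ \S+: Tx timeout"),
    "tx_status": re.compile(r"r8152 \S+ \S+: Tx status -?\d+"),
    "usb_reset": re.compile(r"usb \S+: reset SuperSpeed"),
}


def summarize(log_text):
    """Count how many lines match each failure signature."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# A few lines copied from the log above.
SAMPLE = """\
May 11 08:52:14 helios64 kernel: NETDEV WATCHDOG: eth1 (r8152): transmit queue 0 timed out
May 11 08:52:14 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx timeout
May 11 08:52:14 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
May 11 08:52:14 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
May 11 08:52:16 helios64 kernel: usb 4-1.4: reset SuperSpeed Gen 1 USB device number 3 using xhci-hcd
"""

for name, count in sorted(summarize(SAMPLE).items()):
    print(name, count)
```

Feeding it the saved output of dmesg or journalctl instead of SAMPLE gives a quick before/after comparison when testing a new kernel.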

Posted

Hi, I'm seeing the same kernel warning on my helios64 with tx offloading off on 5.10.37 (Armbian 21.08.0-trunk.25 Focal) on the beta branch:

 

[Sat May 15 00:11:52 2021] ------------[ cut here ]------------
[Sat May 15 00:11:52 2021] NETDEV WATCHDOG: eth1 (r8152): transmit queue 0 timed out
[Sat May 15 00:11:52 2021] WARNING: CPU: 4 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x398/0x3a0
[Sat May 15 00:11:52 2021] Modules linked in: nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp aufs ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge governor_performance rfkill zram binfmt_misc zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) snd_soc_hdmi_codec r8152 snd_soc_rockchip_i2s snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer leds_pwm gpio_charger snd soundcore pwm_fan rockchip_vdec(C) hantro_vpu(C) rockchip_rga v4l2_h264 videobuf2_dma_contig videobuf2_dma_sg v4l2_mem2mem videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev fusb302 tcpm mc typec sg gpio_beeper cpufreq_dt sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace lm75 sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy
[Sat May 15 00:11:52 2021]  async_pq async_xor async_tx raid1 raid0 multipath linear md_mod uas panfrost gpu_sched realtek dwmac_rk stmmac_platform stmmac pcs_xpcs rockchipdrm dw_mipi_dsi dw_hdmi analogix_dp drm_kms_helper cec rc_core drm drm_panel_orientation_quirks adc_keys
[Sat May 15 00:11:52 2021] CPU: 4 PID: 0 Comm: swapper/4 Tainted: P         C OE     5.10.37-rockchip64 #trunk.25
[Sat May 15 00:11:52 2021] Hardware name: Helios64 (DT)
[Sat May 15 00:11:52 2021] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[Sat May 15 00:11:52 2021] pc : dev_watchdog+0x398/0x3a0
[Sat May 15 00:11:52 2021] lr : dev_watchdog+0x398/0x3a0
[Sat May 15 00:11:52 2021] sp : ffff800011c0bd50
[Sat May 15 00:11:52 2021] x29: ffff800011c0bd50 x28: ffff000008595080 
[Sat May 15 00:11:52 2021] x27: 0000000000000004 x26: 0000000000000140 
[Sat May 15 00:11:52 2021] x25: 00000000ffffffff x24: 0000000000000004 
[Sat May 15 00:11:52 2021] x23: ffff8000118b7000 x22: ffff000008fad41c 
[Sat May 15 00:11:52 2021] x21: ffff000008fad000 x20: ffff000008fad4c0 
[Sat May 15 00:11:52 2021] x19: 0000000000000000 x18: ffff8000118ded68 
[Sat May 15 00:11:52 2021] x17: 0000000000000000 x16: 0000000000000000 
[Sat May 15 00:11:52 2021] x15: 000000000000037f x14: ffff800011c0ba10 
[Sat May 15 00:11:52 2021] x13: 00000000ffffffea x12: ffff80001194eda0 
[Sat May 15 00:11:52 2021] x11: 0000000000000003 x10: ffff800011936d60 
[Sat May 15 00:11:52 2021] x9 : ffff800011936db8 x8 : 0000000000017fe8 
[Sat May 15 00:11:52 2021] x7 : c0000000ffffefff x6 : 0000000000000003 
[Sat May 15 00:11:52 2021] x5 : 0000000000000000 x4 : 0000000000000000 
[Sat May 15 00:11:52 2021] x3 : 0000000000000103 x2 : 0000000000000102 
[Sat May 15 00:11:52 2021] x1 : 0f3a2b07da23a200 x0 : 0000000000000000 
[Sat May 15 00:11:52 2021] Call trace:
[Sat May 15 00:11:52 2021]  dev_watchdog+0x398/0x3a0
[Sat May 15 00:11:52 2021]  call_timer_fn+0x30/0x1f8
[Sat May 15 00:11:52 2021]  run_timer_softirq+0x290/0x540
[Sat May 15 00:11:52 2021]  efi_header_end+0x160/0x41c
[Sat May 15 00:11:52 2021]  irq_exit+0xb8/0xd8
[Sat May 15 00:11:52 2021]  __handle_domain_irq+0x98/0x108
[Sat May 15 00:11:52 2021]  gic_handle_irq+0xc0/0x140
[Sat May 15 00:11:52 2021]  el1_irq+0xc0/0x180
[Sat May 15 00:11:52 2021]  arch_cpu_idle+0x18/0x28
[Sat May 15 00:11:52 2021]  default_idle_call+0x44/0x1bc
[Sat May 15 00:11:52 2021]  do_idle+0x204/0x278
[Sat May 15 00:11:52 2021]  cpu_startup_entry+0x24/0x60
[Sat May 15 00:11:52 2021]  secondary_start_kernel+0x170/0x180
[Sat May 15 00:11:52 2021] ---[ end trace 7803e60dad0442ed ]---
[Sat May 15 00:11:52 2021] r8152 4-1.4:1.0 eth1: Tx timeout
[Sat May 15 00:11:52 2021] xhci-hcd xhci-hcd.0.auto: bad transfer trb length 16743636 in event trb
[Sat May 15 00:11:52 2021] r8152 4-1.4:1.0 eth1: Tx status -2
[Sat May 15 00:11:52 2021] xhci-hcd xhci-hcd.0.auto: bad transfer trb length 16743636 in event trb
[Sat May 15 00:11:52 2021] r8152 4-1.4:1.0 eth1: Tx status -2
[Sat May 15 00:11:52 2021] xhci-hcd xhci-hcd.0.auto: bad transfer trb length 16743636 in event trb
[Sat May 15 00:11:52 2021] r8152 4-1.4:1.0 eth1: Tx status -2
[Sat May 15 00:11:52 2021] r8152 4-1.4:1.0 eth1: Tx status -2
[Sat May 15 00:11:54 2021] usb 4-1.4: reset SuperSpeed Gen 1 USB device number 4 using xhci-hcd
[Sat May 15 00:12:24 2021] r8152 4-1.4:1.0 eth1: Tx timeout
[Sat May 15 00:12:24 2021] xhci-hcd xhci-hcd.0.auto: bad transfer trb length 16729300 in event trb
[Sat May 15 00:12:24 2021] r8152 4-1.4:1.0 eth1: Tx status -2
[Sat May 15 00:12:24 2021] xhci-hcd xhci-hcd.0.auto: bad transfer trb length 16729300 in event trb
[Sat May 15 00:12:24 2021] r8152 4-1.4:1.0 eth1: Tx status -2
[Sat May 15 00:12:24 2021] xhci-hcd xhci-hcd.0.auto: bad transfer trb length 16729300 in event trb
[Sat May 15 00:12:24 2021] r8152 4-1.4:1.0 eth1: Tx status -2
[Sat May 15 00:12:24 2021] xhci-hcd xhci-hcd.0.auto: bad transfer trb length 16729300 in event trb
[Sat May 15 00:12:24 2021] r8152 4-1.4:1.0 eth1: Tx status -2
[Sat May 15 00:12:26 2021] usb 4-1.4: reset SuperSpeed Gen 1 USB device number 4 using xhci-hcd

 

I can upload to the helios64 at ~1.77Gbps, but downloading from the helios64 triggers this trace.

 

Posted

I did some digging:

  • 2021-03-23: PR 2728 re-enabled the necessary patch in rockchip64-current and rockchip64-dev with some minor changes to the patch itself
  • 2021-03-24: PR 2704 renamed dev to edge, shuffling tons of files around. These changes include the content changes from PR 2728, but don't include the filename change for 5.10
  • Both PRs went into branch v21.05

As of now (commit 6311842d on branch v21.05 and commit 81d1de32 on branch master), the patch is only being applied to 5.11 kernels, not to 5.10 kernels (checked via "ag xhci_trb_ent_quirk").
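
For anyone without ag installed, the same check can be approximated in Python. This is a minimal sketch that lists the directories under a given root containing a file that mentions the search string; the function name is hypothetical and nothing here depends on the actual layout of the build repository.

```python
import os


def dirs_with_match(root, needle):
    """Return sorted directories (relative to root) holding a file whose
    content contains needle."""
    hits = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "r", errors="ignore") as handle:
                    if needle in handle.read():
                        hits.add(os.path.relpath(dirpath, root))
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
    return sorted(hits)
```

Called as dirs_with_match("patch/kernel", "xhci_trb_ent_quirk") from a checkout of the build repository (the path is an assumption about where the kernel patches live), it should show which kernel patch directories carry the quirk patch.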

Unfortunately, I can't seem to build a kernel off branch v21.05 on my machine at the moment to verify this as the root cause/fix for this issue - add-csgpio-to-rockchip-spi.patch fails to apply and compilation fails without a real error message...

 

@Igor, @piter75: Could you please take a look into this?

Posted

I didn't notice at first, but this issue was fixed in commit 8eece74e on the master branch (thanks, Piotr!). So everything should just work on beta again. I built my own kernel from master not too long ago and everything is working fine.

Posted
On 6/23/2021 at 1:42 AM, yay said:

I didn't notice at first, but this issue was fixed in commit 8eece74e on the master branch (thanks, Piotr!). So everything should just work on beta again. I built my own kernel from master not too long ago and everything is working fine.

Thanks @Piotr. I've been saturating TX/RX on it all day and no issues. Working great. Thanks @yay

Posted

I may have spoken too soon. A non-iperf3 workload, in my case iSCSI LUN traffic over the 2.5Gbps port, is freezing the xHCI controller:

 

[  662.986623] xhci-hcd xhci-hcd.0.auto: xHCI host not responding to stop endpoint command.
[  662.986645] xhci-hcd xhci-hcd.0.auto: USBSTS:
[  663.000235] xhci-hcd xhci-hcd.0.auto: xHCI host controller not responding, assume dead
[  663.001046] xhci-hcd xhci-hcd.0.auto: HC died; cleaning up
[  663.001732] usb 1-1: USB disconnect, device number 2
[  663.002653] r8152 2-1.4:1.0 eth1: Stop submitting intr, status -108
[  663.002734] r8152 2-1.4:1.0 eth1: get_registers -110
[  663.002810] r8152 2-1.4:1.0 eth1: Tx status -108
[  663.003245] r8152 2-1.4:1.0 eth1: Tx status -108
[  663.003265] r8152 2-1.4:1.0 eth1: Tx status -108
[  663.003285] r8152 2-1.4:1.0 eth1: Tx status -108
[  663.006467] usb 2-1: USB disconnect, device number 2
[  663.006496] usb 2-1.4: USB disconnect, device number 3
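
The negative numbers in these r8152 lines are Linux errno values. A small Python sketch for decoding them, assuming the standard Linux numbering (2, 108, 110); the table is hard-coded so the result does not depend on the platform running the script, and the names are hypothetical.

```python
import re

# Linux errno numbers for the codes that appear in this thread.
ERRNO_NAMES = {2: "ENOENT", 108: "ESHUTDOWN", 110: "ETIMEDOUT"}

# Matches the three r8152 message shapes quoted above.
STATUS_RE = re.compile(
    r"(?:Tx status|get_registers|Stop submitting intr, status) (-\d+)")


def decode_status(line):
    """Return the errno name for an r8152 status line, or None if no match."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    code = -int(match.group(1))
    return ERRNO_NAMES.get(code, "unknown errno %d" % code)


print(decode_status("[  663.002734] r8152 2-1.4:1.0 eth1: get_registers -110"))
print(decode_status("[  663.002810] r8152 2-1.4:1.0 eth1: Tx status -108"))
```

As far as the kernel's USB error-code documentation goes, -ESHUTDOWN indicates that the device or host controller has been disabled and -ETIMEDOUT indicates a timed-out transfer, which fits the "HC died" message; the earlier "Tx status -2" (-ENOENT) is what a driver typically sees when its URBs are unlinked.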

 

I'm not sure why iSCSI rather than iperf3 makes any difference. I had run iperf3 for over 8 hours in both directions at the same time.

 

Pastebin of the debug dump minus my zfs datasets: https://pastebin.com/yReZEym1

 

Notably, I'm running 5.10.46-rockchip64 #trunk.77, the latest on the beta branch at the time of writing (June 30th 2021).

 

I had a 5TB Seagate drive plugged in to the USB controller. I removed that but it still crashed. The only way I know how to fix this is to reboot the helios. Is there a better way to reset it?
