Demodude123

Members
  • Posts

    11

  1. I was wondering if anybody had an STL or the like for the drive sleds. I haven't broken one yet, but was thinking about making some in different colors on my 3D printer if they're available. Thanks.
  2. I may have spoken too soon. A non-iperf3 workload, in my case iSCSI LUN traffic over the 2.5Gbps port, is freezing the xHCI controller:

     [ 662.986623] xhci-hcd xhci-hcd.0.auto: xHCI host not responding to stop endpoint command.
     [ 662.986645] xhci-hcd xhci-hcd.0.auto: USBSTS:
     [ 663.000235] xhci-hcd xhci-hcd.0.auto: xHCI host controller not responding, assume dead
     [ 663.001046] xhci-hcd xhci-hcd.0.auto: HC died; cleaning up
     [ 663.001732] usb 1-1: USB disconnect, device number 2
     [ 663.002653] r8152 2-1.4:1.0 eth1: Stop submitting intr, status -108
     [ 663.002734] r8152 2-1.4:1.0 eth1: get_registers -110
     [ 663.002810] r8152 2-1.4:1.0 eth1: Tx status -108
     [ 663.003245] r8152 2-1.4:1.0 eth1: Tx status -108
     [ 663.003265] r8152 2-1.4:1.0 eth1: Tx status -108
     [ 663.003285] r8152 2-1.4:1.0 eth1: Tx status -108
     [ 663.006467] usb 2-1: USB disconnect, device number 2
     [ 663.006496] usb 2-1.4: USB disconnect, device number 3

     I'm not sure why iSCSI behaves any differently from iperf3. I had run iperf3 for over 8 hours, both directions at the same time. Pastebin of the debug dump minus my zfs datasets: https://pastebin.com/yReZEym1

     This is on 5.10.46-rockchip64 #trunk.77, the latest on the beta branch at the time of writing (June 30th, 2021). I had a 5TB Seagate drive plugged in to the USB controller. I removed it, but it still crashed. The only way I know to fix this is to reboot the Helios64. Is there a better way to reset it?
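A small sketch for spotting this failure in a kernel log, assuming the two messages quoted above ("assume dead" and "HC died") are a reliable signature. The function name `xhci_died` is made up for illustration.

```shell
#!/bin/sh
# Read a kernel log on stdin and print only the lines that indicate the
# xHCI host controller was declared dead. Exits 0 if any were found.
# The patterns are copied from the dmesg excerpt in the post.
xhci_died() {
    grep -E 'xHCI host controller not responding, assume dead|HC died; cleaning up'
}
```

Typical use would be `dmesg | xhci_died && echo "xHCI controller died"`, for example from a cron job that sends an alert before the box is rebooted by hand.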
  3. Thanks @Piotr. I've been saturating TX/RX on it all day and no issues. Working great. Thanks @yay
  4. Any update on this issue? Could really use the 2.5Gbps patch in the beta branch.
  5. Hi, I'm having the same kernel panic on my helios64 with tx offloading off on 5.10.37 (Armbian 21.08.0-trunk.25 Focal) on the beta branch:

     [Sat May 15 00:11:52 2021] ------------[ cut here ]------------
     [Sat May 15 00:11:52 2021] NETDEV WATCHDOG: eth1 (r8152): transmit queue 0 timed out
     [Sat May 15 00:11:52 2021] WARNING: CPU: 4 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x398/0x3a0
     [Sat May 15 00:11:52 2021] Modules linked in: nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp aufs ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge governor_performance rfkill zram binfmt_misc zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) snd_soc_hdmi_codec r8152 snd_soc_rockchip_i2s snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer leds_pwm gpio_charger snd soundcore pwm_fan rockchip_vdec(C) hantro_vpu(C) rockchip_rga v4l2_h264 videobuf2_dma_contig videobuf2_dma_sg v4l2_mem2mem videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev fusb302 tcpm mc typec sg gpio_beeper cpufreq_dt sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace lm75 sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy
     [Sat May 15 00:11:52 2021] async_pq async_xor async_tx raid1 raid0 multipath linear md_mod uas panfrost gpu_sched realtek dwmac_rk stmmac_platform stmmac pcs_xpcs rockchipdrm dw_mipi_dsi dw_hdmi analogix_dp drm_kms_helper cec rc_core drm drm_panel_orientation_quirks adc_keys
     [Sat May 15 00:11:52 2021] CPU: 4 PID: 0 Comm: swapper/4 Tainted: P C OE 5.10.37-rockchip64 #trunk.25
     [Sat May 15 00:11:52 2021] Hardware name: Helios64 (DT)
     [Sat May 15 00:11:52 2021] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
     [Sat May 15 00:11:52 2021] pc : dev_watchdog+0x398/0x3a0
     [Sat May 15 00:11:52 2021] lr : dev_watchdog+0x398/0x3a0
     [Sat May 15 00:11:52 2021] sp : ffff800011c0bd50
     [Sat May 15 00:11:52 2021] x29: ffff800011c0bd50 x28: ffff000008595080
     [Sat May 15 00:11:52 2021] x27: 0000000000000004 x26: 0000000000000140
     [Sat May 15 00:11:52 2021] x25: 00000000ffffffff x24: 0000000000000004
     [Sat May 15 00:11:52 2021] x23: ffff8000118b7000 x22: ffff000008fad41c
     [Sat May 15 00:11:52 2021] x21: ffff000008fad000 x20: ffff000008fad4c0
     [Sat May 15 00:11:52 2021] x19: 0000000000000000 x18: ffff8000118ded68
     [Sat May 15 00:11:52 2021] x17: 0000000000000000 x16: 0000000000000000
     [Sat May 15 00:11:52 2021] x15: 000000000000037f x14: ffff800011c0ba10
     [Sat May 15 00:11:52 2021] x13: 00000000ffffffea x12: ffff80001194eda0
     [Sat May 15 00:11:52 2021] x11: 0000000000000003 x10: ffff800011936d60
     [Sat May 15 00:11:52 2021] x9 : ffff800011936db8 x8 : 0000000000017fe8
     [Sat May 15 00:11:52 2021] x7 : c0000000ffffefff x6 : 0000000000000003
     [Sat May 15 00:11:52 2021] x5 : 0000000000000000 x4 : 0000000000000000
     [Sat May 15 00:11:52 2021] x3 : 0000000000000103 x2 : 0000000000000102
     [Sat May 15 00:11:52 2021] x1 : 0f3a2b07da23a200 x0 : 0000000000000000
     [Sat May 15 00:11:52 2021] Call trace:
     [Sat May 15 00:11:52 2021] dev_watchdog+0x398/0x3a0
     [Sat May 15 00:11:52 2021] call_timer_fn+0x30/0x1f8
     [Sat May 15 00:11:52 2021] run_timer_softirq+0x290/0x540
     [Sat May 15 00:11:52 2021] efi_header_end+0x160/0x41c
     [Sat May 15 00:11:52 2021] irq_exit+0xb8/0xd8
     [Sat May 15 00:11:52 2021] __handle_domain_irq+0x98/0x108
     [Sat May 15 00:11:52 2021] gic_handle_irq+0xc0/0x140
     [Sat May 15 00:11:52 2021] el1_irq+0xc0/0x180
     [Sat May 15 00:11:52 2021] arch_cpu_idle+0x18/0x28
     [Sat May 15 00:11:52 2021] default_idle_call+0x44/0x1bc
     [Sat May 15 00:11:52 2021] do_idle+0x204/0x278
     [Sat May 15 00:11:52 2021] cpu_startup_entry+0x24/0x60
     [Sat May 15 00:11:52 2021] secondary_start_kernel+0x170/0x180
     [Sat May 15 00:11:52 2021] ---[ end trace 7803e60dad0442ed ]---
     [Sat May 15 00:11:52 2021] r8152 4-1.4:1.0 eth1: Tx timeout
     [Sat May 15 00:11:52 2021] xhci-hcd xhci-hcd.0.auto: bad transfer trb length 16743636 in event trb
     [Sat May 15 00:11:52 2021] r8152 4-1.4:1.0 eth1: Tx status -2
     [Sat May 15 00:11:52 2021] xhci-hcd xhci-hcd.0.auto: bad transfer trb length 16743636 in event trb
     [Sat May 15 00:11:52 2021] r8152 4-1.4:1.0 eth1: Tx status -2
     [Sat May 15 00:11:52 2021] xhci-hcd xhci-hcd.0.auto: bad transfer trb length 16743636 in event trb
     [Sat May 15 00:11:52 2021] r8152 4-1.4:1.0 eth1: Tx status -2
     [Sat May 15 00:11:52 2021] r8152 4-1.4:1.0 eth1: Tx status -2
     [Sat May 15 00:11:54 2021] usb 4-1.4: reset SuperSpeed Gen 1 USB device number 4 using xhci-hcd
     [Sat May 15 00:12:24 2021] r8152 4-1.4:1.0 eth1: Tx timeout
     [Sat May 15 00:12:24 2021] xhci-hcd xhci-hcd.0.auto: bad transfer trb length 16729300 in event trb
     [Sat May 15 00:12:24 2021] r8152 4-1.4:1.0 eth1: Tx status -2
     [Sat May 15 00:12:24 2021] xhci-hcd xhci-hcd.0.auto: bad transfer trb length 16729300 in event trb
     [Sat May 15 00:12:24 2021] r8152 4-1.4:1.0 eth1: Tx status -2
     [Sat May 15 00:12:24 2021] xhci-hcd xhci-hcd.0.auto: bad transfer trb length 16729300 in event trb
     [Sat May 15 00:12:24 2021] r8152 4-1.4:1.0 eth1: Tx status -2
     [Sat May 15 00:12:24 2021] xhci-hcd xhci-hcd.0.auto: bad transfer trb length 16729300 in event trb
     [Sat May 15 00:12:24 2021] r8152 4-1.4:1.0 eth1: Tx status -2
     [Sat May 15 00:12:26 2021] usb 4-1.4: reset SuperSpeed Gen 1 USB device number 4 using xhci-hcd

     I can upload ~1.77Gbps to the helios64 but downloading from the helios64 it gives this panic.
  6. Demodude123

    ZFS on Helios64

    Here's what it built for me:

    zfs-2.0.0.tar.gz
    zfs-kmod-2.0.0-rc5.src.rpm
    kmod-zfs-5.8.14-rockchip64_2.0.0-0_arm64.deb
    kmod-zfs-devel_2.0.0-0_arm64.deb
    kmod-zfs-devel-5.8.14-rockchip64_2.0.0-0_arm64.deb
    zfs-dkms-2.0.0-rc5.src.rpm
    zfs-dkms_2.0.0-0_arm64.deb
    zfs-2.0.0-rc5.src.rpm
    zfs_2.0.0-0_arm64.deb
    libnvpair1_2.0.0-0_arm64.deb
    libuutil1_2.0.0-0_arm64.deb
    libzfs2_2.0.0-0_arm64.deb
    libzpool2_2.0.0-0_arm64.deb
    libzfs2-devel_2.0.0-0_arm64.deb
    zfs-test_2.0.0-0_arm64.deb
    zfs-dracut_2.0.0-0_arm64.deb
    zfs-initramfs_2.0.0-0_arm64.deb
    python3-pyzfs_2.0.0-0_arm64.deb
  7. Demodude123

    ZFS on Helios64

    I also have zfs-2.0-rc5 running. I tried updating to rc7 and had some trouble with the deb packages; it's probably fixed in the GA build. I used this guide: https://openzfs.github.io/openzfs-docs/Developer%20Resources/Building%20ZFS.html - and then `make deb` gives you debs you can install with `dpkg -i`. I installed the shared libs, zfs-dkms, and zfsutils from there. As for systemd, you have to play around with it to get ZFS to start on boot. Usually I can `sudo systemctl unmask <zfs-thing>`. Another good sanity check is to compare /lib/systemd/system and /etc/systemd/system against a normal Debian-based system and see what's different. Make sure zfs.target is specified under the multi-user target! If you use sanoid/syncoid, it's a bit of a pain: it has Debian's zfsutils as a dependency. You're probably better off getting it from git and setting up the cron job yourself. Otherwise, you have to modify your apt cache to remove zfsutils as a dependency for sanoid. I'm using zstd compression and have no complaints so far.
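A rough sketch of the build-and-install steps described above, not a tested recipe. The `autogen.sh` / `configure` / `make deb` sequence follows the linked OpenZFS guide; the function names, the `DRY_RUN` switch, and the source path in the usage note are assumptions made for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command instead of running it, so nothing
# is built or installed by accident. Set DRY_RUN=0 to execute for real.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Build .deb packages from an unpacked OpenZFS source tree.
build_zfs_debs() {
    src_dir="$1"
    run cd "$src_dir" || return 1
    run sh autogen.sh &&
    run ./configure &&
    run make -s &&
    run make deb
}

# Install whichever of the resulting .deb files the caller picks.
install_debs() {
    run sudo dpkg -i "$@"
}

# Unmask the given systemd units, then list what multi-user.target pulls in
# so you can confirm zfs.target shows up there.
unmask_zfs_units() {
    for unit in "$@"; do
        run sudo systemctl unmask "$unit" || return 1
    done
    run systemctl list-dependencies multi-user.target
}
```

For example, `build_zfs_debs ~/zfs-2.0.0` followed by `install_debs` with the packages you want, then `unmask_zfs_units` with whichever ZFS units are masked on your system. Which units those are varies, so check `systemctl list-unit-files` first.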
  8. I think I had a double issue. I have a Realtek card from the same family in the desktop I am testing with, and I think my desktop needs the https://github.com/igorpecovnik/realtek-r8152-linux patch. For now, I've switched cables around so I can use the 2.5Gbps port at 100MBps on our home LAN, and the 1Gbps port directly into my motherboard port. The 2.5G card in the desktop is a PCIe card. I'll keep it this way until the TX checksum offload issue is resolved. Thanks for your help.
  9. I was able to recreate this with the latest test build, Armbian_20.11.0-trunk.32_Helios64_focal_current_5.9.8.img.xz, written to an SD card. I did compile and install zfs, but then tried iperf3 tx. Here are my armbianmonitor -u stats. They include a kernel panic in dmesg against the r8152 driver. After that, iperf3 drops to 0 bits/sec, the NIC restarts, and the transfer continues, but it keeps crashing over and over until it gives up. http://ix.io/2F07
  10. I see the issue in all 'USB'-backed combos. It happens on the 2.5Gbps port linked at 2.5Gbps, the 2.5Gbps port linked at 1Gbps, and on a StarTech USB3 1Gbps adapter plugged into any port on the Helios64, linked at 1Gbps. My tower has a PCIe 2.5Gbps card and a 1Gbps port built into the motherboard. I saw this result with both. My tower runs Ubuntu 20.04 stable with the 5.4 kernel, and I've never had network issues with it. Since the Helios64 2.5Gbps port is run off USB, and both it and an external USB StarTech 1Gbps adapter have the issue, that leads me to believe there's something wrong with the kernel/USB driver. For what it's worth, I have not performed the 1Gbps performance fix mentioned in the latest blog. You should be able to recreate this with a few iperf3 runs back to back. One direction will always succeed, pulling nearly 2.5Gbps, and the other will fail after a few tries.
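The back-to-back reproduction described above could be scripted roughly like this. It is a sketch, assuming an iperf3 server is already listening on the Helios64; the function name and default values are made up, while `-c`, `-t`, and `-R` are standard iperf3 client options.

```shell
#!/bin/sh
# Run iperf3 in both directions, several rounds in a row, and stop at the
# first run that exits with an error.
# Note: a stalled link can show up as 0.00 bits/sec while iperf3 still exits
# successfully, so the per-interval output needs to be read as well.
iperf_repro() {
    host="$1"            # iperf3 server address (the Helios64)
    rounds="${2:-5}"     # number of forward/reverse pairs
    secs="${3:-10}"      # seconds per run
    i=1
    while [ "$i" -le "$rounds" ]; do
        echo "round $i: client -> server"
        iperf3 -c "$host" -t "$secs" || { echo "forward run failed in round $i"; return 1; }
        echo "round $i: server -> client (-R)"
        iperf3 -c "$host" -t "$secs" -R || { echo "reverse run failed in round $i"; return 1; }
        i=$((i + 1))
    done
    echo "all $rounds rounds completed"
}
```

Something like `iperf_repro 192.168.1.10 5 10` from the desktop side would then exercise both directions five times, with `dmesg -w` open on the Helios64 to catch the r8152 messages as they appear.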
  11. I'm on the Ubuntu focal release for the Helios64, kernel 5.8.14-rockchip64. If I try transferring between the 2.5Gbps port and my computer, linked at either 2.5Gbps or 1Gbps, the link drops. I tried a StarTech USB 3.0 adapter linking at 1Gbps and the same issue occurs, leading me to believe it's a software issue with the USB driver.

      lsusb -t, with the StarTech plugged in:

      root@helios64:/helios# lsusb -t
      /: Bus 06.Port 1: Dev 1, Class=root_hub, Driver=ohci-platform/1p, 12M
      /: Bus 05.Port 1: Dev 1, Class=root_hub, Driver=ehci-platform/1p, 480M
      /: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
      /: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
      /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
          |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
              |__ Port 1: Dev 5, If 0, Class=Vendor Specific Class, Driver=ax88179_178a, 5000M
              |__ Port 2: Dev 4, If 0, Class=Mass Storage, Driver=uas, 5000M
              |__ Port 4: Dev 3, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
      /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
          |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M

      What I see with iperf. Client side (192.168.1.20):

      justin@justin-3900x:~$ iperf3 -c 192.168.1.10
      Connecting to host 192.168.1.10, port 5201
      [ 5] local 192.168.1.20 port 56106 connected to 192.168.1.10 port 5201
      [ ID] Interval Transfer Bitrate Retr Cwnd
      [ 5] 0.00-1.00 sec 258 MBytes 2.16 Gbits/sec 0 1.48 MBytes
      [ 5] 1.00-2.00 sec 278 MBytes 2.33 Gbits/sec 0 1.48 MBytes
      [ 5] 2.00-3.00 sec 276 MBytes 2.32 Gbits/sec 0 1.48 MBytes
      [ 5] 3.00-4.00 sec 278 MBytes 2.33 Gbits/sec 0 1.48 MBytes
      [ 5] 4.00-5.00 sec 278 MBytes 2.33 Gbits/sec 0 1.48 MBytes
      [ 5] 5.00-6.00 sec 281 MBytes 2.36 Gbits/sec 0 1.48 MBytes
      [ 5] 6.00-7.00 sec 280 MBytes 2.35 Gbits/sec 0 1.48 MBytes
      [ 5] 7.00-8.00 sec 281 MBytes 2.36 Gbits/sec 0 1.48 MBytes
      [ 5] 8.00-9.00 sec 279 MBytes 2.34 Gbits/sec 0 1.48 MBytes
      [ 5] 9.00-10.00 sec 265 MBytes 2.22 Gbits/sec 0 1.48 MBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval Transfer Bitrate Retr
      [ 5] 0.00-10.00 sec 2.69 GBytes 2.31 Gbits/sec 0 sender
      [ 5] 0.00-10.00 sec 2.69 GBytes 2.31 Gbits/sec receiver
      iperf Done.
      justin@justin-3900x:~$ iperf3 -c 192.168.1.10 -R
      Connecting to host 192.168.1.10, port 5201
      Reverse mode, remote host 192.168.1.10 is sending
      [ 5] local 192.168.1.20 port 56112 connected to 192.168.1.10 port 5201
      [ ID] Interval Transfer Bitrate
      [ 5] 0.00-1.00 sec 150 MBytes 1.26 Gbits/sec
      [ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec
      [ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec
      [ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec
      [ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec
      [ 5] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec
      [ 5] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec
      [ 5] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec
      [ 5] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec

      Server side (192.168.1.10):

      Accepted connection from 192.168.1.20, port 56104
      [ 5] local 192.168.1.10 port 5201 connected to 192.168.1.20 port 56106
      [ ID] Interval Transfer Bitrate
      [ 5] 0.00-1.00 sec 251 MBytes 2.10 Gbits/sec
      [ 5] 1.00-2.00 sec 281 MBytes 2.36 Gbits/sec
      [ 5] 2.00-3.00 sec 276 MBytes 2.32 Gbits/sec
      [ 5] 3.00-4.00 sec 275 MBytes 2.30 Gbits/sec
      [ 5] 4.00-5.00 sec 281 MBytes 2.36 Gbits/sec
      [ 5] 5.00-6.00 sec 281 MBytes 2.35 Gbits/sec
      [ 5] 6.00-7.00 sec 281 MBytes 2.35 Gbits/sec
      [ 5] 7.00-8.00 sec 281 MBytes 2.35 Gbits/sec
      [ 5] 8.00-9.00 sec 276 MBytes 2.31 Gbits/sec
      [ 5] 9.00-10.00 sec 268 MBytes 2.25 Gbits/sec
      [ 5] 10.00-10.00 sec 395 KBytes 2.40 Gbits/sec
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval Transfer Bitrate
      [ 5] 0.00-10.00 sec 2.69 GBytes 2.31 Gbits/sec receiver
      -----------------------------------------------------------
      Server listening on 5201
      -----------------------------------------------------------
      Accepted connection from 192.168.1.20, port 56110
      [ 5] local 192.168.1.10 port 5201 connected to 192.168.1.20 port 56112
      [ ID] Interval Transfer Bitrate Retr Cwnd
      [ 5] 0.00-1.00 sec 153 MBytes 1.29 Gbits/sec 0 1.41 KBytes
      [ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
      [ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
      [ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
      [ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
      [ 5] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
      [ 5] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
      [ 5] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
      [ 5] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
      [ 5] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval Transfer Bitrate Retr
      [ 5] 0.00-10.00 sec 153 MBytes 129 Mbits/sec 0 sender

      Running `sudo ethtool -K eth1 tx off` does lower the speed as expected, but the link still drops after some time. I've also had times where iperf3 works fine, but an SMB transfer lasts only a few seconds before the link dies. Any ideas? If it's happening with both the StarTech and the internal Realtek USB NIC, I'm not sure where the software issue is...

      armbianmonitor -u: http://ix.io/2Etm
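For anyone trying the same workaround, here is a small sketch that applies the `ethtool -K ... tx off` setting from the post and then reads the state back with `ethtool -k`. The function name is made up, and `eth1` is only the default because that is the interface name in this thread.

```shell
#!/bin/sh
# Disable TX checksum offload on an interface, then print the resulting
# "tx-checksumming" line so the change can be confirmed.
disable_tx_offload() {
    iface="${1:-eth1}"
    sudo ethtool -K "$iface" tx off || return 1
    # Lowercase -k lists the current offload settings for the interface.
    ethtool -k "$iface" | grep -E '^tx-checksumming:'
}
```

Calling `disable_tx_offload eth1` should print the `tx-checksumming` line reporting `off` if the driver accepted the change. The setting does not persist across reboots or a reset of the USB device, so it may need to be reapplied after the NIC drops.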