yay's Achievements

  1. I didn't notice at first, but this issue was fixed in commit 8eece74e on the master branch (thanks, Piotr!). So everything should just work on beta again. I built my own kernel from master not too long ago and everything is working fine.
  2. I did some digging:

     2021-03-23: PR 2728 re-enabled the necessary patch in rockchip64-current and rockchip64-dev, with some minor changes to the patch itself.
     2021-03-24: PR 2704 renamed dev to edge, shuffling tons of files around. These changes include the content changes from PR 2728, but don't include the filename change for 5.10.

     Both PRs went into branch v21.05. As of now - commit 6311842d on branch v21.05 and 81d1de32 on branch master - the patch is only being applied for 5.11 kernels, not for 5.10 kernels (checked via "ag xhci_trb_ent_quirk").

     Unfortunately, I can't seem to build a kernel off branch v21.05 on my machine at the moment to verify this as the root cause/fix for this issue - add-csgpio-to-rockchip-spi.patch fails to apply and compilation fails without a real error message... @Igor, @piter75: Could you please take a look into this?
  3. This issue reoccurred after upgrading from Armbian 21.02.3 to 21.05.1 with today's updated packages. On 21.02.3 with kernel 5.10.21-rockchip64, everything worked flawlessly for months. On Armbian 21.05.1 Buster with kernel 5.10.35-rockchip64, whenever I try to stress the 2.5Gbps port, it crashes immediately and results in the following dmesg output:

     May 11 08:52:14 helios64 kernel: ------------[ cut here ]------------
     May 11 08:52:14 helios64 kernel: NETDEV WATCHDOG: eth1 (r8152): transmit queue 0 timed out
     May 11 08:52:14 helios64 kernel: WARNING: CPU: 4 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x398/0x3a0
     May 11 08:52:14 helios64 kernel: Modules linked in: rfkill governor_performance r8152 snd_soc_hdmi_codec snd_soc_rockchip_i2s rockchipdrm snd_soc_core hantro_vpu(C) rockchip_vdec(C) dw_mipi_dsi snd_pcm_dmaengine rockchip_rga dw_hdmi v4l2_h264 snd_pcm analogix_dp videobuf2_dma_contig panfrost v4l2_mem2mem videobuf2_dma_sg videobuf2_vmalloc snd_timer drm_kms_helper videobuf2_memops snd gpu_sched videobuf2_v4l2 pwm_fan cec leds_pwm gpio_charger sg videobuf2_common soundcore rc_core fusb302 videodev drm tcpm mc drm_panel_orientation_quirks typec gpio_beeper cpufreq_dt zram lm75 ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid0 multipath linear raid1 md_mod realtek dwmac_rk stmmac_platform stmmac pcs_xpcs adc_keys
     May 11 08:52:14 helios64 kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G C 5.10.35-rockchip64 #21.05.1
     May 11 08:52:14 helios64 kernel: Hardware name: Helios64 (DT)
     May 11 08:52:14 helios64 kernel: pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
     May 11 08:52:14 helios64 kernel: pc : dev_watchdog+0x398/0x3a0
     May 11 08:52:14 helios64 kernel: lr : dev_watchdog+0x398/0x3a0
     May 11 08:52:14 helios64 kernel: sp : ffff800011c0bd50
     May 11 08:52:14 helios64 kernel: x29: ffff800011c0bd50 x28: ffff000004a30480
     May 11 08:52:14 helios64 kernel: x27: 0000000000000004 x26: 0000000000000140
     May 11 08:52:14 helios64 kernel: x25: 00000000ffffffff x24: 0000000000000004
     May 11 08:52:14 helios64 kernel: x23: ffff8000118b7000 x22: ffff000004eef41c
     May 11 08:52:14 helios64 kernel: x21: ffff000004eef000 x20: ffff000004eef4c0
     May 11 08:52:14 helios64 kernel: x19: 0000000000000000 x18: ffff8000118ded48
     May 11 08:52:14 helios64 kernel: x17: 0000000000000000 x16: 0000000000000000
     May 11 08:52:14 helios64 kernel: x15: 0000000000000277 x14: ffff800011c0ba10
     May 11 08:52:14 helios64 kernel: x13: 00000000ffffffea x12: ffff80001194ed80
     May 11 08:52:14 helios64 kernel: x11: 0000000000000003 x10: ffff800011936d40
     May 11 08:52:14 helios64 kernel: x9 : ffff800011936d98 x8 : 0000000000017fe8
     May 11 08:52:14 helios64 kernel: x7 : c0000000ffffefff x6 : 0000000000000003
     May 11 08:52:14 helios64 kernel: x5 : 0000000000000000 x4 : 0000000000000000
     May 11 08:52:14 helios64 kernel: x3 : 0000000000000103 x2 : 0000000000000102
     May 11 08:52:14 helios64 kernel: x1 : c2c1fd0267433900 x0 : 0000000000000000
     May 11 08:52:14 helios64 kernel: Call trace:
     May 11 08:52:14 helios64 kernel: dev_watchdog+0x398/0x3a0
     May 11 08:52:14 helios64 kernel: call_timer_fn+0x30/0x1f8
     May 11 08:52:14 helios64 kernel: run_timer_softirq+0x290/0x540
     May 11 08:52:14 helios64 kernel: efi_header_end+0x160/0x41c
     May 11 08:52:14 helios64 kernel: irq_exit+0xb8/0xd8
     May 11 08:52:14 helios64 kernel: __handle_domain_irq+0x98/0x108
     May 11 08:52:14 helios64 kernel: gic_handle_irq+0xc0/0x140
     May 11 08:52:14 helios64 kernel: el1_irq+0xc0/0x180
     May 11 08:52:14 helios64 kernel: arch_cpu_idle+0x18/0x28
     May 11 08:52:14 helios64 kernel: default_idle_call+0x44/0x1bc
     May 11 08:52:14 helios64 kernel: do_idle+0x204/0x278
     May 11 08:52:14 helios64 kernel: cpu_startup_entry+0x24/0x60
     May 11 08:52:14 helios64 kernel: secondary_start_kernel+0x170/0x180
     May 11 08:52:14 helios64 kernel: ---[ end trace 2eddd7bf907ed7b6 ]---
     May 11 08:52:14 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx timeout
     May 11 08:52:14 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
     May 11 08:52:14 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
     May 11 08:52:14 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
     May 11 08:52:14 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
     May 11 08:52:16 helios64 kernel: usb 4-1.4: reset SuperSpeed Gen 1 USB device number 3 using xhci-hcd

     Once more, the only fix seems to be not advertising 2.5Gbps speeds on the other end.
  4. Nvm, I found PR 2567. With those changes applied, I can cheerfully report 2.35Gbps net TCP throughput via iperf3 in either direction with an MTU of 1500 and tx offloading enabled. Again, thanks so much!

     One way remains to reproducibly kill the H64's eth1: set the MTU to 9124 on both ends and run iperf3 in both directions - each side with its own server and the other acting as the client. But as the throughput difference is negligible (2.35Gbps vs 2.39Gbps, both really close to the 2.5Gbps maximum), I can safely exclude that use case for myself. I can't recall whether that even worked on kernel 4.4, or whether I even tested it back then.

     For whatever it's worth, here's the 5.9 log output from when said crash happens:

     Jan 21 22:51:55 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
     Jan 21 22:51:55 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
     Jan 21 22:51:55 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
     Jan 21 22:51:55 helios64 kernel: r8152 4-1.4:1.0 eth1: Tx status -2
  5. According to your latest blog post: That's awesome to hear, thank you very much for addressing this issue! Is there any way to test this change upfront? I'd be eager to give this a spin with a self-built kernel/image if necessary, but couldn't find anything concrete in the public git repositories so far.
  6. @gprovost My network setup regarding the Helios64:

     1: Helios64 eth0 <--> noname Gigabit switch <--> my internal network, router etc. <--> my desktop machine's onboard 1Gbps ethernet port (enp7s0), Intel i211, kernel module igb. Used for the ssh connection that triggers iperf3 going through eth1. MTU 1500.
     2: Helios64 eth1 <--> 5m Cat 6 ethernet cable <--> my desktop machine's 2.5Gbps ethernet port (enp6s0), Realtek RTL8125, vanilla kernel module r8169. Nothing but static IP address assignments from the same subnet, 192.168.64.0/24 (a different subnet compared to 1). MTU 1500.

     My desktop PC has both its ethernet ports on the mainboard itself; no USB devices are involved on that side. The mainboard is an ASUS "ROG STRIX X570-E GAMING" with the latest non-beta BIOS as of 2020-01-11.

     To toggle the 2.5Gbps speed between the H64's eth1 and my desktop's enp6s0, I run the following commands on my desktop:

     ethtool -s enp6s0 advertise 0x80000000002f # 2.5Gbps
     ethtool -s enp6s0 advertise 0x3f           # 1Gbps

     Increasing the MTUs for the direct 2.5Gbps connection to 9124 significantly increases the effective H64 -> desktop throughput from ~1.5Gbps to ~2Gbps, but it also "kills" the H64's eth1 much quicker, I feel. Then again, because the stability seems completely random to begin with, that may just be my imagination.
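     In case anyone wonders where those two magic advertise values come from: they are bitmasks over the kernel's ETHTOOL_LINK_MODE_*_BIT numbers. A minimal sketch reconstructing them, assuming the standard bit numbering from linux/ethtool.h (2500baseT/Full is bit 47; bits 0-3 and 5 are the 10/100/1000 modes) - verify against your own kernel headers:

     ```shell
     #!/bin/sh
     # Rebuild the two ethtool advertise masks from ETHTOOL_LINK_MODE_*_BIT
     # numbers (bit positions assumed from linux/ethtool.h).
     # bits 0-3,5: 10baseT/Half, 10baseT/Full, 100baseT/Half, 100baseT/Full, 1000baseT/Full
     MASK_1G=$(( (1<<0) | (1<<1) | (1<<2) | (1<<3) | (1<<5) ))
     # bit 47: 2500baseT/Full
     MASK_2G5=$(( MASK_1G | (1<<47) ))

     printf '1Gbps mask:   0x%x\n' "$MASK_1G"    # expect 0x3f
     printf '2.5Gbps mask: 0x%x\n' "$MASK_2G5"   # expect 0x80000000002f
     ```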
  7. Yes, it's off. "ethtool -K eth1 tx off" produced no output and it gave up just a few seconds after the transfer started. Forcing tx offload on and off again (should we really trust the driver to report the current setting?) didn't improve things. Yesterday I fiddled around with disabling autosuspend on the usb ports and devices in /sys in case it's some weird issue like that - no improvement. However, I've found that changing the advertised speeds for autonegotiation on my desktop's side (removing the bit for 2.5Gbps, for example) causes the link to go down for a few seconds and then back up - that's picked up by the Helios64's eth1 and it goes back to a stable 1Gbps connection. So at least eth1 can be resurrected without a full NAS reboot in a few seconds' time.
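     The resurrection trick above can be scripted on the peer side. A dry-run sketch (RUN=echo so it only prints the commands; clear RUN and run as root to actually bounce the link; enp6s0 and the masks match my setup and are assumptions for anyone else's):

     ```shell
     #!/bin/sh
     # Dry-run sketch: revive the H64's eth1 from the desktop side by bouncing
     # autonegotiation. Set RUN="" (and run as root) to actually execute.
     RUN=echo
     IF=enp6s0   # the desktop's 2.5Gbps port in my setup; adjust as needed

     $RUN ethtool -s "$IF" advertise 0x3f            # drop 2.5Gbps; link comes back at 1Gbps
     $RUN sleep 5                                    # give the link a moment to renegotiate
     $RUN ethtool -s "$IF" advertise 0x80000000002f  # re-advertise 2.5Gbps if desired
     ```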
  8. On the off chance there's something different between the image @clostro mentioned and mine, I tried the same with a separate SD card. No change: eth1 at 2.5Gbps broke down within 10 seconds of the Helios64 sending my desktop machine data via iperf3, just as described before. The only thing I changed after setting the root passwd was setting up a static IP address for eth1 via nmcli. So back to 1Gbps it is.
  9. sync; echo 3 > /proc/sys/vm/drop_caches

     Check Documentation/sysctl/vm.txt for details on drop_caches.
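     To expand on what that one-liner does, per Documentation/sysctl/vm.txt (the helper function name below is made up for illustration):

     ```shell
     #!/bin/sh
     # What each value written to /proc/sys/vm/drop_caches frees,
     # per Documentation/sysctl/vm.txt. Helper name is hypothetical.
     describe_drop_caches() {
       case "$1" in
         1) echo "page cache only" ;;
         2) echo "reclaimable slab objects (dentries, inodes)" ;;
         3) echo "page cache and slab objects" ;;
         *) echo "invalid" ;;
       esac
     }

     # sync first so dirty pages are written back and can actually be dropped:
     #   sync; echo 3 > /proc/sys/vm/drop_caches   (needs root)
     describe_drop_caches 3
     ```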
  10. I've been having the same issue of eth1 just not working right on a 2.5Gbps link on kernel 5.9.x. The same applies to a self-built 5.10.1 kernel, and also when going back to r8152 2.13.0 instead of the default 2.14.0 (by adapting the revision in build/lib/compilation-prepare.sh).

      With a 1Gbps connection, everything is perfectly fine. This connection speed was "forced" by running "ethtool -s enp6s0 advertise 0x3f" on the desktop side, taking 2.5Gbps out of the advertised speeds for autonegotiation.

      I have a direct 5m Cat6 connection from my Helios64 to my desktop machine. The latter has a 2.5Gbps RTL8125 on the mainboard, using the vanilla kernel's r8169 module. MTUs were kept at 1500 for all tests.

      Running the Helios64 on kernel 4.4.213 (via Armbian_20.11.4_Helios64_buster_legacy_4.4.213.img.xz), I was able to transfer 4TiB back and forth in parallel without any issue: Helios64 -> desktop ran at ~1.7Gbps; desktop -> Helios64 ran at ~2.1Gbps. 1.7Gbps isn't great, but ok enough with tx offload disabled - certainly better than "just" 1Gbps (~933Mbps, realistically).

      One surefire way to kill the Helios64's eth1 is to just start a simple iperf3 run with it sending data to my desktop: "iperf3 -c <desktop-ip> -t 600". After a few seconds, the speed goes down to 0, the kernel watchdog resets the USB device (as per r8152.c's rtl8152_tx_timeout), it recovers for a few more seconds, and then eth1 is absolutely dead until I reboot the entire NAS. No ping, no more connection, nothing.

      Here's what I can see via journalctl -f from a parallel connection when such a crash happens: If there are any more logs you'd need or software changes to try and apply, I'd be eager to dig deeper.