Helios64 network bonding


robrob

Hi guys,

 

I have been using my Helios64 for more than a year now, connected through the 1Gbps NIC. I have been reading about network bonding and would like to get both NICs up and running. The main goal is to run the bond in "Adaptive Transmit Load Balancing" mode (mode 5, balance-tlb).
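For reference, a mode 5 bond could be set up with NetworkManager roughly as sketched below. This is only a sketch under assumptions: the connection names, the `miimon=100` value and the helper names `run` and `make_tlb_bond` are illustrative and not taken from this thread. By default the script only prints the `nmcli` commands it would run.

```shell
#!/bin/sh
# Sketch: create a balance-tlb (mode 5) bond with NetworkManager.
# Assumptions (not from the thread): connection names and miimon=100.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 and
# run as root to actually apply them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# make_tlb_bond BOND NIC [NIC...]
make_tlb_bond() {
    bond="$1"; shift
    # The bond master, using transmit load balancing and MII link monitoring.
    run nmcli connection add type bond con-name "$bond" ifname "$bond" \
        bond.options "mode=balance-tlb,miimon=100"
    # One port connection per physical NIC.
    for nic in "$@"; do
        run nmcli connection add type ethernet slave-type bond \
            con-name "$bond-$nic" ifname "$nic" master "$bond"
    done
    run nmcli connection up "$bond"
}
```

Calling `make_tlb_bond bond0 eth0 eth1` in dry-run mode prints four `nmcli` commands that can be reviewed before applying them for real.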

First, I ran into the issue of the 2.5Gbps NIC not being able to hold a consistent (and performant) speed, so I soldered the missing connection from the capacitor to the NIC (more on this later). I re-tested the speed and it is now up to spec, getting very close to 1000Mbps (960-980Mbps).

 

But here is the problem I'm facing: if I enable both NICs independently, the 2.5Gbps one does not even show up all the time... At some point I did get it to work, and that is when I ran the speed test.

I'm running the current (stable) 5.10.63-rockchip64 #21.08.2 (I have also tried 5.13.x and 5.14.x, with no more success).

 

In the logs I find these lines, which are a good indication that the 2.5Gbps NIC is having some issues:

[50618.607406] usb 2-1.4: reset SuperSpeed Gen 1 USB device number 96 using xhci-hcd
[50618.735965] r8152 2-1.4:1.0 eth1: v2.15.0 (2021/04/15)
[50618.735975] r8152 2-1.4:1.0 eth1: This product is covered by one or more of the following patents:
                       US6,570,884, US6,115,776, and US6,327,625.
[50675.867182] r8152 2-1.4:1.0 eth1: get_registers -19
[50675.867653] r8152 2-1.4:1.0 eth1: Get ether addr fail
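The kernel reports errors as negative errno values, and 19 is ENODEV ("No such device"), so `get_registers -19` suggests the USB NIC dropped off the bus after the reset. To see how often this happens, something like the following could be used. It is a minimal sketch; the function name `r8152_events` is made up for illustration, and the patterns only cover the messages shown above.

```shell
# r8152_events: read kernel log text on stdin and count USB resets,
# r8152 register-read failures and MAC address read failures.
r8152_events() {
    awk '
        /reset SuperSpeed/        { resets++ }
        /r8152 .*get_registers -/ { regfail++ }
        /Get ether addr fail/     { macfail++ }
        END {
            printf "usb_resets=%d get_registers_failures=%d mac_read_failures=%d\n",
                   resets, regfail, macfail
        }
    '
}
```

Typical use would be `dmesg | r8152_events`, run before and after replugging the cable or changing the wiring, to compare the counts.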

So:

1.- Has anyone else seen these errors in the logs? If so, has anyone found a solution?

2.- The soldering went well, but the wire I used is very thin, "hook-up cable 32 AWG" (I will try to update with a pic). Could this be the source of the trouble? Can both NICs run at the same time (electrical constraints, power rail limits, etc.)? Has anyone else attempted the soldering fix? Can anyone share a picture of their working setup?

 

Thanks!

Rob


Hi,
I'm also facing issues teaming and bonding both NICs on the Helios64. A similar configuration works on the Helios4 (second NIC added via a USB 3.0 port).
Once the 1Gb link negotiates its speed, the interface crashes and can't be brought up:
 

$ dmesg
[ 3097.357372] rk_gmac-dwmac fe300000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
[ 3097.357423] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 3097.372171] rk_gmac-dwmac fe300000.ethernet eth0: Link is Down
[ 3097.379293] bond0: (slave eth0): Error: Device is in use and cannot be enslaved
[ 3097.412528] rk_gmac-dwmac fe300000.ethernet eth0: PHY [stmmac-0:00] driver [RTL8211F Gigabit Ethernet] (irq=POLL)
[ 3097.625230] rk_gmac-dwmac fe300000.ethernet: Failed to reset the dma
[ 3097.625246] rk_gmac-dwmac fe300000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed
[ 3097.625257] rk_gmac-dwmac fe300000.ethernet eth0: stmmac_open: Hw setup failed
[ 3097.676541] rk_gmac-dwmac fe300000.ethernet eth0: PHY [stmmac-0:00] driver [RTL8211F Gigabit Ethernet] (irq=POLL)
[ 3097.885517] rk_gmac-dwmac fe300000.ethernet: Failed to reset the dma
[ 3097.885538] rk_gmac-dwmac fe300000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed
[ 3097.885554] rk_gmac-dwmac fe300000.ethernet eth0: stmmac_open: Hw setup failed
[ 3097.944560] rk_gmac-dwmac fe300000.ethernet eth0: PHY [stmmac-0:00] driver [RTL8211F Gigabit Ethernet] (irq=POLL)
[ 3098.156236] rk_gmac-dwmac fe300000.ethernet: Failed to reset the dma
[ 3098.156253] rk_gmac-dwmac fe300000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed
[ 3098.156264] rk_gmac-dwmac fe300000.ethernet eth0: stmmac_open: Hw setup failed
[ 3098.204543] rk_gmac-dwmac fe300000.ethernet eth0: PHY [stmmac-0:00] driver [RTL8211F Gigabit Ethernet] (irq=POLL)
[ 3098.410564] rk_gmac-dwmac fe300000.ethernet: Failed to reset the dma
[ 3098.410581] rk_gmac-dwmac fe300000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed
[ 3098.410592] rk_gmac-dwmac fe300000.ethernet eth0: stmmac_open: Hw setup failed
[ 3098.556619] rk_gmac-dwmac fe300000.ethernet eth0: PHY [stmmac-0:00] driver [RTL8211F Gigabit Ethernet] (irq=POLL)
[ 3098.768443] rk_gmac-dwmac fe300000.ethernet: Failed to reset the dma
[ 3098.768459] rk_gmac-dwmac fe300000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed
[ 3098.768465] rk_gmac-dwmac fe300000.ethernet eth0: stmmac_open: Hw setup failed
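One possible reading of the "Device is in use and cannot be enslaved" line, not confirmed in this thread, is that eth0 was already attached to another master (a bridge, team or another bond) when bond0 tried to take it. A quick way to check is to look for a `master` field in the `ip -o link` output. The helper name `link_master` below is hypothetical.

```shell
# link_master: read one line of `ip -o link show dev IFACE` on stdin and
# print the name of the master device, or "none" if there is none.
link_master() {
    awk '{
        m = "none"
        for (i = 1; i < NF; i++)
            if ($i == "master") m = $(i + 1)
        print m
        exit
    }'
}
```

For example, `ip -o link show dev eth0 | link_master` prints the current master of eth0, if any. This says nothing about the "Failed to reset the dma" errors that follow, which look like a separate problem.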

Looking at https://wiki.kobol.io/helios64/ethernet
 

Quote

LAN 1 is the native Gigabit Ethernet interface from SoC RK3399. The interfaces is exposed through the Ethernet transceiver RTL8211F connected to RK3399 via RGMII.


 

$ ethtool -i eth0
driver: st_gmac
version: Jan_2016
firmware-version: 
expansion-rom-version: 
bus-info: 
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

 

$ lshw
*-network:0
       description: Ethernet interface
       physical id: 7
       logical name: eth0
       serial: 64:62:FF:FF:FF:FF
       capacity: 1Gbit/s
       capabilities: ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=st_gmac driverversion=Jan_2016 link=no multicast=yes port=twisted pair


 

$ ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
Cannot get wake-on-lan settings: Operation not permitted
        Current message level: 0x0000003f (63)
                               drv probe link timer ifdown ifup
        Link detected: no

 

$ uname -a
Linux nas 5.10.63-rockchip64 #21.08.2 SMP PREEMPT Wed Sep 8 10:57:23 UTC 2021 aarch64 GNU/Linux


Is it a kernel bug? If yes, how to debug it further?


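The problem looks very similar to one on the Odroid N2 SBC. Its SoC is an Amlogic S922X rather than the RK3399, but it uses the same DWMAC (stmmac) Ethernet driver family and the same RTL8211F PHY: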
 

[    3.946307] kernel: meson8b-dwmac ff3f0000.ethernet: IRQ eth_wake_irq not found
[    3.947735] kernel: meson8b-dwmac ff3f0000.ethernet: IRQ eth_lpi not found
[    3.959955] kernel: meson8b-dwmac ff3f0000.ethernet: PTP uses main clock
[    3.960322] kernel: meson8b-dwmac ff3f0000.ethernet: no reset control found
[    3.987067] kernel: meson8b-dwmac ff3f0000.ethernet: User ID: 0x11, Synopsys ID: 0x37
[    3.988569] kernel: meson8b-dwmac ff3f0000.ethernet:         DWMAC1000
[    3.993813] kernel: meson8b-dwmac ff3f0000.ethernet: DMA HW capability register supported
[    4.008458] kernel: meson8b-dwmac ff3f0000.ethernet: RX Checksum Offload Engine supported
[    4.017824] kernel: meson8b-dwmac ff3f0000.ethernet: COE Type 2
[    4.038408] kernel: meson8b-dwmac ff3f0000.ethernet: TX Checksum insertion supported
[    4.045460] kernel: meson8b-dwmac ff3f0000.ethernet: Wake-Up On Lan supported
[    4.056526] kernel: meson8b-dwmac ff3f0000.ethernet: Normal descriptors
[    4.070716] kernel: meson8b-dwmac ff3f0000.ethernet: Ring mode enabled
[    4.076492] kernel: meson8b-dwmac ff3f0000.ethernet: Enable RX Mitigation via HW Watchdog Timer
[    4.076496] kernel: meson8b-dwmac ff3f0000.ethernet: device MAC address 00:11:22:33:44:FF
[   97.207023] kernel: meson8b-dwmac ff3f0000.ethernet eth0: PHY [0.0:00] driver [RTL8211F Gigabit Ethernet] (irq=42)
[   97.209362] kernel: meson8b-dwmac ff3f0000.ethernet eth0: No Safety Features support found
[   97.209373] kernel: meson8b-dwmac ff3f0000.ethernet eth0: PTP not supported by HW
[   97.209582] kernel: meson8b-dwmac ff3f0000.ethernet eth0: configuring for phy/rgmii link mode
[  100.670345] kernel: meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
[  100.677458] kernel: meson8b-dwmac ff3f0000.ethernet eth0: Link is Down
[  100.819033] kernel: meson8b-dwmac ff3f0000.ethernet eth0: PHY [0.0:00] driver [RTL8211F Gigabit Ethernet] (irq=42)
[  100.861699] kernel: meson8b-dwmac ff3f0000.ethernet eth0: No Safety Features support found
[  100.861714] kernel: meson8b-dwmac ff3f0000.ethernet eth0: PTP not supported by HW
[  100.861727] kernel: meson8b-dwmac ff3f0000.ethernet eth0: configuring for phy/rgmii link mode

 

If I manually try to activate a slightly different USB card on the Helios64, I get this:
 

# nmcli conn up Team0
Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/8)
root@helios64:~# [179508.201299] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[179508.201798] Modules linked in: macvlan veth nf_conntrack_netlink xfrm_user xfrm_algo br_netfilter bridge aufs team_mode_loadbalance team governor_performance rfkill cdc_ether usbnet xt_conntrack nft_counter nft_chain_nat xt_nat xt_tcpudp zram xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat nf_tables nfnetlink r8152 snd_soc_hdmi_codec ftdi_sio usbserial snd_soc_rockchip_i2s snd_soc_core gpio_charger rockchip_vdec(C) snd_pcm_dmaengine hantro_vpu(C) rockchipdrm snd_pcm leds_pwm pwm_fan dw_mipi_dsi snd_timer panfrost rockchip_rga v4l2_h264 dw_hdmi snd videobuf2_dma_contig analogix_dp soundcore videobuf2_dma_sg gpu_sched v4l2_mem2mem fusb302 drm_kms_helper videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 tcpm videobuf2_common cec videodev rc_core typec mc drm sg drm_panel_orientation_quirks gpio_beeper cpufreq_dt nfsd auth_rpcgss nfs_acl lockd grace ledtrig_netdev lm75 sunrpc ip_tables x_tables autofs4 raid10 raid1 raid0 multipath linear raid456
[179508.201952]  async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod uas realtek dwmac_rk stmmac_platform stmmac pcs_xpcs adc_keys
[179508.210617] CPU: 4 PID: 1029 Comm: kworker/4:42 Tainted: G         C        5.10.63-rockchip64 #21.08.2
[179508.211445] Hardware name: Helios64 (DT)
[179508.211809] Workqueue: usb_hub_wq hub_event
[179508.212187] pstate: 00000005 (nzcv daif -PAN -UAO -TCO BTYPE=--)
[179508.212732] pc : rtl8152_post_reset+0x1d4/0x1e0 [r8152]
[179508.213201] lr : rtl8152_post_reset+0x3c/0x1e0 [r8152]
[179508.213658] sp : ffff80001958bbd0
[179508.213959] x29: ffff80001958bbd0 x28: ffff00000754f800 
[179508.214435] x27: 0000000000000010 x26: 0000000000000000 
[179508.214911] x25: ffff000007610b20 x24: 00000000ffffffed 
[179508.215387] x23: 0000000000000000 x22: ffff000007885000 
[179508.215863] x21: ffff000007464800 x20: ffff8000118b9948 
[179508.216339] x19: ffff000007885980 x18: ffff8000118dee10 
[179508.216815] x17: 0000000000000000 x16: 0000000000000000 
[179508.217291] x15: 0000000000000428 x14: ffff80001958b790 
[179508.217765] x13: 00000000ffffffea x12: ffff80001194ee48 
[179508.218241] x11: 0000000000000003 x10: ffff800011936e08 
[179508.218716] x9 : ffff800011936e60 x8 : 0000000000017fe8 
[179508.219192] x7 : c0000000ffffefff x6 : 0000000000000001 
[179508.219668] x5 : 0000000000000001 x4 : 0000000000000000 
[179508.220144] x3 : 0000000000000001 x2 : b9eafe8d9839b500 
[179508.220619] x1 : 0000000000000000 x0 : 0000000000000010 
[179508.221096] Call trace:
[179508.221324]  rtl8152_post_reset+0x1d4/0x1e0 [r8152]
[179508.221761]  usb_reset_device+0x128/0x258
[179508.222122]  hub_event+0xe94/0x1618
[179508.222439]  process_one_work+0x1ec/0x4d0
[179508.222801]  worker_thread+0x48/0x478
[179508.223133]  kthread+0x140/0x150
[179508.223428]  ret_from_fork+0x10/0x34
[179508.223752] Code: d2800100 910042a3 14009623 17ffffdc (d4210000) 
[179508.224297] ---[ end trace 087e94c7a84ef6dd ]---
[179508.224711] note: kworker/4:42[1029] exited with preempt_count 1
[179508.225383] ------------[ cut here ]------------
[179508.225812] WARNING: CPU: 4 PID: 0 at kernel/rcu/tree.c:624 rcu_eqs_enter.isra.63+0x138/0x140
[179508.226564] Modules linked in: macvlan veth nf_conntrack_netlink xfrm_user xfrm_algo br_netfilter bridge aufs team_mode_loadbalance team governor_performance rfkill cdc_ether usbnet xt_conntrack nft_counter nft_chain_nat xt_nat xt_tcpudp zram xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat nf_tables nfnetlink r8152 snd_soc_hdmi_codec ftdi_sio usbserial snd_soc_rockchip_i2s snd_soc_core gpio_charger rockchip_vdec(C) snd_pcm_dmaengine hantro_vpu(C) rockchipdrm snd_pcm leds_pwm pwm_fan dw_mipi_dsi snd_timer panfrost rockchip_rga v4l2_h264 dw_hdmi snd videobuf2_dma_contig analogix_dp soundcore videobuf2_dma_sg gpu_sched v4l2_mem2mem fusb302 drm_kms_helper videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 tcpm videobuf2_common cec videodev rc_core typec mc drm sg drm_panel_orientation_quirks gpio_beeper cpufreq_dt nfsd auth_rpcgss nfs_acl lockd grace ledtrig_netdev lm75 sunrpc ip_tables x_tables autofs4 raid10 raid1 raid0 multipath linear raid456
[179508.226715]  async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod uas realtek dwmac_rk stmmac_platform stmmac pcs_xpcs adc_keys
[179508.235375] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G      D  C        5.10.63-rockchip64 #21.08.2
[179508.236156] Hardware name: Helios64 (DT)
[179508.236511] pstate: 20000085 (nzCv daIf -PAN -UAO -TCO BTYPE=--)
[179508.237046] pc : rcu_eqs_enter.isra.63+0x138/0x140
[179508.237474] lr : rcu_eqs_enter.isra.63+0x1c/0x140
[179508.237894] sp : ffff800011cebf10
[179508.238194] x29: ffff800011cebf10 x28: 0000000000000000 
[179508.238671] x27: 0000000000000000 x26: ffff000000710e80 
[179508.239147] x25: 0000000000000000 x24: ffff800011307b40 
[179508.239623] x23: ffff80001157e978 x22: ffff8000118b9948 
[179508.240097] x21: ffff8000118ba2e8 x20: ffff8000118b99c8 
[179508.240573] x19: ffff800011580a40 x18: 0000000000000004 
[179508.241049] x17: 0000000000000001 x16: 0000000000000019 
[179508.241524] x15: ffff8000118da498 x14: 000000000000015a 
[179508.242000] x13: 0000000000000000 x12: 0000000000000001 
[179508.242476] x11: 0000000000000040 x10: ffff8000118d9c98 
[179508.242952] x9 : ffff8000118d9c90 x8 : ffff000000800028 
[179508.243428] x7 : 0000000000000000 x6 : 000003eb30cab0ae 
[179508.243904] x5 : 00ffffffffffffff x4 : ffff8000e6226000 
[179508.244380] x3 : 0000000000000001 x2 : 4000000000000000 
[179508.244854] x1 : 4000000000000002 x0 : ffff0000f77a6a40 
[179508.245330] Call trace:
[179508.245556]  rcu_eqs_enter.isra.63+0x138/0x140
[179508.245956]  rcu_idle_enter+0x10/0x20
[179508.246286]  default_idle_call+0x40/0x1bc
[179508.246646]  do_idle+0x204/0x278
[179508.246939]  cpu_startup_entry+0x24/0x60
[179508.247295]  secondary_start_kernel+0x168/0x178
[179508.247700] ---[ end trace 087e94c7a84ef6de ]---

 


Hi fri.K,

It definitely looks like a driver issue (something goes wrong in rtl8152_post_reset)... I will try to debug further based on your information.
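In the trace above, "Oops - BUG" means a BUG() assertion fired, and the `pc :` line names the function where it happened. To pull just the faulting locations out of a long log, a small filter like this may help. It is a sketch that assumes the arm64 `pc : symbol+offset/size` layout seen in this thread, and the name `oops_pc` is made up.

```shell
# oops_pc: read kernel log text on stdin and print the symbol+offset
# from every "pc :" line of an arm64 oops or warning.
oops_pc() {
    sed -n 's/^.*\] pc : \([^ ]*\).*$/\1/p'
}
```

Run as `dmesg | oops_pc`. For the trace above this gives the r8152 location first and the later RCU warning second, which helps separate the original fault from its knock-on effects.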

I'm also wondering whether there's a power consumption issue when using both NICs (I've attached an image of my fix for 1Gbps)... Could the wire gauge (AWG) and length be an issue?

I will also try to test an external USB RTL8152 dongle to see if the same driver works.

 

x0ejpt4.jpg


I have been trying the edge kernels for a while now, and I have found that after the HW mod (above) network bonding has started to work... not consistently, but once it is set up, the next reboot actually brings up the full bond. I'm currently running 5.15.4; I had to rebuild ZFS, but that was it.


Packages I've upgraded:
- linux-dtb-edge-rockchip64/focal 21.11.0-trunk.65 arm64
- linux-headers-edge-rockchip64/focal 21.11.0-trunk.65 arm64
- linux-image-edge-rockchip64/focal 21.11.0-trunk.65 arm64


- Package: zfs-dkms
  - Version: 2.1.1-0york0~20.04
  - Priority: optional
  - Section: kernel
  - Source: zfs-linux
 

$ cat /proc/net/bonding/nm-bond0                                                             
Ethernet Channel Bonding Driver: v5.15.4-rockchip64

Bonding Mode: transmit load balancing
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 64:62:66:xx:xx:xx
Slave queue ID: 0

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 64:62:66:xx:xx:xx
Slave queue ID: 0
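Since the bond does not come up consistently, it can be handy to condense this status file into a few lines, for example to log it after each boot. The following is a minimal sketch; the function name `bond_summary` is illustrative, and it only parses the fields shown in the output above.

```shell
# bond_summary: read /proc/net/bonding/<bond> text on stdin and print the
# bonding mode, the active slave, and one "name status speed" line per slave.
bond_summary() {
    awk -F': ' '
        /^Bonding Mode:/           { print "mode=" $2 }
        /^Currently Active Slave:/ { print "active=" $2 }
        /^Slave Interface:/        { slave = $2 }
        # The first "MII Status" belongs to the bond itself; skip it
        # until a slave section has started.
        /^MII Status:/ && slave    { status = $2 }
        /^Speed:/ && slave         { print slave " " status " " $2 }
    '
}
```

Usage would be `bond_summary < /proc/net/bonding/nm-bond0`. A slave that is missing from the output, or listed as `down`, points at the NIC that failed to join.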

 


On 10/1/2021 at 7:57 PM, flower said:

if you just buy one 2.5GBe switch you can use only one interface and get more speed than with 2x1GBe

 

@flower The 2.5G interface (eth1) indeed operates absolutely reliably when connected to a 2.5G switch, even without the hardware mod and using current Armbian Bullseye (kernel downgraded to 5.10.43). My interfaces are managed by systemd-networkd; the config files for the bridged setup are attached below. With this configuration the interfaces receive IPv4 and link-local IPv6 addresses. Net transfer rates are around 255 MBytes/s (a nice alternative to bonding two 1Gig Ethernet interfaces).
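As a rough sanity check on these numbers, the raw line rate of a link in MBytes/s is its speed in Mbit/s divided by 8, ignoring protocol overhead. The helper names below are made up for illustration and use integer arithmetic only.

```shell
# line_rate_mbytes LINK_MBIT: raw line rate in MBytes/s (integer, no overhead).
line_rate_mbytes() {
    echo $(( $1 / 8 ))
}

# pct_of_line_rate MBYTES_PER_S LINK_MBIT: measured throughput as an
# integer percentage of the raw line rate.
pct_of_line_rate() {
    echo $(( $1 * 8 * 100 / $2 ))
}
```

`line_rate_mbytes 2500` gives 312 and `pct_of_line_rate 255 2500` gives 81, so 255 MBytes/s is roughly four fifths of what a 2.5G link can carry. Two bonded 1G links top out at about 250 MBytes/s in aggregate, and as far as I know a single connection over a balance-tlb bond is still limited to one 1G slave (`line_rate_mbytes 1000` gives 125).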

 

Armbian 21.08.6 Bullseye with Linux 5.10.43-rockchip64

# cd /etc/systemd/network/
# ls -la
total 20
drwxr-xr-x 2 root root 4096 Nov 16 14:42 .
drwxr-xr-x 6 root root 4096 Nov 26 07:00 ..
-rw-r--r-- 1 root root   30 Nov 26  2020 br0.netdev
-rw-r--r-- 1 root root  233 Oct 10 19:27 br0.network
-rw-r--r-- 1 root root   40 Nov 16 14:42 eth0.network

# cat  br0.netdev
[NetDev]
Name=br0
Kind=bridge

# cat  br0.network
[Match]
Name=br0

[Network]
IPForward=yes
DHCP=no
Address=192.168.xxx.xxx/24
Gateway=192.168.xxx.xxx
DNS=192.168.xxx.xxx

# cat eth0.network
[Match]
Name=eth1

[Network]
Bridge=br0

# networkctl
IDX LINK       TYPE     OPERATIONAL SETUP
  1 lo         loopback carrier     unmanaged
  2 eth0       ether    off         unmanaged
  3 br0        bridge   routable    configured
  4 eth1       ether    enslaved    configured
  5 lxcbr0     bridge   no-carrier  unmanaged
  6 vethpivccu ether    degraded    unmanaged

6 links listed.

 
