NanoPi R2S: lan0 goes offline with high traffic


Posted

I'm running Armbian 10.5 (kernel 5.8.6-rockchip64) on my new NanoPi R2S.
I configured it as a router: lan0 (USB Ethernet) is the LAN interface, eth0 (real Ethernet) is the WAN interface, and IP forwarding is enabled.

 

On the LAN subnet I put an iperf3 client; on the WAN subnet I put an iperf3 server.

Launching iperf3 -c _server_ip_ on the client is OK and I can reach 830 Mbps, but launching iperf3 with -R (reverse) makes the lan0 interface reset/disconnect.
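
A minimal sketch of the two runs, for anyone who wants to reproduce this from the LAN-side client. It assumes iperf3 is installed and a server is already listening on the WAN side (iperf3 -s); the helper name iperf_args and the 10-second default duration are illustrative choices, not part of the original test.

```shell
#!/bin/sh
# Build the iperf3 client arguments for one test direction.
# $1 = server address, $2 = forward|reverse, $3 = duration in seconds (assumed default: 10)
iperf_args() {
    args="-c $1 -t ${3:-10}"
    if [ "$2" = "reverse" ]; then
        # -R: the server sends and the client receives, so the router transmits on lan0
        args="$args -R"
    fi
    echo "$args"
}

# Only run the tests when a server address is given on the command line.
if [ "$#" -ge 1 ]; then
    for direction in forward reverse; do
        echo "== $direction =="
        # Left unquoted on purpose so the argument string splits into words
        iperf3 $(iperf_args "$1" "$direction") || echo "$direction run did not complete"
    done
fi
```

Saved under a name such as repro.sh (hypothetical) and run as sh repro.sh _server_ip_, the second (reverse) run is the one expected to trigger the lan0 reset.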

 

I can find the following logs in the kernel:

 

Sep  8 20:53:32 gw kernel: [  611.981696] ------------[ cut here ]------------
Sep  8 20:53:32 gw kernel: [  611.981726] NETDEV WATCHDOG: lan0 (r8152): transmit queue 0 timed out
Sep  8 20:53:32 gw kernel: [  611.981801] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x39c/0x3a8
Sep  8 20:53:32 gw kernel: [  611.981805] Modules linked in: rfkill r8152 zstd hantro_vpu(C) v4l2_h264 videobuf2_dma_contig v4l2_mem2mem videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc cpufreq_dt zram usb_f_acm u_serial g_serial libcomposite ip_tables x_tables autofs4 realtek dwmac_rk stmmac_platform stmmac mdio_xpcs gpio_syscon
Sep  8 20:53:32 gw kernel: [  611.981852] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G         C        5.8.6-rockchip64 #20.08.2
Sep  8 20:53:32 gw kernel: [  611.981855] Hardware name: FriendlyElec NanoPi R2S (DT)
Sep  8 20:53:32 gw kernel: [  611.981860] pstate: 00000005 (nzcv daif -PAN -UAO BTYPE=--)
[...]

Sep  8 20:53:32 gw kernel: [  611.982020] ---[ end trace e54bbedf61757c5e ]---
Sep  8 20:53:32 gw kernel: [  611.982037] r8152 5-1:1.0 lan0: Tx timeout
Sep  8 20:53:32 gw kernel: [  612.493756] r8152 5-1:1.0 lan0: get_registers -110
Sep  8 20:53:33 gw kernel: [  613.005840] r8152 5-1:1.0 lan0: set_registers -110
Sep  8 20:53:33 gw kernel: [  613.517849] r8152 5-1:1.0 lan0: get_registers -110
Sep  8 20:53:34 gw kernel: [  614.029766] r8152 5-1:1.0 lan0: get_registers -110
Sep  8 20:53:34 gw kernel: [  614.029864] r8152 5-1:1.0 lan0: set_registers -71
Sep  8 20:53:34 gw kernel: [  614.030080] r8152 5-1:1.0 lan0: Tx status -2
Sep  8 20:53:34 gw kernel: [  614.030158] r8152 5-1:1.0 lan0: Tx status -2
Sep  8 20:53:34 gw kernel: [  614.030242] r8152 5-1:1.0 lan0: Tx status -2
Sep  8 20:53:34 gw kernel: [  614.030327] r8152 5-1:1.0 lan0: Tx status -2
Sep  8 20:53:34 gw kernel: [  614.030462] r8152 5-1:1.0 lan0: get_registers -71
Sep  8 20:53:34 gw kernel: [  614.030556] r8152 5-1:1.0 lan0: set_registers -71
Sep  8 20:53:34 gw kernel: [  614.030646] r8152 5-1:1.0 lan0: get_registers -71
Sep  8 20:53:34 gw kernel: [  614.030738] r8152 5-1:1.0 lan0: get_registers -71
Sep  8 20:53:34 gw kernel: [  614.030842] r8152 5-1:1.0 lan0: set_registers -71
Sep  8 20:53:34 gw kernel: [  614.030933] r8152 5-1:1.0 lan0: get_registers -71
Sep  8 20:53:34 gw kernel: [  614.031495] r8152 5-1:1.0 lan0: get_registers -71
Sep  8 20:53:34 gw kernel: [  614.032057] r8152 5-1:1.0 lan0: get_registers -71
Sep  8 20:53:34 gw kernel: [  614.032617] r8152 5-1:1.0 lan0: get_registers -71
Sep  8 20:53:34 gw kernel: [  614.033167] r8152 5-1:1.0 lan0: get_registers -71
Sep  8 20:53:34 gw kernel: [  614.033752] r8152 5-1:1.0 lan0: get_registers -71
Sep  8 20:53:34 gw kernel: [  614.034620] r8152 5-1:1.0 lan0: get_registers -71

[...]

Sep  8 20:53:34 gw kernel: [  614.634100] r8152 5-1:1.0 lan0: get_registers -71
Sep  8 20:53:34 gw kernel: [  614.634203] r8152 5-1:1.0 lan0: set_registers -71
Sep  8 20:53:34 gw kernel: [  614.776602] usb 5-1: reset SuperSpeed Gen 1 USB device number 2 using xhci-hcd
Sep  8 20:53:34 gw kernel: [  614.808673] r8152 5-1:1.0 lan0: DT mac addr c6:e2:3a:1f:f8:99
Sep  8 20:53:35 gw kernel: [  614.952573] usb 5-1: reset SuperSpeed Gen 1 USB device number 2 using xhci-hcd
Sep  8 20:53:35 gw kernel: [  614.984657] r8152 5-1:1.0 lan0: DT mac addr c6:e2:3a:1f:f8:99
Sep  8 20:53:35 gw kernel: [  615.835867] r8152 5-1:1.0 lan0: skb_to_sgvec fail -90
Sep  8 20:53:58 gw kernel: [  638.860994] r8152 5-1:1.0 lan0: Tx timeout
Sep  8 20:53:59 gw kernel: [  639.373068] r8152 5-1:1.0 lan0: get_registers -110
Sep  8 20:53:59 gw kernel: [  639.885132] r8152 5-1:1.0 lan0: set_registers -110
Sep  8 20:54:00 gw kernel: [  640.397198] r8152 5-1:1.0 lan0: get_registers -110
Sep  8 20:54:00 gw kernel: [  640.909109] r8152 5-1:1.0 lan0: get_registers -110
Sep  8 20:54:00 gw kernel: [  640.909219] r8152 5-1:1.0 lan0: set_registers -71
Sep  8 20:54:00 gw kernel: [  640.909435] r8152 5-1:1.0 lan0: Tx status -2
Sep  8 20:54:00 gw kernel: [  640.909519] r8152 5-1:1.0 lan0: Tx status -2
Sep  8 20:54:00 gw kernel: [  640.909576] r8152 5-1:1.0 lan0: Tx status -2
Sep  8 20:54:00 gw kernel: [  640.909672] r8152 5-1:1.0 lan0: Tx status -2

After some seconds, the interface is back running again, but when I re-launch iperf3 from the client, the interface hangs again.
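
The numeric codes in the r8152 lines are negative Linux errno values. As a rough aid for reading such a log, the sketch below counts the messages per code and prints the symbolic name. The function names are made up for illustration, and the mapping covers only the four codes that appear in the log above.

```shell
#!/bin/sh
# Map the negative errno values seen in the r8152 messages to their Linux names.
errno_name() {
    case "$1" in
        -2)   echo "ENOENT" ;;
        -71)  echo "EPROTO" ;;
        -90)  echo "EMSGSIZE" ;;
        -110) echo "ETIMEDOUT" ;;
        *)    echo "unknown" ;;
    esac
}

# Read kernel log lines on stdin and count r8152 messages per error code.
summarize_r8152() {
    awk '/r8152/ && $NF ~ /^-[0-9]+$/ { count[$NF]++ }
         END { for (code in count) print count[code], code }' |
    sort -k2,2n |
    while read -r n code; do
        echo "$n x $code ($(errno_name "$code"))"
    done
}

# Only read a log file when one is given on the command line.
if [ "$#" -ge 1 ]; then
    summarize_r8152 < "$1"
fi
```

For example: sh r8152-errors.sh /var/log/kern.log (script name and log path are assumptions). In USB drivers, -110 usually indicates a control transfer that timed out, -71 a low-level protocol error on the bus, and -2 on a transmit URB that the URB was cancelled, which would be consistent with the adapter no longer responding until the USB reset.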

 

I tried to execute

echo 0bda:8153:k > /sys/module/usbcore/parameters/quirks

as suggested here, but it brought no improvement.
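
It may be worth checking whether the entry was actually registered. A small sketch, assuming the usual comma-separated VID:PID:flags format of the usbcore quirks parameter; has_quirk is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds if the comma-separated quirks list in $1 contains the exact entry $2.
has_quirk() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

QUIRKS_FILE=/sys/module/usbcore/parameters/quirks

# Only inspect the running system when an entry is given on the command line.
if [ "$#" -ge 1 ] && [ -r "$QUIRKS_FILE" ]; then
    if has_quirk "$(cat "$QUIRKS_FILE")" "$1"; then
        echo "$1 is registered"
    else
        echo "$1 is not registered"
    fi
fi
```

Run as sh check-quirk.sh 0bda:8153:k (hypothetical file name). The k flag disables USB Link Power Management for the device. Quirks written at runtime appear to be applied only when a device is enumerated, so the adapter would probably need to be re-enumerated, or the entry passed at boot as usbcore.quirks=0bda:8153:k on the kernel command line, before it can have any effect.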

 

 

Any ideas?
Thank you.

Posted

@Igor disabling tx offload on the USB ethernet (lan0) seems to solve the problem.

ethtool -K lan0 tx off

And I can't see extra CPU load or appreciable bandwidth drop with this. Great!

 

To make it work at boot, I changed /etc/udev/rules.d/70-rename-lan.rules as follows:

 

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="r8152", KERNEL=="eth1", NAME="lan0", RUN+="/usr/sbin/ethtool -K lan0 tx off"
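
To confirm that the setting is really applied after a reboot, the offload state can be read back with ethtool -k (lower-case k). A minimal sketch; feature_state is an illustrative helper, and tx-checksumming is the feature name ethtool normally prints for the tx setting.

```shell
#!/bin/sh
# Read "ethtool -k <iface>" output on stdin and print the state (on/off)
# of the feature named in $1.
feature_state() {
    awk -v name="$1" '{ key = $1; sub(/:$/, "", key) }
                      key == name { print $2; exit }'
}

# Only query the interface when its name is given on the command line.
if [ "$#" -ge 1 ]; then
    ethtool -k "$1" | feature_state tx-checksumming
fi
```

Run as sh check-offload.sh lan0 (hypothetical file name); it should print off once the udev rule has taken effect.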

 

Thank you.

Posted
  On 9/12/2020 at 8:51 PM, giox069 said:

Great!

 

To make it work at boot, I changed /etc/udev/rules.d/70-rename-lan.rules as follows



If you would like to make a small contribution to the project and make sure your next image has this problem fixed out of the box:


https://docs.armbian.com/Process_Contribute/
https://github.com/armbian/build/blob/master/config/sources/families/include/rockchip64_common.inc#L257

Posted (edited)

I have the same problem with my R2S. I tried giox069's suggestion of using ethtool, but lan0 is still dropping out. I have to reboot the machine to regain lan0.

 

This is the output of armbianmonitor -u: http://ix.io/2Gs5

 

This is the start of the kernel log where the error occurs:

 

Dec  4 15:33:48 firewall kernel: [17897.537194] ------------[ cut here ]------------
Dec  4 15:33:48 firewall kernel: [17897.537249] NETDEV WATCHDOG: lan0 (r8152): transmit queue 0 timed out
Dec  4 15:33:48 firewall kernel: [17897.537364] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x39c/0x3a8
Dec  4 15:33:48 firewall kernel: [17897.537370] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth rfkill zstd r8152 hantro_vpu(C) v4l2_h264 videobuf2_dma_contig v4l2_mem2mem videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc ip6t_rt cpufreq_dt xt_tcpudp xt_state xt_conntrack nft_counter zram nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink usb_f_acm u_serial g_serial libcomposite ip_tables x_tables autofs4 realtek dwmac_rk stmmac_platform stmmac mdio_xpcs gpio_syscon
Dec  4 15:33:48 firewall kernel: [17897.537493] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C        5.8.17-rockchip64 #20.08.21
Dec  4 15:33:48 firewall kernel: [17897.537499] Hardware name: FriendlyElec NanoPi R2S (DT)
Dec  4 15:33:48 firewall kernel: [17897.537509] pstate: 00000005 (nzcv daif -PAN -UAO BTYPE=--)
Dec  4 15:33:48 firewall kernel: [17897.537519] pc : dev_watchdog+0x39c/0x3a8
Dec  4 15:33:48 firewall kernel: [17897.537528] lr : dev_watchdog+0x39c/0x3a8
Dec  4 15:33:48 firewall kernel: [17897.537533] sp : ffff800011abbd10
Dec  4 15:33:48 firewall kernel: [17897.537538] x29: ffff800011abbd10 x28: ffff000022a1fa80 
Dec  4 15:33:48 firewall kernel: [17897.537549] x27: 0000000000000004 x26: 0000000000000140 
Dec  4 15:33:48 firewall kernel: [17897.537559] x25: 00000000ffffffff x24: 0000000000000000 
Dec  4 15:33:48 firewall kernel: [17897.537569] x23: ffff000032d793dc x22: ffff000032d79000 
Dec  4 15:33:48 firewall kernel: [17897.537578] x21: ffff000032d79480 x20: ffff800011807000 
Dec  4 15:33:48 firewall kernel: [17897.537587] x19: 0000000000000000 x18: 0000000000000000 
Dec  4 15:33:48 firewall kernel: [17897.537597] x17: 0000000000000000 x16: 0000000000000000 
Dec  4 15:33:48 firewall kernel: [17897.537606] x15: ffff80001182e000 x14: ffff800011a1d23a 
Dec  4 15:33:48 firewall kernel: [17897.537616] x13: 0000000000000000 x12: ffff800011a1c000 
Dec  4 15:33:48 firewall kernel: [17897.537625] x11: ffff80001182e000 x10: ffff800011a1c880 
Dec  4 15:33:48 firewall kernel: [17897.537635] x9 : 0000000000000000 x8 : 0000000000000001 
Dec  4 15:33:48 firewall kernel: [17897.537644] x7 : 00000000000001b9 x6 : 0000000000000003 
Dec  4 15:33:48 firewall kernel: [17897.537653] x5 : 0000000000000000 x4 : 0000000000000000 
Dec  4 15:33:48 firewall kernel: [17897.537662] x3 : ffff80001180d020 x2 : 0000000000000103 
Dec  4 15:33:48 firewall kernel: [17897.537671] x1 : fdf502529fc7a700 x0 : 0000000000000000 
Dec  4 15:33:48 firewall kernel: [17897.537681] Call trace:
Dec  4 15:33:48 firewall kernel: [17897.537693]  dev_watchdog+0x39c/0x3a8
Dec  4 15:33:48 firewall kernel: [17897.537705]  call_timer_fn+0x30/0x1e0
Dec  4 15:33:48 firewall kernel: [17897.537715]  run_timer_softirq+0x1e0/0x5b0
Dec  4 15:33:48 firewall kernel: [17897.537725]  efi_header_end+0x16c/0x400
Dec  4 15:33:48 firewall kernel: [17897.537734]  irq_exit+0xc8/0xe0
Dec  4 15:33:48 firewall kernel: [17897.537744]  __handle_domain_irq+0x98/0x108
Dec  4 15:33:48 firewall kernel: [17897.537757]  gic_handle_irq+0x54/0xa8
Dec  4 15:33:48 firewall kernel: [17897.537765]  el1_irq+0xb8/0x180
Dec  4 15:33:48 firewall kernel: [17897.537776]  arch_cpu_idle+0x28/0x218
Dec  4 15:33:48 firewall kernel: [17897.537788]  default_idle_call+0x1c/0x44
Dec  4 15:33:48 firewall kernel: [17897.537797]  do_idle+0x210/0x288
Dec  4 15:33:48 firewall kernel: [17897.537806]  cpu_startup_entry+0x24/0x68
Dec  4 15:33:48 firewall kernel: [17897.537815]  rest_init+0xd8/0xe8
Dec  4 15:33:48 firewall kernel: [17897.537825]  arch_call_rest_init+0x10/0x1c
Dec  4 15:33:48 firewall kernel: [17897.537833]  start_kernel+0x7f0/0x82c
Dec  4 15:33:48 firewall kernel: [17897.537839] ---[ end trace bb9026d51506ff19 ]---

 

Edited by JeremyA
Fix link
Posted
  On 12/4/2020 at 9:11 AM, Igor said:


Try with most recent nightly builds.


 

I'm a bit uncertain about the process. I assume you mean install the latest kernel? I've been to the Armbian R2S page https://www.armbian.com/nanopi-r2s/ but it appears I can only download a complete system image, while what I think I want to do is install a new kernel.

Where should I look for a nightly release kernel package to download?

 

Posted
  On 12/4/2020 at 9:27 AM, JeremyA said:

Where should I look for a nightly release kernel package to download?



You can upgrade your system to nightly but I would not do that on a machine that is in production. If you are still testing things, armbian-config -> system -> nightly. Safest way is to grab another SD card and burn "nightly" image which you can find at the bottom of the download section.

Posted
  On 12/4/2020 at 9:44 AM, Igor said:


You can upgrade your system to nightly but I would not do that on a machine that is in production. If you are still testing things, armbian-config -> system -> nightly. Safest way is to grab another SD card and burn "nightly" image which you can find at the bottom of the download section.


 

I tried the nightly path. It all loaded O.K.  but the network fault appeared within a few minutes, so not as stable as 'stable'.

 

Reverting seems to have worked, and my present plan is to write a script to reboot when lan0 ceases to exist. In the past few days that has been every few hours. I did see that, before I set the device up as a router, it stayed up indefinitely.

Posted

This workaround detects the failure of the r8152 interface (lan0) and cleanly restarts the system. Create a file /root/checklan.sh and make it executable.

 

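#!/bin/bash
# Reboot if the lan0 interface has disappeared; log each restart with a timestamp.
log=/root/restarts.txt
# "ip link show lan0" returns non-zero when the interface does not exist
if ! ip link show lan0 > /dev/null 2>&1; then
    date >> "$log"
    systemctl reboot
fi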

 

Run this with a crontab -e entry of

 

* * * * * /root/checklan.sh

 

This will check the lan0 interface every minute. NB: cron will also log to syslog each minute. To prevent clutter, edit /etc/rsyslog.conf to have the following two lines (insert cron into the auth line and add the cron.* line):

 

*.*;cron,auth,authpriv.none		-/var/log/syslog
cron.*				/var/log/cron.log

 

This means all cron logging goes into /var/log/cron.log and none into /var/log/syslog

 

Activate the logging changes with

 

systemctl restart rsyslog

 

Logrotate already comes configured to rotate /var/log/cron.log so no changes required

 

The file /root/restarts.txt gives an accurate record of when and how often the interface fails. My system has a failure after a few hours but the distribution is random from minutes to many hours.
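
A quick way to summarise that file is to count restarts per calendar day. This is only a sketch: it assumes date wrote its default C-locale format (weekday, month, day, time, zone, year), and restarts_per_day is a name chosen here for illustration.

```shell
#!/bin/sh
# Read timestamps in the default C-locale "date" format on stdin and print
# the number of entries per calendar day, in order of first appearance.
restarts_per_day() {
    awk 'NF >= 6 {
             day = $6 " " $2 " " $3          # year month day
             if (!(day in n)) order[++k] = day
             n[day]++
         }
         END { for (i = 1; i <= k; i++) print order[i] ": " n[order[i]] }'
}

# Only read a file when one is given on the command line.
if [ "$#" -ge 1 ]; then
    restarts_per_day < "$1"
fi
```

Run as sh restarts-per-day.sh /root/restarts.txt (hypothetical script name). If date uses a different locale format on your system, the field positions in the awk script need adjusting.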

Posted

There is some problem with r8152 v2.13. I tested my NanoPi R2S with Armbian stable, Armbian trunk, OpenWrt, FriendlyWrt and Arch Linux. In Armbian, r8152 v2.13 always fails. In OpenWrt and FriendlyWrt, with mainline v1.11.11, it works fine. In Arch Linux it works fine with 1.11.11 and fails with 2.14, just like Armbian. FriendlyElec noted in their latest FriendlyWrt changelog for the NanoPi R2S that they reverted from 2.13 to 1.11.11 because of instability.
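
To see which of the two driver lines a given system is running, the version reported by the loaded driver can be checked. A rough sketch: r8152_flavor is a hypothetical helper, and the split between 1.x (in-kernel) and 2.x (out-of-tree) is simply the pattern reported in this thread, not an official rule.

```shell
#!/bin/sh
# Classify an r8152 version string following the pattern seen in this thread:
# 1.x = driver shipped with the mainline kernel, 2.x = out-of-tree driver.
r8152_flavor() {
    case "${1#v}" in          # strip an optional leading "v"
        1.*) echo "mainline" ;;
        2.*) echo "out-of-tree" ;;
        *)   echo "unknown" ;;
    esac
}

# Only query the interface when its name is given on the command line.
if [ "$#" -ge 1 ]; then
    version=$(ethtool -i "$1" | awk '$1 == "version:" { print $2 }')
    echo "$1: r8152 $version ($(r8152_flavor "$version"))"
fi
```

Run as sh r8152-version.sh lan0 (hypothetical file name). modinfo -F version r8152 should report the same string for the module installed on disk.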

Posted
  On 12/5/2020 at 4:18 PM, stranger said:

There is some problem with r8152 v2.13. I tested my NanoPi R2S with Armbian stable, Armbian trunk, OpenWrt, FriendlyWrt and Arch Linux. In Armbian, r8152 v2.13 always fails. In OpenWrt and FriendlyWrt, with mainline v1.11.11, it works fine. In Arch Linux it works fine with 1.11.11 and fails with 2.14, just like Armbian. FriendlyElec noted in their latest FriendlyWrt changelog for the NanoPi R2S that they reverted from 2.13 to 1.11.11 because of instability.



So you are saying it would be better to revert to the driver that comes with the kernel? That would mean removing https://github.com/armbian/build/blob/master/lib/compilation-prepare.sh#L201-L213

 

I think we will lose 2.5Gbps support, but that's not that critical.

Posted

I did the install of the latest armbian build and the device is now rock solid, no reboots after 18 hours - compared to between one and four hours to failure before the upgrade.

 

For the benefit of others who wish to repeat the process, here are my install steps. I used a fairly powerful Debian Buster desktop to do the build, but it still took some time, maybe half an hour including download time. I chose to build only the kernel elements by answering the prompts from the build script compile.sh.

 

# On the build host: fetch the Armbian build system and run the build in Docker
cd /usr/src/
git clone --depth 1 https://github.com/armbian/build
mv build build-armbian
cd build-armbian/
./compile.sh docker
# Copy the resulting packages to the R2S (192.168.0.1 here)
cd /usr/src/build-armbian/output/debs
scp *.deb root@192.168.0.1:~
# On the R2S: install firmware, DTB, kernel image and U-Boot, then reboot
ssh root@192.168.0.1
dpkg -i armbian-firmware-full_21.02.0-trunk_all.deb linux-dtb-current-rockchip64_21.02.0-trunk_arm64.deb linux-image-current-rockchip64_21.02.0-trunk_arm64.deb linux-u-boot-current-nanopi-r2s_21.02.0-trunk_arm64.deb
systemctl reboot

 

Posted

One side effect I have noticed from the upgrade to 21.02.0 trunk is that logging to /var/log/messages has ceased.

Is this expected? Or how do I re-enable the logging?

Posted
  On 12/8/2020 at 1:16 AM, JeremyA said:

Is this expected? Or how do I re-enable the logging?



Logging is not back after reboot?
 

  On 12/7/2020 at 10:11 PM, JeremyA said:

I used a fairly powerful debian buster desktop to do the build but it still took some time, maybe half an hour including download time?


 

Seems fine. For comparison - second recompilation is much much faster. Kernel compile + pack on Ryzen 3950x, bare metal, is below 2 min.

Posted

As an update, the target NanoPi R2S system has been up for 22 days now without issue.

 

I'm using the black metal case and my ambient temperature is 29C (local summer with no aircon). The CPU temperature is 58C running a 50Mbps uplink. I realise the CPU temperature isn't strictly relevant to armbian, but it's indicative of the performance of the R2S under Armbian in the router role.

 

From my point of view, the R2S under slightly modified Armbian is a very usable approach to firewalling and routing.

Posted
  On 1/23/2021 at 12:03 PM, stranger said:

But you have patched only rk3399.dtsi. Will it work with rk3328?



Good catch. I didn't notice that, and my quick tests on the R2S didn't crash the board. Well ... nightly builds already have those patches built into the kernel. Enable it for your board by hand (change the board DT) and see if you can reproduce?

Posted
  On 1/23/2021 at 11:10 PM, Igor said:

Good catch. I didn't notice that, and my quick tests on the R2S didn't crash the board. Well ... nightly builds already have those patches built into the kernel. Enable it for your board by hand (change the board DT) and see if you can reproduce?


Well, it's used as my main router and it's running Arch Linux ARM right now, so it's not so easy to test this stuff without messing up my home network for a while, but maybe I will look into it.

By the way, when I previously tested the driver I found that the r8152 2.13+ driver failed consistently when I opened https://browserleaks.com/dns in Firefox on a computer connected to the NanoPi R2S. Maybe that will be useful for testing.

Posted
  On 1/28/2021 at 2:28 PM, stranger said:

Yes, the AUR package has it: https://aur.archlinux.org/packages/linux-rockchip64/. I just need to know that this version was patched.


 

I can't know that, but I assume they don't change anything in this repack process, so yes. But if you want to be 100% sure, open the source package that comes with it and check if those bits are there.

Posted

The Armbian kernel works fine on Arch Linux ARM, but r8152 starts failing again with the 2.14 driver, as expected. In its current state this kernel is bad for the NanoPi R2S. I will try to add rk3399's quirk as an overlay and will report later.

Posted

I tested this quirk and it doesn't help. The NanoPi R2S still fails with r8152 2.14. You definitely should use mainline r8152 for this board.

 


 

Posted
  On 1/29/2021 at 5:53 AM, stranger said:

You definitely should use mainline r8152 for this board.


 

... and make this happen once again? That is not going to happen. Proper rework - absolutely no budget.

 
