NanoPi R2S: lan0 goes offline with high traffic


Posted

 

4 hours ago, Igor said:

... and make this happens once again? That is not going to happen. Proper rework - absolutely no budget.

In that case Armbian is simply unusable on the NanoPi R2S. A router that loses its LAN interface every couple of hours is pretty bad.
Maybe it's possible to ship the 2.14 driver as a DKMS package for the specific boards that need it, instead of replacing the built-in kernel driver for rockchip64? That would fix all the problems.

Posted
22 minutes ago, stranger said:

 

In that case Armbian is simply unusable on the NanoPi R2S. A router that loses its LAN interface every couple of hours is pretty bad.
Maybe it's possible to ship the 2.14 driver as a DKMS package for the specific boards that need it, instead of replacing the built-in kernel driver for rockchip64? That would fix all the problems.

 

I have been using Armbian on the R2S with an uptime of weeks without any LAN issues at all.

Look back earlier in this thread where I documented the steps required to get a stable router using Armbian

Posted
35 minutes ago, JeremyA said:

I have been using Armbian on the R2S with an uptime of weeks without any LAN issues at all.

That's because the previous Armbian kernels used the mainline r8152 driver (1.11.11). It has now been replaced with the 2.14 driver, which causes trouble on the RK3328. I don't know about the main repository, but it's already in beta. So it will start failing again when you upgrade the kernel.

Posted
1 hour ago, stranger said:

So it will start failing again when you upgrade the kernel.

 

I understand this, but we have far greater problems than this.

Posted

Hi all!

 

I faced the same issue with my NanoPi R2S: even with nightly builds, the r8152 randomly hung and lan0 disappeared.

I built a minimal Armbian buster image and reverted the driver to the previous 1.11 version.

I also manually modified my /etc/udev/rules.d/70-rename-lan.rules with the following content:

 

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="rk_gmac-dwmac", KERNEL=="eth0", NAME="wan0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="r8152", NAME="lan0", RUN+="/usr/sbin/ethtool -K lan0 tx off rx off"
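To check that the `ethtool -K lan0 tx off rx off` part of the rule actually took effect, the output of `ethtool -k lan0` can be inspected (`tx` and `rx` are ethtool's short names for TX and RX checksum offload). A minimal sketch in Python, assuming the usual `feature: on|off` line format; the sample text and the function name are illustrative and were not captured from the R2S.

```python
def parse_offload_features(ethtool_k_output: str) -> dict:
    """Map feature name -> True/False from `ethtool -k <iface>` style output."""
    features = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue
        words = value.split()
        # Lines such as "Features for lan0:" carry no on/off value and are skipped.
        if words and words[0] in ("on", "off"):
            features[name.strip()] = words[0] == "on"
    return features


# Illustrative sample only, not real output from the board.
SAMPLE = """Features for lan0:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off [requested on]
"""

features = parse_offload_features(SAMPLE)
checksums_off = not features["rx-checksumming"] and not features["tx-checksumming"]
print("checksum offloading disabled:", checksums_off)
```

On a live system the text would come from running `ethtool -k lan0` instead of the hard-coded sample.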

 

It correctly handles my home fiber connection at ~600 Mbps down and ~300 Mbps up.

Under heavy load, I didn't experience a lan0 freeze for at least 1 hour (previously, it froze every 10 minutes).

 

I guess 2.5 Gbps support is lost, but:

  1. For people with a 1 Gbps network, it's enough.
  2. I have serious doubts about the R2S's ability to route 2.5 Gbps traffic ;-)

 

For anyone interested who doesn't have the time/skill/machine to compile it, you can download the image here (~950 MB uncompressed): https://drive.google.com/file/d/1RRvmObBDi58TRTnZN661PL0PYFRbAi0_/view?usp=sharing

 

Posted (edited)

It seems this is still not fixed?

 

armbianmonitor -u: http://ix.io/3tOi (sorry for the gzipped 1.txt.gz; ix.io does not accept a file size of 1 MB)

 

Jul 23 12:32:10 localhost kernel: [  116.970137] ------------[ cut here ]------------
Jul 23 12:32:10 localhost kernel: [  116.970215] NETDEV WATCHDOG: lan0 (r8152): transmit queue 0 timed out
Jul 23 12:32:10 localhost kernel: [  116.970398] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:468 dev_watchdog+0x398/0x3a0
Jul 23 12:32:10 localhost kernel: [  116.970411] Modules linked in: governor_performance iptable_mangle iptable_filter iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter bridge zram snd_soc_hdmi_codec dw_hdmi_cec dw_hdmi_i2s_audio btsdio hci_uart r8152 btqca btrtl btbcm rockchipdrm btintel rockchip_vdec(C) hantro_vpu(C) snd_soc_audio_graph_card dw_mipi_dsi dw_hdmi brcmfmac snd_soc_simple_card rockchip_rga v4l2_h264 panfrost videobuf2_dma_contig snd_soc_es8316 snd_soc_rockchip_i2s brcmutil analogix_dp snd_soc_simple_card_utils videobuf2_dma_sg gpu_sched drm_kms_helper v4l2_mem2mem videobuf2_vmalloc cec videobuf2_memops snd_soc_core videobuf2_v4l2 rc_core bluetooth snd_pcm_dmaengine videobuf2_common drm snd_pcm cfg80211 drm_panel_orientation_quirks videodev snd_timer rfkill snd soundcore mc cpufreq_dt nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 realtek dwmac_rk stmmac_platform stmmac pcs_xpcs
Jul 23 12:32:10 localhost kernel: [  116.971114] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C        5.10.43-rockchip64 #21.05.4
Jul 23 12:32:10 localhost kernel: [  116.971125] Hardware name: Radxa ROCK Pi 4C (DT)
Jul 23 12:32:10 localhost kernel: [  116.971143] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
Jul 23 12:32:10 localhost kernel: [  116.971160] pc : dev_watchdog+0x398/0x3a0
Jul 23 12:32:10 localhost kernel: [  116.971175] lr : dev_watchdog+0x398/0x3a0
Jul 23 12:32:10 localhost kernel: [  116.971185] sp : ffff800011bebd50
Jul 23 12:32:10 localhost kernel: [  116.971197] x29: ffff800011bebd50 x28: ffff000004e22c80
Jul 23 12:32:10 localhost kernel: [  116.971224] x27: 0000000000000004 x26: 0000000000000140
Jul 23 12:32:10 localhost kernel: [  116.971250] x25: 00000000ffffffff x24: 0000000000000000
Jul 23 12:32:10 localhost kernel: [  116.971275] x23: ffff8000118b7000 x22: ffff000004c6a41c
Jul 23 12:32:10 localhost kernel: [  116.971302] x21: ffff000004c6a000 x20: ffff000004c6a4c0
Jul 23 12:32:10 localhost kernel: [  116.971328] x19: 0000000000000000 x18: ffff8000118ded90
Jul 23 12:32:10 localhost kernel: [  116.971354] x17: 0000000000000000 x16: 0000000000000000
Jul 23 12:32:10 localhost kernel: [  116.971380] x15: 0000000000000294 x14: ffff800011beba10
Jul 23 12:32:10 localhost kernel: [  116.971407] x13: 00000000ffffffea x12: ffff80001194edc8
Jul 23 12:32:10 localhost kernel: [  116.971433] x11: 0000000000000003 x10: ffff800011936d88
Jul 23 12:32:10 localhost kernel: [  116.971471] x9 : ffff800011936de0 x8 : 0000000000017fe8
Jul 23 12:32:10 localhost kernel: [  116.971497] x7 : c0000000ffffefff x6 : 0000000000000003
Jul 23 12:32:10 localhost kernel: [  116.971522] x5 : 0000000000000000 x4 : 0000000000000000
Jul 23 12:32:10 localhost kernel: [  116.971547] x3 : 0000000000000103 x2 : 0000000000000102
Jul 23 12:32:10 localhost kernel: [  116.971574] x1 : e579c067443e7f00 x0 : 0000000000000000
Jul 23 12:32:10 localhost kernel: [  116.971600] Call trace:
Jul 23 12:32:10 localhost kernel: [  116.971618]  dev_watchdog+0x398/0x3a0
Jul 23 12:32:10 localhost kernel: [  116.971639]  call_timer_fn+0x30/0x1f8
Jul 23 12:32:10 localhost kernel: [  116.971656]  run_timer_softirq+0x290/0x540
Jul 23 12:32:10 localhost kernel: [  116.971673]  efi_header_end+0x160/0x41c
Jul 23 12:32:10 localhost kernel: [  116.971692]  irq_exit+0xb8/0xd8
Jul 23 12:32:10 localhost kernel: [  116.971709]  __handle_domain_irq+0x98/0x108
Jul 23 12:32:10 localhost kernel: [  116.971729]  gic_handle_irq+0xc0/0x140
Jul 23 12:32:10 localhost kernel: [  116.971744]  el1_irq+0xc8/0x180
Jul 23 12:32:10 localhost kernel: [  116.971763]  arch_cpu_idle+0x18/0x28
Jul 23 12:32:10 localhost kernel: [  116.971779]  default_idle_call+0x44/0x1bc
Jul 23 12:32:10 localhost kernel: [  116.971796]  do_idle+0x204/0x278
Jul 23 12:32:10 localhost kernel: [  116.971810]  cpu_startup_entry+0x28/0x60
Jul 23 12:32:10 localhost kernel: [  116.971828]  rest_init+0xd4/0xe4
Jul 23 12:32:10 localhost kernel: [  116.971848]  arch_call_rest_init+0x10/0x1c
Jul 23 12:32:10 localhost kernel: [  116.971897]  start_kernel+0x818/0x850
Jul 23 12:32:10 localhost kernel: [  116.971910] ---[ end trace 9f9f86083853a762 ]---
Jul 23 12:32:10 localhost kernel: [  116.971980] r8152 6-1:1.0 lan0: Tx timeout
Jul 23 12:32:10 localhost kernel: [  116.976937] r8152 6-1:1.0 lan0: Tx status -2
Jul 23 12:32:10 localhost kernel: [  116.977263] r8152 6-1:1.0 lan0: Tx status -2
Jul 23 12:32:10 localhost kernel: [  116.977574] r8152 6-1:1.0 lan0: Tx status -2
Jul 23 12:32:10 localhost kernel: [  116.977869] r8152 6-1:1.0 lan0: Tx status -2
Jul 23 12:32:12 localhost kernel: [  119.241195] usb 6-1: reset SuperSpeed Gen 1 USB device number 2 using xhci-hcd
Jul 23 12:32:12 localhost kernel: [  119.276147] r8152 6-1:1.0 lan0: Promiscuous mode enabled
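For anyone who wants to count how often this happens over a longer period, the watchdog line is easy to pick out of a saved kernel log. A minimal sketch in Python; the regular expression is an assumption fitted to the `NETDEV WATCHDOG` line in the log above, and the function name is made up for illustration.

```python
import re

# Fitted to: "NETDEV WATCHDOG: lan0 (r8152): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(log_text: str) -> list:
    """Return one dict (interface, driver, queue) per watchdog line found."""
    hits = []
    for line in log_text.splitlines():
        match = WATCHDOG_RE.search(line)
        if match:
            hits.append({
                "iface": match["iface"],
                "driver": match["driver"],
                "queue": int(match["queue"]),
            })
    return hits


# Two lines copied from the log above.
SAMPLE = (
    "Jul 23 12:32:10 localhost kernel: [  116.970215] NETDEV WATCHDOG: lan0 (r8152): transmit queue 0 timed out\n"
    "Jul 23 12:32:10 localhost kernel: [  116.971980] r8152 6-1:1.0 lan0: Tx timeout\n"
)

for hit in find_tx_timeouts(SAMPLE):
    print(hit["iface"], hit["driver"], "queue", hit["queue"])
```

Feeding it the contents of a syslog file gives a rough count of TX timeouts per interface.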
 

 

/etc/udev/rules.d/70-rename-usb1.rules:

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="d0:37:45:2c:f3:61", ATTR{type}=="1", KERNEL=="eth*", NAME="lan0", RUN+="/usr/sbin/ethtool -K lan0 tx off rx off"

 

Nightly build (Linux rockpi-4c 5.10.52-rockchip64 #trunk.95 SMP PREEMPT Wed Jul 21 14:00:12 UTC 2021 aarch64 GNU/Linux): same result.

Edited by amurchick
Tested nightly build
Posted

Hi,

I have a similar, possibly identical, issue. One of my NanoPi R2S systems runs as my VPN gateway (for both data and Pihole DNS). On several occasions my Pihole got shut (pun intended): DNS stopped working because my VPN gateway was unreachable on its LAN side. On the serial console there were no warnings, errors or messages of any kind, apart from the armbianmonitor output:

02:09:48:  408MHz  0.04   1%   0%   0%   0%   0%   0% 50.8°C  0/5
02:09:53:  600MHz  0.04   2%   1%   0%   0%   0%   0% 51.2°C  0/5
02:09:58: 1200MHz  0.27   6%   4%   0%   1%   0%   0% 53.3°C  0/5
02:10:03:  408MHz  0.25   3%   1%   1%   0%   0%   0% 50.8°C  0/5
02:10:08:  408MHz  0.31   1%   0%   0%   0%   0%   0% 51.7°C  0/5
02:10:14:  408MHz  0.41   1%   1%   0%   0%   0%   0% 50.8°C  0/5
02:10:19:  408MHz  0.38   1%   0%   0%   0%   0%   0% 51.7°C  0/5
02:10:24:  408MHz  0.43   1%   1%   0%   0%   0%   0% 50.4°C  0/5
Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU  C.St.
02:10:29: 1200MHz  0.55   5%   4%   0%   0%   0%   0% 52.1°C  0/5
02:10:34:  600MHz  0.67   1%   1%   0%   0%   0%   0% 50.8°C  0/5
02:10:39:  408MHz  0.78   1%   1%   0%   0%   0%   0% 51.2°C  0/5
02:10:45:  408MHz  0.87   2%   1%   0%   0%   0%   0% 50.4°C  0/5
02:10:50:  600MHz  0.96   1%   1%   0%   0%   0%   0% 50.8°C  0/5
02:10:55:  408MHz  1.05   1%   1%   0%   0%   0%   0% 52.5°C  0/5

The interesting thing is that the load average starts to increase at some point: where it previously averaged well below 0.8, it crept up to around 7.00.

07:23:16:  408MHz  7.05   1%   1%   0%   0%   0%   0% 51.2°C  0/5
07:23:21: 1200MHz  7.05   1%   1%   0%   0%   0%   0% 52.5°C  0/5
07:23:26: 1200MHz  7.04   2%   2%   0%   0%   0%   0% 54.6°C  0/5
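A load average around 7 next to 1% CPU usage is less contradictory than it looks: on Linux the load average also counts tasks in uninterruptible sleep (state D), so tasks stuck waiting on a driver or a lock raise it without using CPU time. A minimal sketch in Python that flags such samples in monitor output of the format shown above; the function names and both thresholds are assumptions (4.0 was picked because the RK3328 has four cores).

```python
def parse_monitor_line(line: str):
    """Parse one sample line; return None for the header or malformed lines."""
    fields = line.split()
    if len(fields) < 4 or not fields[1].endswith("MHz"):
        return None
    return {
        "time": fields[0].rstrip(":"),
        "load": float(fields[2]),
        "cpu_percent": int(fields[3].rstrip("%")),
    }


def stalled_samples(lines, load_threshold=4.0, cpu_threshold=10):
    """Samples where the load is high although the CPU is almost idle."""
    parsed = (parse_monitor_line(line) for line in lines)
    return [
        s for s in parsed
        if s and s["load"] >= load_threshold and s["cpu_percent"] <= cpu_threshold
    ]


# Header plus two samples copied from the output above.
SAMPLE = [
    "Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU  C.St.",
    "02:09:48:  408MHz  0.04   1%   0%   0%   0%   0%   0% 50.8°C  0/5",
    "07:23:16:  408MHz  7.05   1%   1%   0%   0%   0%   0% 51.2°C  0/5",
]

for sample in stalled_samples(SAMPLE):
    print(sample["time"], "load", sample["load"], "cpu", sample["cpu_percent"], "%")
```

A sample flagged this way is a hint to look at blocked tasks rather than at CPU hogs.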

 

After logging in on the serial console, nothing in 'top' indicated that any process was hogging the cores and causing the high load numbers:

Spoiler
Tasks: 129 total,   1 running, 128 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.5 us,  4.7 sy,  0.0 ni, 91.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :    978.0 total,    328.7 free,    116.5 used,    532.9 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.    766.2 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND    
  662 root      20   0    8976   3068   2656 R  15.0   0.3   0:00.07 top        
    1 root      20   0  166444   9816   7116 S   0.0   1.0   0:31.19 systemd    
    2 root      20   0       0      0      0 S   0.0   0.0   0:01.80 kthreadd   
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp     
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp 
    8 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu+ 
    9 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks+ 
   10 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks+ 
   11 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks+ 
   12 root      20   0       0      0      0 S   0.0   0.0   1:16.02 ksoftirqd+ 
   13 root      20   0       0      0      0 I   0.0   0.0   4:08.37 rcu_preem+ 
   14 root      rt   0       0      0      0 S   0.0   0.0   0:00.32 migration+ 
   15 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/0    
   16 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/1    
   17 root      rt   0       0      0      0 S   0.0   0.0   0:00.35 migration+ 
   18 root      20   0       0      0      0 S   0.0   0.0   1:05.48 ksoftirqd+ 
   21 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/2

 

(Note that I did have to "terminate all tasks" using <BRK>+e  to be able to login on the serial console, so the 'top' results might be useless here.)

 

Currently I have added the udev rule to the system to disable TX and RX offloading, as well as to disable the link power management for the RTL8153 device.

SUBSYSTEM!="net", GOTO="__rtl815x_end"
ACTION!="add",    GOTO="__rtl815x_end"

ENV{ID_VENDOR_ID}!="0bda",  GOTO="__rtl815x_end"
ENV{ID_MODEL_ID}!="8153",   GOTO="__rtl815x_end"

LABEL="__rtl815x_disable_lpm"
RUN+="/bin/sh -c 'echo %E{ID_VENDOR_ID}:%E{ID_MODEL_ID}:k > /sys/module/usbcore/parameters/quirks'"

LABEL="__rtl815x_disable_offloading"
RUN+="/usr/sbin/ethtool -K $name rx off tx off"

LABEL="__rtl815x_end"

(Slightly modified to also apply to any other RTL8153 device and to use the assigned name instead of the hardcoded 'lan0'.)
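A note on the quirks line in the rule above: the value has the form `VID:PID:flags`, and per the kernel parameter documentation the `k` flag marks the device as unable to handle Link Power Management. As far as I understand it, a write to `/sys/module/usbcore/parameters/quirks` replaces the whole comma-separated list, so quirks set earlier for other devices would be dropped. A minimal sketch in Python of merging instead of overwriting; the function name and the second device ID are hypothetical.

```python
def merge_usb_quirk(existing: str, vendor_id: str, product_id: str, flags: str) -> str:
    """Return a usbcore.quirks value that contains VID:PID:flags exactly once."""
    key = f"{vendor_id.lower()}:{product_id.lower()}"
    entries = [entry for entry in existing.strip().split(",") if entry]
    # Drop any older entry for the same device, keep everything else.
    kept = [entry for entry in entries if not entry.lower().startswith(key + ":")]
    kept.append(f"{key}:{flags}")
    return ",".join(kept)


# 0bda:8153 is the RTL8153 ID from the rule above; 1234:5678 is a made-up device.
print(merge_usb_quirk("", "0bda", "8153", "k"))
print(merge_usb_quirk("1234:5678:b", "0BDA", "8153", "k"))
```

The current value can be read from the same sysfs file before writing the merged string back.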

 

The 'blocked task state' output in the spoiler below shows some references to the r8152 module and the RTL8153 device.

Spoiler
[426200.919448] sysrq: Show Blocked State
[426200.919840] task:ntpd            state:D stack:    0 pid: 1378 ppid:     1 flags:0x00000001
[426200.920583] Call trace:
[426200.920817]  __switch_to+0x100/0x158
[426200.921145]  __schedule+0x268/0x758
[426200.921463]  schedule+0x40/0xf8
[426200.921751]  schedule_preempt_disabled+0x18/0x30
[426200.922167]  __mutex_lock.isra.13+0x16c/0x538
[426200.922562]  __mutex_lock_slowpath+0x14/0x20
[426200.922948]  mutex_lock+0x38/0x50
[426200.923252]  __netlink_dump_start+0x90/0x278
[426200.923638]  rtnetlink_rcv_msg+0x248/0x338
[426200.924009]  netlink_rcv_skb+0x5c/0x120
[426200.924356]  rtnetlink_rcv+0x18/0x28
[426200.924681]  netlink_unicast+0x1b8/0x278
[426200.925037]  netlink_sendmsg+0x1d4/0x3e0
[426200.925393]  sock_sendmsg+0x4c/0x58
[426200.925712]  __sys_sendto+0xd4/0x140
[426200.926037]  __arm64_sys_sendto+0x28/0x38
[426200.926403]  el0_svc_common.constprop.2+0x8c/0x190
[426200.926833]  do_el0_svc+0x24/0x90
[426200.927136]  el0_svc+0x14/0x20
[426200.927416]  el0_sync_handler+0x90/0xb8
[426200.927763]  el0_sync+0x160/0x180
[426200.928080] task:atop            state:D stack:    0 pid:22113 ppid:     1 flags:0x00000001
[426200.928821] Call trace:
[426200.929050]  __switch_to+0x100/0x158
[426200.929376]  __schedule+0x268/0x758
[426200.929693]  schedule+0x40/0xf8
[426200.929982]  rpm_resume+0x13c/0x718
[426200.930300]  rpm_resume+0x280/0x718
[426200.930616]  __pm_runtime_resume+0x3c/0x88
[426200.930987]  usb_autopm_get_interface+0x24/0x60
[426200.931407]  rtl8152_get_link_ksettings+0x40/0x390 [r8152]
[426200.931899]  ethtool_get_settings+0x68/0x1f0
[426200.932285]  dev_ethtool+0x174/0x20b0
[426200.932617]  dev_ioctl+0x1f0/0x388
[426200.932927]  sock_do_ioctl+0x104/0x2c8
[426200.933266]  sock_ioctl+0x320/0x520
[426200.933584]  __arm64_sys_ioctl+0xa8/0xe8
[426200.933941]  el0_svc_common.constprop.2+0x8c/0x190
[426200.934372]  do_el0_svc+0x24/0x90
[426200.934676]  el0_svc+0x14/0x20
[426200.934956]  el0_sync_handler+0x90/0xb8
[426200.935303]  el0_sync+0x160/0x180
[426200.935607] task:kworker/0:1     state:D stack:    0 pid: 5745 ppid:     2 flags:0x00000028
[426200.936355] Workqueue: ipv6_addrconf addrconf_verify_work
[426200.936839] Call trace:
[426200.937067]  __switch_to+0x100/0x158
[426200.937392]  __schedule+0x268/0x758
[426200.937710]  schedule+0x40/0xf8
[426200.937998]  schedule_preempt_disabled+0x18/0x30
[426200.938415]  __mutex_lock.isra.13+0x16c/0x538
[426200.938808]  __mutex_lock_slowpath+0x14/0x20
[426200.939194]  mutex_lock+0x38/0x50
[426200.939498]  rtnl_lock+0x18/0x28
[426200.939794]  addrconf_verify_work+0x10/0x28
[426200.940174]  process_one_work+0x1ec/0x4d0
[426200.940537]  worker_thread+0x48/0x478
[426200.940869]  kthread+0x140/0x150
[426200.941165]  ret_from_fork+0x10/0x34
[426200.941488] task:kworker/2:2     state:D stack:    0 pid: 9367 ppid:     2 flags:0x00000028
[426200.942233] Workqueue: pm pm_runtime_work
[426200.942595] Call trace:
[426200.942823]  __switch_to+0x100/0x158
[426200.943149]  __schedule+0x268/0x758
[426200.943467]  schedule+0x40/0xf8
[426200.943753]  schedule_timeout+0x188/0x368
[426200.944116]  msleep+0x2c/0x40
[426200.944390]  napi_disable+0x58/0x100
[426200.944720]  rtl8152_suspend+0x308/0x468 [r8152]
[426200.945136]  usb_suspend_both+0x90/0x240
[426200.945492]  usb_runtime_suspend+0x30/0x80
[426200.945862]  __rpm_callback+0x100/0x150
[426200.946211]  rpm_callback+0x5c/0x88
[426200.946527]  rpm_suspend.part.12+0xd8/0x568
[426200.946905]  pm_runtime_work+0xb4/0x2b0
[426200.947252]  process_one_work+0x1ec/0x4d0
[426200.947614]  worker_thread+0x48/0x478
[426200.947946]  kthread+0x140/0x150
[426200.948241]  ret_from_fork+0x10/0x34
[426200.948565] task:kworker/3:3     state:D stack:    0 pid:11994 ppid:     2 flags:0x00000028
[426200.949314] Workqueue: events rtl_work_func_t [r8152]
[426200.949767] Call trace:
[426200.949995]  __switch_to+0x100/0x158
[426200.950320]  __schedule+0x268/0x758
[426200.950639]  schedule+0x40/0xf8
[426200.950926]  rpm_resume+0x13c/0x718
[426200.951243]  rpm_resume+0x280/0x718
[426200.951560]  __pm_runtime_resume+0x3c/0x88
[426200.951929]  usb_autopm_get_interface+0x24/0x60
[426200.952341]  rtl_work_func_t+0x74/0x3f8 [r8152]
[426200.952750]  process_one_work+0x1ec/0x4d0
[426200.953113]  worker_thread+0x48/0x478
[426200.953446]  kthread+0x140/0x150
[426200.953741]  ret_from_fork+0x10/0x34
[426200.954069] task:ping            state:D stack:    0 pid:19888 ppid: 19885 flags:0x00000024
[426200.954811] Call trace:
[426200.955038]  __switch_to+0x100/0x158
[426200.955365]  __schedule+0x268/0x758
[426200.955682]  schedule+0x40/0xf8
[426200.955970]  schedule_preempt_disabled+0x18/0x30
[426200.956387]  __mutex_lock.isra.13+0x16c/0x538
[426200.956780]  __mutex_lock_slowpath+0x14/0x20
[426200.957167]  mutex_lock+0x38/0x50
[426200.957470]  rtnl_lock+0x18/0x28
[426200.957766]  ip6mr_sk_done+0x38/0xe8
[426200.958091]  rawv6_close+0x28/0x68
[426200.958403]  inet_release+0x50/0xa8
[426200.958720]  inet6_release+0x34/0x50
[426200.959045]  __sock_release+0x44/0xb8
[426200.959376]  sock_close+0x18/0x28
[426200.959680]  __fput+0x78/0x230
[426200.959960]  ____fput+0x10/0x20
[426200.960248]  task_work_run+0x88/0xc8
[426200.960574]  do_exit+0x370/0xab8
[426200.960869]  do_group_exit+0x44/0xa0
[426200.961193]  __wake_up_parent+0x0/0x30
[426200.961534]  el0_svc_common.constprop.2+0x8c/0x190
[426200.961965]  do_el0_svc+0x24/0x90
[426200.962267]  el0_svc+0x14/0x20
[426200.962548]  el0_sync_handler+0x90/0xb8
[426200.962895]  el0_sync+0x160/0x180
[426200.963196] task:ip              state:D stack:    0 pid:29248 ppid:  1373 flags:0x00000000
[426200.963936] Call trace:
[426200.964164]  __switch_to+0x100/0x158
[426200.964489]  __schedule+0x268/0x758
[426200.964807]  schedule+0x40/0xf8
[426200.965096]  schedule_preempt_disabled+0x18/0x30
[426200.965513]  __mutex_lock.isra.13+0x16c/0x538
[426200.965906]  __mutex_lock_slowpath+0x14/0x20
[426200.966291]  mutex_lock+0x38/0x50
[426200.966594]  rtnetlink_rcv_msg+0xe4/0x338
[426200.966956]  netlink_rcv_skb+0x5c/0x120
[426200.967303]  rtnetlink_rcv+0x18/0x28
[426200.967628]  netlink_unicast+0x1b8/0x278
[426200.967983]  netlink_sendmsg+0x1d4/0x3e0
[426200.968339]  sock_sendmsg+0x4c/0x58
[426200.968657]  ____sys_sendmsg+0x284/0x2c8
[426200.969013]  ___sys_sendmsg+0x84/0xc8
[426200.969346]  __sys_sendmsg+0x6c/0xc0
[426200.969672]  __arm64_sys_sendmsg+0x24/0x30
[426200.970043]  el0_svc_common.constprop.2+0x8c/0x190
[426200.970473]  do_el0_svc+0x24/0x90
[426200.970776]  el0_svc+0x14/0x20
[426200.971056]  el0_sync_handler+0x90/0xb8
[426200.971403]  el0_sync+0x160/0x180
[426200.971712] task:ip              state:D stack:    0 pid:29980 ppid: 29979 flags:0x00000001
[426200.972452] Call trace:
[426200.972680]  __switch_to+0x100/0x158
[426200.973006]  __schedule+0x268/0x758
[426200.973324]  schedule+0x40/0xf8
[426200.973613]  schedule_preempt_disabled+0x18/0x30
[426200.974030]  __mutex_lock.isra.13+0x16c/0x538
[426200.974422]  __mutex_lock_slowpath+0x14/0x20
[426200.974808]  mutex_lock+0x38/0x50
[426200.975111]  __netlink_dump_start+0x90/0x278
[426200.975496]  rtnetlink_rcv_msg+0x248/0x338
[426200.975866]  netlink_rcv_skb+0x5c/0x120
[426200.976214]  rtnetlink_rcv+0x18/0x28
[426200.976540]  netlink_unicast+0x1b8/0x278
[426200.976895]  netlink_sendmsg+0x1d4/0x3e0
[426200.977250]  sock_sendmsg+0x4c/0x58
[426200.977568]  __sys_sendto+0xd4/0x140
[426200.977892]  __arm64_sys_sendto+0x28/0x38
[426200.978257]  el0_svc_common.constprop.2+0x8c/0x190
[426200.978688]  do_el0_svc+0x24/0x90
[426200.978991]  el0_svc+0x14/0x20
[426200.979271]  el0_sync_handler+0x90/0xb8
[426200.979617]  el0_sync+0x160/0x180
[426200.979921] task:sudo            state:D stack:    0 pid:30561 ppid: 30516 flags:0x00000001
[426200.980662] Call trace:
[426200.980889]  __switch_to+0x100/0x158
[426200.981215]  __schedule+0x268/0x758
[426200.981532]  schedule+0x40/0xf8
[426200.981821]  schedule_preempt_disabled+0x18/0x30
[426200.982237]  __mutex_lock.isra.13+0x16c/0x538
[426200.982630]  __mutex_lock_slowpath+0x14/0x20
[426200.983015]  mutex_lock+0x38/0x50
[426200.983318]  __netlink_dump_start+0x90/0x278
[426200.983703]  rtnetlink_rcv_msg+0x248/0x338
[426200.984073]  netlink_rcv_skb+0x5c/0x120
[426200.984421]  rtnetlink_rcv+0x18/0x28
[426200.984746]  netlink_unicast+0x1b8/0x278
[426200.985102]  netlink_sendmsg+0x1d4/0x3e0
[426200.985457]  sock_sendmsg+0x4c/0x58
[426200.985774]  __sys_sendto+0xd4/0x140
[426200.986099]  __arm64_sys_sendto+0x28/0x38
[426200.986463]  el0_svc_common.constprop.2+0x8c/0x190
[426200.986893]  do_el0_svc+0x24/0x90
[426200.987195]  el0_svc+0x14/0x20
[426200.987476]  el0_sync_handler+0x90/0xb8
[426200.987823]  el0_sync+0x160/0x180

 

 

Backtrace of all active CPUs below.

Spoiler
[426121.505605] sysrq: Show backtrace of all active CPUs
[426121.506080] sysrq: CPU0:
[426121.506336] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C        5.10.43-rockchip64 #21.05.4
[426121.507130] Hardware name: FriendlyElec NanoPi R2S (DT)
[426121.507613] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
[426121.508173] pc : arch_cpu_idle+0x18/0x28
[426121.508544] lr : arch_cpu_idle+0x10/0x28
[426121.508909] sp : ffff8000118b3e90
[426121.509222] x29: ffff8000118b3e90 x28: ffff800011510108 
[426121.509725] x27: 0000000000000000 x26: ffff8000118c3ac0 
[426121.510226] x25: 0000000000000000 x24: ffff800011307b70 
[426121.510727] x23: ffff80001157e8b8 x22: ffff8000118b9948 
[426121.511228] x21: ffff8000118ba2e8 x20: 0000000000000001 
[426121.511728] x19: ffff8000118ba208 x18: 0000000000000000 
[426121.512228] x17: 0000000000000000 x16: 0000000000000000 
[426121.512728] x15: 0000000000000000 x14: 0000000000000000 
[426121.513229] x13: 0000000000000000 x12: 0000000000000002 
[426121.513729] x11: 0000000000000000 x10: 0000000000000a40 
[426121.514229] x9 : ffff8000118b3de0 x8 : ffff8000118c4560 
[426121.514730] x7 : 0000000000003b3e x6 : 0000094d2d44750d 
[426121.515230] x5 : 00ffffffffffffff x4 : ffff80002e7c8000 
[426121.515731] x3 : 0000000000000001 x2 : 0000000000000000 
[426121.516230] x1 : 00000000778bc530 x0 : 00000000000000e0 
[426121.516732] Call trace:
[426121.516979]  arch_cpu_idle+0x18/0x28
[426121.517320]  default_idle_call+0x44/0x1bc
[426121.517699]  do_idle+0x204/0x278
[426121.518009]  cpu_startup_entry+0x24/0x60
[426121.518380]  rest_init+0xd4/0xe4
[426121.518692]  arch_call_rest_init+0x10/0x1c
[426121.519080]  start_kernel+0x818/0x850
[426121.519534] sysrq: CPU2:
[426121.519786] Call trace:
[426121.520037]  dump_backtrace+0x0/0x200
[426121.520389]  show_stack+0x18/0x28
[426121.520708]  showacpu+0x68/0x80
[426121.521015]  flush_smp_call_function_queue+0x100/0x250
[426121.521495]  generic_smp_call_function_single_interrupt+0x14/0x20
[426121.522056]  ipi_handler+0x7c/0x370
[426121.522394]  handle_percpu_devid_fasteoi_ipi+0x9c/0x1f0
[426121.522878]  generic_handle_irq+0x30/0x48
[426121.523254]  __handle_domain_irq+0x94/0x108
[426121.523650]  gic_handle_irq+0xc4/0xe0
[426121.523997]  el1_irq+0xc8/0x180
[426121.524304]  ep_item_poll.isra.24+0x68/0xe0
[426121.524701]  ep_scan_ready_list.isra.23+0xa4/0x1d0
[426121.525147]  ep_poll+0x100/0x3b8
[426121.525461]  do_epoll_wait+0xd8/0x128
[426121.525812]  __arm64_sys_epoll_pwait+0x5c/0xd8
[426121.526231]  el0_svc_common.constprop.2+0x8c/0x190
[426121.526679]  do_el0_svc+0x24/0x90
[426121.526998]  el0_svc+0x14/0x20
[426121.527295]  el0_sync_handler+0x90/0xb8
[426121.527658]  el0_sync+0x160/0x180
[426121.527974] sysrq: CPU1:
[426121.528225] Call trace:
[426121.528476]  dump_backtrace+0x0/0x200
[426121.528828]  show_stack+0x18/0x28
[426121.529157]  showacpu+0x68/0x80
[426121.529465]  flush_smp_call_function_queue+0x100/0x250
[426121.529946]  generic_smp_call_function_single_interrupt+0x14/0x20
[426121.530508]  ipi_handler+0x7c/0x370
[426121.530844]  handle_percpu_devid_fasteoi_ipi+0x9c/0x1f0
[426121.531337]  generic_handle_irq+0x30/0x48
[426121.531715]  __handle_domain_irq+0x94/0x108
[426121.532111]  gic_handle_irq+0xc4/0xe0
[426121.532460]  el1_irq+0xc8/0x180
[426121.532767]  syslog_print+0x120/0x278
[426121.533119]  do_syslog.part.27+0x33c/0x488
[426121.533506]  do_syslog+0x44/0x58
[426121.533820]  kmsg_read+0x4c/0x68
[426121.534134]  proc_reg_read+0x30/0xf8
[426121.534477]  vfs_read+0xac/0x1d0
[426121.534788]  ksys_read+0x68/0xf0
[426121.535100]  __arm64_sys_read+0x1c/0x28
[426121.535467]  el0_svc_common.constprop.2+0x8c/0x190
[426121.535915]  do_el0_svc+0x24/0x90
[426121.536236]  el0_svc+0x14/0x20
[426121.536534]  el0_sync_handler+0x90/0xb8
[426121.536897]  el0_sync+0x160/0x180

 

 

Since I spotted several references to suspend/resume handling of the RTL8153 (rtl8152_get_link_ksettings, rtl8152_suspend, __pm_runtime_resume), this may be related to the link power management issue referenced in this thread.
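This observation can be checked mechanically. A minimal sketch in Python that splits a 'Show Blocked State' dump into tasks and lists those with a stack frame inside a given module. The regular expressions are assumptions fitted to the 5.10 output format shown in this post, and the function names are made up for illustration.

```python
import re

TASK_RE = re.compile(r"task:(?P<name>\S+)\s+state:(?P<state>\S)\s.*?pid:\s*(?P<pid>\d+)")
FRAME_RE = re.compile(
    r"\]\s+(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_blocked_tasks(dump: str) -> list:
    """Return one dict per task with its name, pid and (function, module) frames."""
    tasks = []
    for line in dump.splitlines():
        task = TASK_RE.search(line)
        if task:
            tasks.append({"name": task["name"], "pid": int(task["pid"]), "frames": []})
            continue
        frame = FRAME_RE.search(line)
        if frame and tasks:
            tasks[-1]["frames"].append((frame["func"], frame["module"]))
    return tasks


def tasks_inside_module(tasks: list, module: str = "r8152") -> list:
    """Names of tasks that have at least one frame in the given module."""
    return [t["name"] for t in tasks if any(m == module for _, m in t["frames"])]


# Shortened excerpt of the dump above.
SAMPLE = """\
[426200.919840] task:ntpd            state:D stack:    0 pid: 1378 ppid:     1 flags:0x00000001
[426200.920583] Call trace:
[426200.923252]  mutex_lock+0x38/0x50
[426200.923638]  __netlink_dump_start+0x90/0x278
[426200.941488] task:kworker/2:2     state:D stack:    0 pid: 9367 ppid:     2 flags:0x00000028
[426200.942233] Workqueue: pm pm_runtime_work
[426200.944390]  napi_disable+0x58/0x100
[426200.944720]  rtl8152_suspend+0x308/0x468 [r8152]
"""

tasks = parse_blocked_tasks(SAMPLE)
print(tasks_inside_module(tasks))
```

Applied to the full dump, this should single out atop, kworker/2:2 and kworker/3:3; the remaining tasks all sit in mutex_lock, which would fit a picture where everything else is queued behind the stuck driver calls.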

 

In the meantime, as this hang occurs only about once every 1-2 weeks, I will monitor the system to see if it still happens after the LPM and RX/TX offloading tweaks.

 

If anyone is interested in more backtrace content, I can provide it.

Groetjes,

 

Posted

Just to let you know, about the image I shared above:

I've been running my R2S perfectly fine with my fiber connection and a gigabit LAN since I started using my image.

It routes traffic over 3 VLANs and runs dhcpd, bind, and iptables.

 

You can just put the kernel packages on hold with `apt-mark hold` and update the system normally...
You just lose kernel updates.

Posted (edited)

Hi,

I still have this issue occasionally and was not successful in finding and building the v1.11 version of the r8152 kernel module. The symptoms included:

  1. lan0 (the USB-attached NIC) sometimes disappeared.
  2. The entire broadcast domain stopped responding:
    1. helios64 reported:
      rk_gmac-dwmac fe300000.ethernet eth0: Reset adapter.
    2. orangepi zero reported:
      dwmac-sun8i 1c30000.ethernet eth0: Reset adapter.
    3. nanopi R2S sometimes reported messages about a TX buffer/queue overflow (unfortunately I cannot locate this message anymore).
    4. The NIC activity LED on one of the nanopi R2S boxen showed a high amount of traffic.
    5. The broadcast domain started working again after I disconnected the lan0 NIC of the nanopi R2S from the local network.

These symptoms reminded me of an incident we encountered at my day job: an unresponsive broadcast domain while all hardware was still OK. It turned out that one device on that network borked out and started flooding the broadcast domain with PAUSE frames, which are broadcast by the first switch it encounters. This in turn causes all other devices to stop transmitting data as well, bringing the entire broadcast domain to a halt.
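For reference, an IEEE 802.3x PAUSE frame is a small MAC control frame: destination 01:80:C2:00:00:01, EtherType 0x8808, opcode 0x0001 and a 16-bit pause time counted in quanta of 512 bit times. A minimal sketch in Python that builds such a frame and computes how long one frame can silence a link partner. It only illustrates the frame format and is not a capture of what the R2S sent; the source MAC address and the function names are made up.

```python
import struct

PAUSE_DST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
QUANTUM_BITS = 512  # one pause quantum is 512 bit times


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Return a minimum-size PAUSE frame (60 bytes, without the FCS)."""
    if len(src_mac) != 6 or not 0 <= quanta <= 0xFFFF:
        raise ValueError("need a 6-byte MAC and a 16-bit pause time")
    header = PAUSE_DST_MAC + src_mac + struct.pack(
        "!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, quanta
    )
    return header.ljust(60, b"\x00")  # pad to the Ethernet minimum


def pause_seconds(quanta: int, link_bps: float) -> float:
    """How long the link partner is asked to stay silent."""
    return quanta * QUANTUM_BITS / link_bps


frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)  # hypothetical source MAC
print(len(frame), "bytes,", pause_seconds(0xFFFF, 1e9) * 1000, "ms at 1 Gbps")
```

A single maximum-length pause is short, so a stalled segment implies a device that keeps repeating such frames.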

 

Knowing this, I tried to disable the PAUSE options on the USB NIC on buster, but this did not seem possible:

root@nanopi0:~# ethtool -A lan0 autoneg off rx off tx off
Cannot get device pause settings: Operation not supported

 

I then upgraded to bullseye, and lo and behold, the PAUSE options could be (partially) disabled on the USB NIC (lan0):

root@nanopi1:~# ethtool -a lan0
Pause parameters for lan0:
Autonegotiate:  on
RX:             off
TX:             off
RX negotiated: off
TX negotiated: off
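To keep an eye on whether these settings survive reboots and kernel upgrades, the `ethtool -a` output can be checked from a script. A minimal sketch in Python, assuming the output format shown above; the function name is made up for illustration.

```python
def parse_pause_settings(ethtool_a_output: str) -> dict:
    """Map each 'Name: on|off' line of `ethtool -a <iface>` output to a bool."""
    settings = {}
    for line in ethtool_a_output.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # The "Pause parameters for lan0:" header has no value and is skipped.
        if sep and value in ("on", "off"):
            settings[name.strip()] = value == "on"
    return settings


# Output copied from the bullseye system above.
SAMPLE = """Pause parameters for lan0:
Autonegotiate:  on
RX:             off
TX:             off
RX negotiated: off
TX negotiated: off
"""

settings = parse_pause_settings(SAMPLE)
pause_disabled = not settings["RX"] and not settings["TX"]
print("PAUSE disabled:", pause_disabled, "| autonegotiation on:", settings["Autonegotiate"])
```

In the output above, autonegotiation is still reported as on even though RX and TX pause are off, which matches the "(partially)" remark.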

 

I am currently running bullseye and will keep monitoring. Hopefully either the upgrade to bullseye or the disabling of PAUSE frames will prevent this issue from popping up every now and then.

 

(It had a habit of popping up when I was away from home, making it impossible to connect to my home network from outside.)

 

Perhaps this might also help someone here?

Groetjes,
 

Spoiler

Old version:

Linux nanopi0 5.10.63-rockchip64 #21.08.2 SMP PREEMPT Wed Sep 8 10:57:23 UTC 2021 aarch64 GNU/Linux
Description:    Debian GNU/Linux 10 (buster)

 

New version:

Linux nanopi1 5.15.25-rockchip64 #22.02.1 SMP PREEMPT Sun Feb 27 09:05:47 UTC 2022 aarch64 GNU/Linux
Description:    Debian GNU/Linux 11 (bullseye)

 

 

Edited by djurny
