[Armbian build PR] - rockchip: workaround stmmac ethernet lock contention


Description

The stmmac Ethernet driver is used on all Rockchip SoCs. Its statistics system was revised in kernel 6.6, but something is currently not working correctly on 32-bit platforms: lock contention occurs very often when a userland process tries to read the statistics of the Ethernet device.

One of the most common victims is the vnstat daemon, which often gets stuck hogging a CPU core and sends the load average skyrocketing. This also upsets the whole kernel: RCU complains, processes hang, and so on.

The patch provides a horrible workaround (it removes the mutual exclusion in the reader) to fix the problem temporarily. This may break the Ethernet statistics, which can then report freaky numbers, but at least it fixes the problem for the time being.
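
To make the trade-off concrete, here is a toy, single-threaded Python model of a 64-bit statistic guarded by a sequence counter. It is only a sketch under assumptions: it assumes the reader-side protection behaves like the retry loop the kernel builds from `u64_stats_fetch_begin()` and `u64_stats_fetch_retry()` on 32-bit builds. The names `SeqStat`, `read_with_retry` and `read_without_retry` are hypothetical and do not come from the patch; the real driver code is C and differs in detail.

```python
MASK32 = 0xFFFFFFFF


class SeqStat:
    """Toy 64-bit counter stored as two 32-bit halves plus a sequence number."""

    def __init__(self, value=0):
        self.seq = 0  # even: stable, odd: a writer is mid-update
        self.lo = value & MASK32
        self.hi = (value >> 32) & MASK32

    def write_begin(self):
        self.seq += 1  # becomes odd

    def write_end(self):
        self.seq += 1  # becomes even again


def read_with_retry(stat, max_spins=1000):
    """Retry until a stable snapshot is seen; None stands in for spinning forever."""
    for _ in range(max_spins):
        start = stat.seq
        if start & 1:
            continue  # writer in progress, try again
        value = (stat.hi << 32) | stat.lo
        if stat.seq == start:
            return value
    return None


def read_without_retry(stat):
    """Workaround-style read: no sequence check, so a torn value is possible."""
    return (stat.hi << 32) | stat.lo


if __name__ == "__main__":
    stat = SeqStat(0xFFFFFFFF)
    # A writer starts moving the counter to 0x1_0000_0000 and has only
    # updated the low half so far.
    stat.write_begin()
    stat.lo = 0
    print(read_with_retry(stat))     # None: no stable snapshot is ever seen
    print(read_without_retry(stat))  # 0: torn value, neither old nor new
    stat.hi = 1
    stat.write_end()
    print(read_with_retry(stat) == 0x100000000)  # True
```

In this model the retrying reader never returns while the sequence number stays odd, which is one way a reader could end up busy-looping the way vnstatd does in the dump. The unchecked reader always returns immediately, at the price of occasionally combining halves from two different updates.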

This is the kernel dump from the hung task:

[  696.614056] Sending NMI from CPU 0 to CPUs 2:
[  696.614071] NMI backtrace for cpu 2
[  696.614079] CPU: 2 PID: 24 Comm: migration/2 Tainted: G         C         6.6.12-current-rockchip #25
[  696.614088] Hardware name: Generic DT based system
[  696.614092] Stopper: multi_cpu_stop+0x0/0x128 <- stop_machine_cpuslocked+0x118/0x180
[  696.614113] PC is at rcu_momentary_dyntick_idle+0x44/0xc4
[  696.614122] LR is at multi_cpu_stop+0xe4/0x128
[  696.614130] pc : [<b0197d74>]    lr : [<b01e03b4>]    psr: 60070113
[  696.614135] sp : f0889ed0  ip : 00000000  fp : f0889edc
[  696.614140] r10: 00000001  r9 : 00000000  r8 : 00000000
[  696.614143] r7 : b1505058  r6 : f093dddc  r5 : 00000001  r4 : f093ddf0
[  696.614148] r3 : ee6b7598  r2 : 2aaeeb3c  r1 : 00000000  r0 : 3d234000
[  696.614153] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  696.614160] Control: 10c5387d  Table: 6f38806a  DAC: 00000051
[  696.614163] Backtrace: 
[  696.614167]  rcu_momentary_dyntick_idle from multi_cpu_stop+0xe4/0x128
[  696.614181]  multi_cpu_stop from cpu_stopper_thread+0x78/0x148
[  696.614200]  r10:f093ddf4 r9:ee6b606c r8:ee6b6074 r7:f093dddc r6:b01e02d0 r5:b9322b00
[  696.614204]  r4:ee6b6068
[  696.614207]  cpu_stopper_thread from smpboot_thread_fn+0xc0/0x15c
[  696.614223]  r10:00000000 r9:00000002 r8:b15dd06c r7:00000001 r6:b9322b00 r5:b927dcc0
[  696.614228]  r4:00000000
[  696.614231]  smpboot_thread_fn from kthread+0xe8/0x104
[  696.614246]  r10:00000000 r9:f0821d8c r8:b927df00 r7:b927dcc0 r6:b014e190 r5:b9322b00
[  696.614250]  r4:b927de00 r3:00000000
[  696.614254]  kthread from ret_from_fork+0x14/0x28
[  696.614263] Exception stack(0xf0889fb0 to 0xf0889ff8)
[  696.614269] 9fa0:                                     00000000 00000000 00000000 00000000
[  696.614277] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  696.614283] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  696.614292]  r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:b0149778 r4:b927de00
[  696.615072] CPU: 0 PID: 1258 Comm: vnstatd Tainted: G         C         6.6.12-current-rockchip #25
[  696.615084] Hardware name: Generic DT based system
[  696.615091] PC is at stmmac_get_stats64+0x38/0x1a0
[  696.615104] LR is at 0x1
[  696.615115] pc : [<b098c188>]    lr : [<00000001>]    psr: 20000013
[  696.615123] sp : f2d6dbf4  ip : f2d6dc20  fp : f2d6dc1c
[  696.615130] r10: 01000001  r9 : b96a0000  r8 : b633e4c8
[  696.615138] r7 : 00000001  r6 : 00000000  r5 : b098c150  r4 : b96a3000
[  696.615146] r3 : 000370b7  r2 : b96a2e48  r1 : f2d6dcc8  r0 : b96a0000
[  696.615155] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  696.615164] Control: 10c5387d  Table: 6f08006a  DAC: 00000051
[  696.615170] Backtrace: 
[  696.615180]  stmmac_get_stats64 from dev_get_stats+0x44/0x168
[  696.615203]  r10:01000001 r9:b96a0000 r8:b633e4c8 r7:b1087af8 r6:b96a0000 r5:b098c150
[  696.615210]  r4:f2d6dcc8
[  696.615216]  dev_get_stats from dev_seq_printf_stats+0x34/0x178
[  696.615241]  r9:b96a0000 r8:b633e4c8 r7:f2d6de20 r6:00000000 r5:b633e4b0 r4:b96a0000
[  696.615247]  dev_seq_printf_stats from dev_seq_show+0x18/0x34
[  696.615264]  r5:00000000 r4:b633e4b0
[  696.615271]  dev_seq_show from seq_read_iter+0x3cc/0x50c
[  696.615288]  seq_read_iter from seq_read+0x8c/0xbc
[  696.615310]  r10:f2d6df68 r9:00000400 r8:00000000 r7:00004004 r6:00000400 r5:b2f38600
[  696.615319]  r4:f2d6df68
[  696.615326]  seq_read from proc_reg_read+0xb4/0xd8
[  696.615347]  r8:01c70fa0 r7:b2f38600 r6:00000000 r5:b0343274 r4:b9448c00
[  696.615355]  proc_reg_read from vfs_read+0xb8/0x2dc
[  696.615374]  r10:b0395cb4 r9:b56ba040 r8:01c70fa0 r7:f2d6df68 r6:b2f38600 r5:00000400
[  696.615381]  r4:00000000 r3:f2d6df68
[  696.615388]  vfs_read from ksys_read+0x68/0xec
[  696.615406]  r10:00000003 r9:b56ba040 r8:b01002c8 r7:00000000 r6:00000000 r5:b2f38600
[  696.615415]  r4:b2f38600
[  696.615420]  ksys_read from sys_read+0x10/0x14
[  696.615436]  r7:00000003 r6:a6f425a0 r5:000005e8 r4:01c571a8
[  696.615443]  sys_read from __sys_trace_return+0x0/0x10
[  696.615457] Exception stack(0xf2d6dfa8 to 0xf2d6dff0)
[  696.615468] dfa0:                   01c571a8 000005e8 00000004 01c70fa0 00000400 00000000
[  696.615480] dfc0: 01c571a8 000005e8 a6f425a0 00000003 0000000a aed87604 00000000 00000000
[  696.615489] dfe0: 00000003 aed874f0 a6d339a7 a6cadb06
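
The second backtrace runs from `sys_read` through `dev_seq_show` and `dev_get_stats` into `stmmac_get_stats64`, which is consistent with a read of the procfs interface statistics table. As an illustration of that kind of userland read, the following Python sketch parses text in the `/proc/net/dev` layout. It is an assumption that vnstatd was reading this particular file; the helper name `parse_net_dev` and the sample numbers are hypothetical.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev-style text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        values = fields.split()
        if len(values) < 16:
            continue  # not a complete statistics row
        # Column 0 is received bytes, column 8 is transmitted bytes.
        stats[name.strip()] = (int(values[0]), int(values[8]))
    return stats


SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1200      10    0    0    0     0          0         0     1200      10    0    0    0     0       0          0
  eth0:  225463     350    0    0    0     0          0         0    98765     210    0    0    0     0       0          0
"""

if __name__ == "__main__":
    print(parse_net_dev(SAMPLE)["eth0"])  # (225463, 98765)
    # On a Linux system the live table can be read the same way:
    # with open("/proc/net/dev") as f:
    #     print(parse_net_dev(f.read()))
```

Any monitoring tool that polls these counters periodically repeats such a read, so each poll is another chance to enter the driver's statistics callback seen at the top of the backtrace.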

Important notes: this issue may affect other 32-bit platforms as well! The stmmac driver is used by most SoCs supported by Armbian, with the curious exception of Allwinner H3. Meson8/Meson8b (S802/S805) also use it and may be affected by the same problem. 64-bit platforms are not affected, since the locking mechanism is a no-op on 64-bit architectures.

Jira reference number AR-2048 - Celebrate 2^11th ticket :smile_cat:

How Has This Been Tested?

  • [x] Kernel 6.6 compiled and tested on live system
  • [x] Kernel 6.7 compiled

Checklist:

  • [x] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have made corresponding changes to the documentation
  • [x] My changes generate no new warnings
  • [x] Any dependent changes have been merged and published in downstream modules
