High CPU usage by interrupts on an A20 system



I've repurposed an old Android TV box with Armbian, just using the Olimex-based image. It works perfectly, but I'm seeing constant CPU usage by kworkers:

root@lime:~# uname -r
4.19.62-sunxi
root@lime:~# ps auxk -%cpu | head
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      5740  8.8  0.0      0     0 ?        I    10:27   1:19 [kworker/1:0-eve]
root      4444  4.6  0.0      0     0 ?        I    10:17   1:10 [kworker/0:0-eve]
root         1  0.3  0.5  25848  5448 ?        Ss   10:01   0:09 /sbin/init
root       147  0.2  0.0      0     0 ?        I<   10:01   0:05 [kworker/1:1H-kb]
root       195  0.1  0.0      0     0 ?        I<   10:01   0:04 [kworker/0:2H-kb]
root       205  0.1  0.3  24544  4052 ?        Ss   10:01   0:02 /lib/systemd/systemd-journald
root         2  0.0  0.0      0     0 ?        S    10:01   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        I<   10:01   0:00 [rcu_gp]
root         4  0.0  0.0      0     0 ?        I<   10:01   0:00 [rcu_par_gp]

 

I tried to debug it by looking at a SysRq backtrace dump:

root@lime:~# echo l > /proc/sysrq-trigger
root@lime:~# dmesg | tail -100
...
[ 2489.669676] sysrq: SysRq : Show backtrace of all active CPUs
[ 2489.675658] NMI backtrace for cpu 1
[ 2489.675676] CPU: 1 PID: 2958 Comm: bash Not tainted 4.19.62-sunxi #5.92
[ 2489.675683] Hardware name: Allwinner sun7i (A20) Family
[ 2489.675731] [<c010d74d>] (unwind_backtrace) from [<c010a2f1>] (show_stack+0x11/0x14)
[ 2489.675757] [<c010a2f1>] (show_stack) from [<c08fc121>] (dump_stack+0x69/0x78)
[ 2489.675782] [<c08fc121>] (dump_stack) from [<c09003a3>] (nmi_cpu_backtrace+0x8f/0x90)
[ 2489.675804] [<c09003a3>] (nmi_cpu_backtrace) from [<c0900453>] (nmi_trigger_cpumask_backtrace+0xaf/0xe0)
[ 2489.675828] [<c0900453>] (nmi_trigger_cpumask_backtrace) from [<c060c311>] (__handle_sysrq+0x7d/0x114)
[ 2489.675850] [<c060c311>] (__handle_sysrq) from [<c060c75d>] (write_sysrq_trigger+0x49/0x54)
[ 2489.675873] [<c060c75d>] (write_sysrq_trigger) from [<c0290b65>] (proc_reg_write+0x45/0x64)
[ 2489.675900] [<c0290b65>] (proc_reg_write) from [<c023ef13>] (vfs_write+0x77/0x144)
[ 2489.675924] [<c023ef13>] (vfs_write) from [<c023f105>] (ksys_write+0x49/0x98)
[ 2489.675945] [<c023f105>] (ksys_write) from [<c0101001>] (ret_fast_syscall+0x1/0x62)
[ 2489.675954] Exception stack(0xecac3fa8 to 0xecac3ff0)
[ 2489.675970] 3fa0:                   00000002 01131408 00000001 01131408 00000002 00000000
[ 2489.675987] 3fc0: 00000002 01131408 b6eb0d60 00000004 01131408 00000002 00000000 00000000
[ 2489.675998] 3fe0: 00000000 beb107fc b6e131bb b6e4fcf6
[ 2489.676010] Sending NMI from CPU 1 to CPUs 0:
[ 2489.676300] NMI backtrace for cpu 0
[ 2489.676308] CPU: 0 PID: 205 Comm: systemd-journal Not tainted 4.19.62-sunxi #5.92
[ 2489.676313] Hardware name: Allwinner sun7i (A20) Family
[ 2489.676317] PC is at fput+0x0/0x94
[ 2489.676320] LR is at path_openat+0x269/0xe0c
[ 2489.676326] pc : [<c023fed0>]    lr : [<c024985d>]    psr: a00f0033
[ 2489.676330] sp : ed84fe40  ip : 014e8905  fp : c0e04d48
[ 2489.676335] r10: eb92a240  r9 : ffffe000  r8 : fffff000
[ 2489.676340] r7 : ed84ff70  r6 : 08010800  r5 : fffffffe  r4 : ed84fec0
[ 2489.676345] r3 : 00000000  r2 : 00000011  r1 : 00000000  r0 : eb92a240
[ 2489.676351] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment none
[ 2489.676356] Control: 50c5387d  Table: 6d8dc06a  DAC: 00000051
[ 2489.676362] CPU: 0 PID: 205 Comm: systemd-journal Not tainted 4.19.62-sunxi #5.92
[ 2489.676367] Hardware name: Allwinner sun7i (A20) Family
[ 2489.676372] [<c010d74d>] (unwind_backtrace) from [<c010a2f1>] (show_stack+0x11/0x14)
[ 2489.676378] [<c010a2f1>] (show_stack) from [<c08fc121>] (dump_stack+0x69/0x78)
[ 2489.676384] [<c08fc121>] (dump_stack) from [<c090036d>] (nmi_cpu_backtrace+0x59/0x90)
[ 2489.676390] [<c090036d>] (nmi_cpu_backtrace) from [<c010c5d9>] (handle_IPI+0x85/0x2c0)
[ 2489.676395] [<c010c5d9>] (handle_IPI) from [<c05c9c7f>] (gic_handle_irq+0x67/0x68)
[ 2489.676401] [<c05c9c7f>] (gic_handle_irq) from [<c0101a65>] (__irq_svc+0x65/0x94)
[ 2489.676406] Exception stack(0xed84fdf0 to 0xed84fe38)
[ 2489.676412] fde0:                                     eb92a240 00000000 00000011 00000000
[ 2489.676418] fe00: ed84fec0 fffffffe 08010800 ed84ff70 fffff000 ffffe000 eb92a240 c0e04d48
[ 2489.676424] fe20: 014e8905 ed84fe40 c024985d c023fed0 a00f0033 ffffffff
[ 2489.676429] [<c0101a65>] (__irq_svc) from [<c023fed0>] (fput+0x0/0x94)
[ 2489.676434] [<c023fed0>] (fput) from [<ed8402c0>] (0xed8402c0)
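
A note for anyone debugging the same thing: the dump above only catches the shell that triggered it and systemd-journal, so it does not say what the kworkers are busy with. The kernel's workqueue documentation suggests tracing the workqueue_queue_work event to see which work functions are being queued. What follows is a minimal sketch, assuming debugfs is mounted at /sys/kernel/debug and the kernel has event tracing enabled (not verified on this image); the function names trace_workqueue and count_work_functions are made up for illustration.

```shell
# Capture workqueue_queue_work events for N seconds (default 10). Run as root.
# Assumes debugfs is mounted at /sys/kernel/debug; adjust the path otherwise.
trace_workqueue() {
    tracing=/sys/kernel/debug/tracing
    echo workqueue:workqueue_queue_work > "$tracing/set_event"
    timeout "${1:-10}" cat "$tracing/trace_pipe"
    echo > "$tracing/set_event"          # switch the event off again
}

# Read trace lines on stdin and print "count function", busiest first.
count_work_functions() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^function=/) { sub(/^function=/, "", $i); n[$i]++ }
    }
    END { for (f in n) print n[f], f }' | sort -rn
}
```

Something like "trace_workqueue 10 | count_work_functions" should then list the work functions queued most often, which usually points at the driver responsible.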

 

I suspect that something is happening with interrupts, so I inspected /proc/interrupts and saw two very high values, for "sunxi-mmc" and "Rescheduling interrupts":

root@lime:~# cat /proc/interrupts 
           CPU0       CPU1       
 18:          0          0     GICv2  29 Level     arch_timer
 19:     132351     137260     GICv2  30 Level     arch_timer
 22:          0          0     GICv2  54 Level     timer@1c20c00
 23:          0          0     GICv2 113 Level     sun5i_timer0
 24:          0          0     GICv2 152 Level     arm-pmu
 25:          0          0     GICv2 153 Level     arm-pmu
 26:          0          0     GICv2  59 Level     1c02000.dma-controller
 27:          0          0     GICv2  76 Level     1c0c000.lcd-controller
 28:          0          0     GICv2  77 Level     1c0d000.lcd-controller
 29:    1228805          0     GICv2  64 Level     sunxi-mmc
 30:          0          0     GICv2  70 Level     musb-hdrc.1.auto
 31:          0          0     GICv2  71 Level     ehci_hcd:usb1
 32:          0          0     GICv2  96 Level     ohci_hcd:usb3
 36:          0          0     GICv2  72 Level     ehci_hcd:usb2
 37:          0          0     GICv2  97 Level     ohci_hcd:usb4
 40:          0          0     GICv2  56 Level     1c20d00.rtc
 44:       2723          0     GICv2  61 Level     sun4i_gpadc_irq_chip
 45:          6          0     GICv2  33 Level     ttyS0
 46:        692          0     GICv2  39 Level     mv64xxx_i2c
 47:          4          0     GICv2  40 Level     mv64xxx_i2c
 48:          0          0     GICv2 101 Level     gp
 49:          0          0     GICv2 102 Level     gpmmu
 50:          0          0     GICv2 103 Level     pp0
 51:          0          0     GICv2 104 Level     ppmmu0
 52:          0          0     GICv2 106 Level     pp1
 53:          0          0     GICv2 107 Level     ppmmu1
 55:      16900          0     GICv2 117 Level     eth0
 62:          0          0  sunxi_pio_edge   1 Edge      1c0f000.mmc cd
 65:          1          0  sunxi_pio_edge   4 Edge      usb0-id-det
 66:          0          0  sunxi_pio_edge   5 Edge      usb0-vbus-det
 93:          0          0  sunxi-nmi   0 Level     axp20x_irq_chip
127:          0          0  axp20x_irq_chip  33 Edge      axp20x-pek-dbr
128:          0          0  axp20x_irq_chip  34 Edge      axp20x-pek-dbf
133:       2722          0  sun4i_gpadc_irq_chip   1 Edge      temp_data
134:          0          0  sun4i_gpadc_irq_chip   0 Edge      fifo_data
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:      26187     512258  Rescheduling interrupts
IPI3:        312        262  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:       2553       2019  IRQ work interrupts
IPI6:          0          0  completion interrupts
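
The counters above are totals since boot, so a large number is not necessarily a high rate. Comparing two snapshots taken a few seconds apart shows which sources are actually firing right now. A rough sketch, assuming a POSIX shell with awk; irq_delta is a hypothetical helper name, not an existing tool:

```shell
# irq_delta BEFORE AFTER
# Compare two saved copies of /proc/interrupts and print
# "delta irq: description" for every line whose count grew, biggest first.
irq_delta() {
    awk '
        $1 !~ /:$/ { next }                       # skip the CPU header line
        {
            total = 0
            # Per-CPU counters follow the "NN:" label; sum them.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
            # Whatever is left is the controller/device description.
            desc = ""
            while (i <= NF) { desc = desc " " $i; i++ }
        }
        NR == FNR { before[$1] = total; next }    # first file: remember counts
        total > before[$1] { print total - before[$1], $1 desc }
    ' "$1" "$2" | sort -rn
}
```

Usage would be along the lines of: cat /proc/interrupts > /tmp/a; sleep 10; cat /proc/interrupts > /tmp/b; irq_delta /tmp/a /tmp/b. Dividing each delta by the sleep time gives interrupts per second.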

 

Is this normal on an A20 board, or is it something fixable by configuration/tuning/whatever?


7 hours ago, sucotronic said:

an old TV android box

 

7 hours ago, sucotronic said:

Is this normal in an A20 board

 

The problem (as I understand it) likely is not "an A20 board" but rather "[some random] old TV Android box."

 

7 hours ago, sucotronic said:

Is [...] fixable by configuration/tuning/whatever?

 

Maybe, maybe not.  Could be simple or complex.  But it will be up to you (and/or others) to dig and find out (as it is not a Supported device).

 

What you may find (after spending some (a lot of?) time) is that there are good reasons that Armbian does not support every random board out there (and especially TV boxes).  Or maybe you get lucky and get it working better (in which case, please share your findings here).  It actually sounds to me like you are already a little way into a decent investigation.  Maybe you know more than me.

 

I, personally, do not want to get involved in (potentially lengthy) investigations.  And therefore I stick (strictly) to the list of Supported Devices.  In fact, I use that as a starting point for any hardware purchases.  But maybe you like a challenge?  ;)  Good luck!

Edited by TRS-80
clarity
15 hours ago, hexdump said:

Wow, that was a very helpful answer, thanks a lot!!

With that info I can just blacklist those modules (I'm not interested in using the ADC functionality of the A20 anyway :P), and now the system shows properly low CPU usage:

root@lime:~# cat /etc/modprobe.d/fix.conf 
blacklist sun4i_gpadc sun4i_gpadc_iio
root@lime:~# ps auxk -cpu | head
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.5  26932  5332 ?        Ss   00:07   0:07 /sbin/init
root         2  0.0  0.0      0     0 ?        S    00:07   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        I<   00:07   0:00 [rcu_gp]
root         4  0.0  0.0      0     0 ?        I<   00:07   0:00 [rcu_par_gp]
root         8  0.0  0.0      0     0 ?        I<   00:07   0:00 [mm_percpu_wq]
root         9  0.0  0.0      0     0 ?        S    00:07   0:00 [ksoftirqd/0]
root        10  0.0  0.0      0     0 ?        I    00:07   0:23 [rcu_sched]
root        11  0.0  0.0      0     0 ?        I    00:07   0:00 [rcu_bh]
root        12  0.0  0.0      0     0 ?        S    00:07   0:00 [migration/0]

 
