system crashes after sync command



Hi.

 

My Helios64 always crashes when I follow either of the methods below.

 

[environment]

 1 TB M.2 SATA SSD for the system

 16 TB Seagate Exos as a data drive

 18 TB WD Ultrastar DC550 as a data drive

 Linux helios64 5.9.14-rockchip64 #20.11.4 SMP PREEMPT Tue Dec 15 08:52:20 CET 2020 aarch64 aarch64 aarch64 GNU/Linux

 

 About 620,000 files on the data drives.

 

 

[method1]

 #sync

 #sysctl -w vm.drop_caches=3

 #ls -CFR [datadir] > [tmp dir]

 #ls -1lFR [datadir] > [tmp dir]

 #zip -j [tmpdir] filelist.zip

 #sync >>>>>>> Crash (blinking Red LED)

 

[method2]

 (Step 1) On Windows, via Samba, view the properties of all 620,000 files.

 (Step 2) Over SSH, run "sysctl -w vm.drop_caches=3" >>>>>> Crash (blinking red LED)

 

 


Hi, could you try the following tweak and then repeat the same steps that trigger the crash?

 

Run armbian-config, go to -> System -> CPU

 

And set:

Minimum CPU speed = 1200000

Maximum CPU speed = 1200000

CPU governor = performance

 

This will help us understand if instability is still due to DVFS.
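
To confirm the settings actually took effect before re-running the test, the values can be read back from sysfs. This is only a minimal sketch, assuming the usual cpufreq layout under /sys/devices/system/cpu/cpufreq/policy*; the function name show_cpufreq and its optional root argument are made up for illustration.

```shell
#!/bin/sh
# Print governor and min/max frequency (kHz) for every cpufreq policy.
# The optional first argument overrides the sysfs root (handy for testing).
show_cpufreq() {
    root="${1:-/sys/devices/system/cpu/cpufreq}"
    for policy in "$root"/policy*; do
        # Skip the unexpanded glob when no policy directory exists.
        [ -d "$policy" ] || continue
        printf '%s governor=%s min=%s max=%s\n' \
            "$(basename "$policy")" \
            "$(cat "$policy/scaling_governor")" \
            "$(cat "$policy/scaling_min_freq")" \
            "$(cat "$policy/scaling_max_freq")"
    done
}
```

Calling show_cpufreq with no argument reads the live values; every policy should report the governor and the min/max set above.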


@gprovost Thanks.

I had already tested that, but it made no difference.

 

Tested (crash): cpufreq fixed at max = 1200000, min = 1200000, governor = performance

Tested (crash): cpufreq fixed at max = 1008000, min = 1008000, governor = powersave

Tested (crash): all ZRAM disabled ( /etc/default/armbian-zram-config )

 

I captured the kernel log via the serial console (dmesg -E).

Running "sysctl -w vm.drop_caches=3" causes a kernel panic immediately.

 

root@helios64:~# 
[50657.866770] Insufficient stack space to handle exception!
[50657.866777] ESR: 0x96000047 -- DABT (current EL)
[50657.867667] FAR: 0xffff800011d27fd0
[50657.867975] Task stack:     [0xffff800011d28000..0xffff800011d2c000]
[50657.868531] IRQ stack:      [0xffff800011af8000..0xffff800011afc000]
[50657.869086] Overflow stack: [0xffff0000f77922b0..0xffff0000f77932b0]
[50657.869645] CPU: 4 PID: 30 Comm: ksoftirqd/4 Tainted: G         C        5.9.14-rockchip64 #20.11.4
[50657.870435] Hardware name: Helios64 (DT)
[50657.870781] pstate: 40000085 (nZcv daIf -PAN -UAO BTYPE=--)
[50657.871278] pc : kfree+0x4/0x498
[50657.871563] lr : __free_slab+0xac/0x1a8
[50657.871899] sp : ffff800011d28060
[50657.872191] x29: ffff800011d28060 x28: fffffe00013ea840 
[50657.872658] x27: 0000000000100001 x26: ffff000057aa1400 
[50657.873124] x25: ffff0000f6c01600 x24: fffffffffffff000 
[50657.873590] x23: 0000000000000000 x22: 0000000000000000 
[50657.874055] x21: 0000000000000001 x20: ffff0000f6c01600 
[50657.874521] x19: fffffe00013ea840 x18: 0000000000000028 
[50657.874986] x17: 0000000000000000 x16: 0000000000000000 
[50657.875451] x15: ffff0000e7455a78 x14: 00000033b5e86238 
[50657.875917] x13: 000000000000034e x12: 000000000000031e 
[50657.876382] x11: 0000000000000000 x10: 0000000000000001 
[50657.876848] x9 : 00000000f0000000 x8 : 0000000000100000 
[50657.877313] x7 : ffff0000f6c00ba0 x6 : 0000000000000000 
[50657.877778] x5 : 0000000000000000 x4 : 0000000000005df0 
[50657.878243] x3 : 0000000000000010 x2 : ffff0000f6f59d00 
[50657.878708] x1 : 1ffff00000000000 x0 : ffff000057a9fc00 
[50657.879178] Kernel panic - not syncing: kernel stack overflow
[50657.879684] CPU: 4 PID: 30 Comm: ksoftirqd/4 Tainted: G         C        5.9.14-rockchip64 #20.11.4
[50657.880473] Hardware name: Helios64 (DT)
[50657.880819] Call trace:
[50657.881041]  dump_backtrace+0x0/0x200
[50657.881365]  show_stack+0x18/0x28
[50657.881660]  dump_stack+0xc0/0x11c
[50657.881963]  panic+0x164/0x364
[50657.882236]  nmi_panic+0x64/0x98
[50657.882520]  handle_bad_stack+0x11c/0x148
[50657.882873]  __bad_stack+0x90/0x94
[50657.883175]  kfree+0x4/0x498
[50657.883429]  discard_slab+0x70/0xa0
[50657.883738]  __slab_free+0x3a0/0x3e8
[50657.884052]  kfree+0x484/0x498
[50657.884320]  __free_slab+0xac/0x1a8
[50657.884628]  discard_slab+0x70/0xa0
[50657.884936]  __slab_free+0x3a0/0x3e8
[50657.885250]  kfree+0x484/0x498
[50657.885521]  __free_slab+0xac/0x1a8
[50657.885827]  discard_slab+0x70/0xa0
[50657.886134]  __slab_free+0x3a0/0x3e8
[50657.886448]  kfree+0x484/0x498
[50657.886718]  __free_slab+0xac/0x1a8
[50657.887026]  discard_slab+0x70/0xa0
[50657.887334]  __slab_free+0x3a0/0x3e8
[50657.887648]  kfree+0x484/0x498
[50657.887918]  __free_slab+0xac/0x1a8
[50657.888225]  discard_slab+0x70/0xa0
[50657.888531]  __slab_free+0x3a0/0x3e8
[50657.888846]  kfree+0x484/0x498
[50657.889116]  __free_slab+0xac/0x1a8
[50657.889424]  discard_slab+0x70/0xa0
[50657.889732]  __slab_free+0x3a0/0x3e8
[50657.890047]  kfree+0x484/0x498
[50657.890317]  __free_slab+0xac/0x1a8
[50657.890625]  discard_slab+0x70/0xa0
[50657.890933]  __slab_free+0x3a0/0x3e8
[50657.891247]  kfree+0x484/0x498
[50657.891517]  __free_slab+0xac/0x1a8
[50657.891826]  discard_slab+0x70/0xa0
[50657.892132]  __slab_free+0x3a0/0x3e8
[50657.892446]  kfree+0x484/0x498
[50657.892716]  __free_slab+0xac/0x1a8
[50657.893025]  discard_slab+0x70/0xa0
[50657.893331]  __slab_free+0x3a0/0x3e8
[50657.893646]  kfree+0x484/0x498
[50657.893916]  __free_slab+0xac/0x1a8
[50657.894222]  discard_slab+0x70/0xa0
[50657.894529]  __slab_free+0x3a0/0x3e8
[50657.894843]  kfree+0x484/0x498
[50657.895113]  __free_slab+0xac/0x1a8
[50657.895420]  discard_slab+0x70/0xa0
[50657.895728]  __slab_free+0x3a0/0x3e8
[50657.896043]  kfree+0x484/0x498
[50657.896311]  __free_slab+0xac/0x1a8
[50657.896618]  discard_slab+0x70/0xa0
[50657.896924]  __slab_free+0x3a0/0x3e8
[50657.897239]  kfree+0x484/0x498
[50657.897509]  __free_slab+0xac/0x1a8
[50657.897815]  discard_slab+0x70/0xa0
[50657.898122]  __slab_free+0x3a0/0x3e8
[50657.898437]  kfree+0x484/0x498
[50657.898707]  __free_slab+0xac/0x1a8
[50657.899013]  discard_slab+0x70/0xa0
[50657.899320]  __slab_free+0x3a0/0x3e8
[50657.899635]  kfree+0x484/0x498
[50657.899905]  __free_slab+0xac/0x1a8
[50657.900212]  discard_slab+0x70/0xa0
[50657.900519]  __slab_free+0x3a0/0x3e8
[50657.900834]  kfree+0x484/0x498
[50657.901104]  __free_slab+0xac/0x1a8
[50657.901411]  discard_slab+0x70/0xa0
[50657.901717]  __slab_free+0x3a0/0x3e8
[50657.902032]  kfree+0x484/0x498
[50657.902302]  __free_slab+0xac/0x1a8
[50657.902608]  discard_slab+0x70/0xa0
[50657.902915]  __slab_free+0x3a0/0x3e8
[50657.903229]  kfree+0x484/0x498
[50657.903499]  __free_slab+0xac/0x1a8
[50657.903807]  discard_slab+0x70/0xa0
[50657.904115]  __slab_free+0x3a0/0x3e8
[50657.904429]  kfree+0x484/0x498
[50657.904699]  __free_slab+0xac/0x1a8
[50657.905008]  discard_slab+0x70/0xa0
[50657.905314]  __slab_free+0x3a0/0x3e8
[50657.905628]  kfree+0x484/0x498
[50657.905899]  __free_slab+0xac/0x1a8
[50657.906206]  discard_slab+0x70/0xa0
[50657.906513]  __slab_free+0x3a0/0x3e8
[50657.906827]  kfree+0x484/0x498
[50657.907096]  __free_slab+0xac/0x1a8
[50657.907402]  discard_slab+0x70/0xa0
[50657.907709]  __slab_free+0x3a0/0x3e8
[50657.908023]  kfree+0x484/0x498
[50657.908293]  __free_slab+0xac/0x1a8
[50657.908600]  discard_slab+0x70/0xa0
[50657.908908]  __slab_free+0x3a0/0x3e8
[50657.909223]  kfree+0x484/0x498
[50657.909493]  __free_slab+0xac/0x1a8
[50657.909802]  discard_slab+0x70/0xa0
[50657.910110]  __slab_free+0x3a0/0x3e8
[50657.910424]  kfree+0x484/0x498
[50657.910694]  __free_slab+0xac/0x1a8
[50657.911000]  discard_slab+0x70/0xa0
[50657.911308]  __slab_free+0x3a0/0x3e8
[50657.911622]  kfree+0x484/0x498
[50657.911892]  __free_slab+0xac/0x1a8
[50657.912200]  discard_slab+0x70/0xa0
[50657.912506]  __slab_free+0x3a0/0x3e8
[50657.912821]  kfree+0x484/0x498
[50657.913091]  __free_slab+0xac/0x1a8
[50657.913397]  discard_slab+0x70/0xa0
[50657.913704]  __slab_free+0x3a0/0x3e8
[50657.914018]  kfree+0x484/0x498
[50657.914288]  __free_slab+0xac/0x1a8
[50657.914594]  discard_slab+0x70/0xa0
[50657.914901]  __slab_free+0x3a0/0x3e8
[50657.915215]  kfree+0x484/0x498
[50657.915486]  __free_slab+0xac/0x1a8
[50657.915792]  discard_slab+0x70/0xa0
[50657.916099]  __slab_free+0x3a0/0x3e8
[50657.916413]  kfree+0x484/0x498
[50657.916684]  __free_slab+0xac/0x1a8
[50657.916991]  discard_slab+0x70/0xa0
[50657.917299]  __slab_free+0x3a0/0x3e8
[50657.917614]  kfree+0x484/0x498
[50657.917884]  __free_slab+0xac/0x1a8
[50657.918192]  discard_slab+0x70/0xa0
[50657.918498]  __slab_free+0x3a0/0x3e8
[50657.918812]  kfree+0x484/0x498
[50657.919081]  __free_slab+0xac/0x1a8
[50657.919388]  discard_slab+0x70/0xa0
[50657.919696]  __slab_free+0x3a0/0x3e8
[50657.920010]  kfree+0x484/0x498
[50657.920280]  __free_slab+0xac/0x1a8
[50657.920587]  discard_slab+0x70/0xa0
[50657.920893]  __slab_free+0x3a0/0x3e8
[50657.921208]  kfree+0x484/0x498
[50657.921478]  __free_slab+0xac/0x1a8
[50657.921784]  discard_slab+0x70/0xa0
[50657.922092]  __slab_free+0x3a0/0x3e8
[50657.922406]  kfree+0x484/0x498
[50657.922676]  __free_slab+0xac/0x1a8
[50657.922983]  discard_slab+0x70/0xa0
[50657.923291]  __slab_free+0x3a0/0x3e8
[50657.923605]  kfree+0x484/0x498
[50657.923875]  __free_slab+0xac/0x1a8
[50657.924182]  discard_slab+0x70/0xa0
[50657.924489]  unfreeze_partials.isra.96+0x1d0/0x250
[50657.924910]  put_cpu_partial+0x18c/0x238
[50657.925254]  __slab_free+0x2a4/0x3e8
[50657.925569]  kfree+0x484/0x498
[50657.925841]  __d_free_external+0x20/0x40
[50657.926189]  rcu_core+0x23c/0x858
[50657.926482]  rcu_core_si+0x10/0x20
[50657.926781]  efi_header_end+0x160/0x3f4
[50657.927120]  run_ksoftirqd+0x4c/0x60
[50657.927437]  smpboot_thread_fn+0x200/0x238
[50657.927799]  kthread+0x140/0x150
[50657.928085]  ret_from_fork+0x10/0x34
[50657.928406] SMP: stopping secondary CPUs
[50657.928767] Kernel Offset: disabled
[50657.929076] CPU features: 0x0240022,2000200c
[50657.929451] Memory Limit: none
[50657.929730] ---[ end Kernel panic - not syncing: kernel stack overflow ]---
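
The trace is long mainly because the same few frames repeat many times. A rough sketch for collapsing it into per-function counts, assuming the console output was saved to a file; the function name count_frames is only illustrative, and the pattern is written for the frame format shown in this log.

```shell
#!/bin/sh
# Count how often each function appears among the call-trace frames of a
# saved kernel log. Frame lines look like:
#   [50657.883175]  kfree+0x4/0x498
# Register lines such as "pc : kfree+0x4/0x498" are not matched.
count_frames() {
    sed -n 's|^\[ *[0-9.]*\] *\([A-Za-z_][A-Za-z0-9_.]*\)+0x[0-9a-f]*/0x[0-9a-f]*[[:space:]]*$|\1|p' "$1" |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'   # normalise to "count function"
}
```

Run as count_frames followed by the path of the saved log. The output lists the most frequent functions first, which makes the kfree / __free_slab / discard_slab / __slab_free loop easy to spot.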

 

 

 

 

Edited by silver0480

@gprovost I tested some other kernels (via armbian-config).

It looks stable with the legacy kernel.

 

Crashed : 5.10.0-rc7-rockchip64
Crashed : 5.9.14-rockchip64 (default kernel)
Crashed : 5.9.10-rockchip64

 

PASSED : 4.4.213-rk3399

 

 

I will run the legacy kernel for the time being,

but the LAN LED does not blink with the legacy kernel, so I would like to move back to 5.x.


@gprovost @aprayoga

 

Some extended tests, using the same method.

 

[environment]

 armbian focal 20.11.6

 data dir :  18TB HDD btrfs + 16TB HDD btrfs (both single volume)

 

[method]

 # sync ; sync ; sync ; sysctl -w vm.drop_caches=3

 # ls -CFR [datadir] > [tmp dir]  ( about 600,000 files )

 # ls -1lFR [datadir] > [tmp dir] ( about 600,000 files )

 # zip -j [tmpdir] filelist.zip

 # sync ; sync ; sync

 # sysctl -w vm.drop_caches=3  >>>>>>> Kernel Panic (blinking Red LED)

 

[Result]

 Crashed : 5.10.0-rc7-rockchip64
 Crashed : 5.9.14-rockchip64 (default kernel)
 Crashed : 5.9.10-rockchip64

 PASSED but not suitable : 4.4.123-rk3399 ( ubuntu focal ). After a few hours, the system stopped responding.

 PASSED : 5.8.14-rockchip64  (new)

 

(FYI) 

 PASSED :  5.8.10.1011-raspi ( ubuntu 20.10 on Raspberry PI 4B , same HDD connected via USB3 )

 

 

I guess some kernel parameter related to the slab cache is wrong?
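
One way to look at the slab side is to record how much slab memory the file listing builds up before the caches are dropped. A minimal sketch, assuming the standard Slab, SReclaimable and SUnreclaim fields in /proc/meminfo; the function name slab_kb is hypothetical.

```shell
#!/bin/sh
# Print the slab-related fields of /proc/meminfo (values are in kB).
# The optional first argument points at another file in meminfo format.
slab_kb() {
    awk '$1 == "Slab:" || $1 == "SReclaimable:" || $1 == "SUnreclaim:" {
             sub(":", "", $1)      # strip the trailing colon from the name
             print $1 "=" $2
         }' "${1:-/proc/meminfo}"
}
```

Calling slab_kb once after a fresh boot and again after the two ls steps (before sysctl -w vm.drop_caches=3) would show how much reclaimable slab has to be freed at the point where the panic occurs.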

 

 

 
