Kernel bug (Large file transfers)

Chris Bognar · November 17, 2020

I'm getting system crashes and instability, including a loss of HD I/O, during very large file transfers. This happens over the network, whether using SCP, FTP, or SMB. I'm a hobbyist, not a coder or developer, so other than Googling I have no practical knowledge to draw on. When the problem started occurring, I opened two PuTTY SSH sessions as root, one running htop and the other running "dmesg --follow", to see what was going on with the system, and this is what happened:

dmesg readout at file transfer crash (using SMB this time):


[238803.445307] ------------[ cut here ]------------
[238803.445321] kernel BUG at mm/vmscan.c:1709!
[238803.445327] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[238803.445842] Modules linked in: xt_nat xt_tcpudp veth softdog xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge governor_performance quota_v2 quota_tree xfs r8152 snd_soc_hdmi_codec snd_soc_rockchip_i2s hantro_vpu(C) rockchip_vdec(C) snd_soc_core rockchipdrm rockchip_rga dw_mipi_dsi v4l2_h264 snd_pcm_dmaengine panfrost dw_hdmi leds_pwm pwm_fan gpio_charger videobuf2_dma_contig snd_pcm analogix_dp videobuf2_dma_sg v4l2_mem2mem gpu_sched videobuf2_vmalloc drm_kms_helper snd_timer videobuf2_memops videobuf2_v4l2 snd videobuf2_common cec soundcore rc_core videodev sg mc drm fusb30x(C) drm_panel_orientation_quirks cpufreq_dt gpio_beeper nfsd auth_rpcgss nfs_acl lockd grace lm75 sunrpc ip_tables x_tables autofs4 raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor realtek async_tx md_mod dwmac_rk stmmac_platform
[238803.445963]  stmmac mdio_xpcs adc_keys
[238803.453981] CPU: 0 PID: 202 Comm: kswapd0 Tainted: G         C        5.8.17-rockchip64 #20.08.21
[238803.454772] Hardware name: Helios64 (DT)
[238803.455132] pstate: 80000085 (Nzcv daIf -PAN -UAO BTYPE=--)
[238803.455644] pc : isolate_lru_pages+0x3b8/0x3c0
[238803.456048] lr : isolate_lru_pages+0x13c/0x3c0
[238803.456450] sp : ffff80001284b910
[238803.456753] x29: ffff80001284b920 x28: 0000000080000b90
[238803.457231] x27: 0000000000000010 x26: ffff0000e498a830
[238803.457710] x25: ffff80001284bd28 x24: 000000000000000b
[238803.458189] x23: fffffe0002b09f48 x22: 0000000080000000
[238803.458668] x21: 000000008000000b x20: 0000000000000000
[238803.459147] x19: ffff80001284b9e8 x18: 0000000000000000
[238803.459625] x17: 0000000000000000 x16: 0000000000000000
[238803.460103] x15: 0000000000000000 x14: ffff0000e789a4e8
[238803.460582] x13: 0000000000000000 x12: 0000000000000000
[238803.461061] x11: 0000000000000000 x10: 0000000000000000
[238803.461539] x9 : 0000000000000000 x8 : 0000000000000000
[238803.462018] x7 : 0000fe007f7f4008 x6 : 0000000000000030
[238803.462497] x5 : 000000000000009f x4 : 0000000000000001
[238803.462976] x3 : fffffe00009b19f4 x2 : ffff80001284bac8
[238803.463454] x1 : ffff0000e498a830 x0 : 00000000ffffffea
[238803.463935] Call trace:
[238803.464166]  isolate_lru_pages+0x3b8/0x3c0
[238803.464541]  shrink_active_list+0xb0/0x510
[238803.464915]  shrink_lruvec+0x360/0x510
[238803.465257]  shrink_node+0x184/0x668
[238803.465585]  balance_pgdat+0x308/0x5e8
[238803.465929]  kswapd+0x1dc/0x520
[238803.466220]  kthread+0x118/0x150
[238803.466520]  ret_from_fork+0x10/0x34
[238803.466853] Code: f9400820 b5fff260 942f5856 17ffff91 (d4210000)
[238803.467404] ---[ end trace 9853b1959e0be51a ]---
[238803.467824] note: kswapd0[202] exited with preempt_count 2
[238803.468804] ------------[ cut here ]------------
[238803.469249] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:618 rcu_eqs_enter.isra.0+0x150/0x168
[238803.470001] Modules linked in: xt_nat xt_tcpudp veth softdog xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge governor_performance quota_v2 quota_tree xfs r8152 snd_soc_hdmi_codec snd_soc_rockchip_i2s hantro_vpu(C) rockchip_vdec(C) snd_soc_core rockchipdrm rockchip_rga dw_mipi_dsi v4l2_h264 snd_pcm_dmaengine panfrost dw_hdmi leds_pwm pwm_fan gpio_charger videobuf2_dma_contig snd_pcm analogix_dp videobuf2_dma_sg v4l2_mem2mem gpu_sched videobuf2_vmalloc drm_kms_helper snd_timer videobuf2_memops videobuf2_v4l2 snd videobuf2_common cec soundcore rc_core videodev sg mc drm fusb30x(C) drm_panel_orientation_quirks cpufreq_dt gpio_beeper nfsd auth_rpcgss nfs_acl lockd grace lm75 sunrpc ip_tables x_tables autofs4 raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor realtek async_tx md_mod dwmac_rk stmmac_platform
[238803.470118]  stmmac mdio_xpcs adc_keys
[238803.478124] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D  C        5.8.17-rockchip64 #20.08.21
[238803.478913] Hardware name: Helios64 (DT)
[238803.479273] pstate: 200003c5 (nzCv DAIF -PAN -UAO BTYPE=--)
[238803.479776] pc : rcu_eqs_enter.isra.0+0x150/0x168
[238803.480203] lr : rcu_eqs_enter.isra.0+0x1c/0x168
[238803.480620] sp : ffff800011803e70
[238803.480923] x29: ffff800011803e70 x28: 0000000000000118
[238803.481402] x27: ffff800011813940 x26: 0000000000000000
[238803.481881] x25: 0000000000000000 x24: ffff80001126f3c0
[238803.482360] x23: ffff800011813940 x22: ffff800011502738
[238803.482839] x21: ffff80001180a2a0 x20: ffff800011809980
[238803.483317] x19: ffff8000115047c0 x18: 000000000000000e
[238803.483796] x17: 0000000000000001 x16: 0000000000000019
[238803.484275] x15: 000000000000000c x14: 0000000000000265
[238803.484754] x13: 0000000000000003 x12: 0000000000000002
[238803.485233] x11: 0000000000000003 x10: 0000000000000a20
[238803.485712] x9 : ffff800011803de0 x8 : ffff8000118143c0
[238803.486190] x7 : 0000000000000001 x6 : 00000536817e61ec
[238803.486669] x5 : 00ffffffffffffff x4 : 00000000018c312e
[238803.487148] x3 : ffff8000114ee018 x2 : 4000000000000000
[238803.487627] x1 : 4000000000000002 x0 : ffff0000f77297c0
[238803.488107] Call trace:
[238803.488337]  rcu_eqs_enter.isra.0+0x150/0x168
[238803.488736]  rcu_idle_enter+0x10/0x20
[238803.489074]  do_idle+0x20c/0x288
[238803.489370]  cpu_startup_entry+0x28/0x68
[238803.489728]  rest_init+0xd8/0xe8
[238803.490027]  arch_call_rest_init+0x10/0x1c
[238803.490399]  start_kernel+0x7f0/0x82c
[238803.490733] ---[ end trace 9853b1959e0be51b ]---

Terminal lost connectivity at this time.

On the PuTTY htop session:

The CPU display froze with a few cores maxed out and the temperature in the high 50s (58 at the time of the crash).

Terminal lost connectivity at this time also.

When this happens, if the system is able to continue, no more HD activity is able to occur. If the KOBOL softlocks, I have to hard powerdown (hold the power button until it turns off).

I'm running four HDs in a RAID5 through OpenMediaVault, plus one manually mounted HD as a spare drive. The RAID5 is XFS and the spare drive is EXT4. The RAID5 drives are NAS drives, though not all the same model, and the spare drive is an old desktop drive.

Any ideas or help would be appreciated . . . Google's results are varied and inconclusive so far.
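For anyone skimming a log like the one above, the lines that usually matter most are the "kernel BUG at" location, the task named after "Comm:", and the function names under "Call trace:". Below is a minimal Python sketch that pulls those three things out of saved dmesg text. The function name summarize_oops and the regular expressions are assumptions made for this illustration, tuned only to the format shown in this thread, not a general-purpose Oops parser.

```python
import re

# Patterns matched against the dmesg format shown in this thread (assumption:
# other kernels or architectures may format these lines differently).
BUG_RE = re.compile(r"kernel BUG at (?P<file>[^:\s]+):(?P<line>\d+)!")
CPU_RE = re.compile(r"CPU: \d+ PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$"
)


def summarize_oops(log_text):
    """Return BUG location, faulting task and call trace of the first Oops."""
    summary = {"bug": None, "comm": None, "pid": None, "trace": []}
    in_trace = False
    for line in log_text.splitlines():
        if summary["bug"] is None:
            m = BUG_RE.search(line)
            if m:
                summary["bug"] = (m["file"], int(m["line"]))
                continue
        if summary["comm"] is None:
            m = CPU_RE.search(line)
            if m:
                summary["comm"], summary["pid"] = m["comm"], int(m["pid"])
                continue
        if "Call trace:" in line:
            in_trace = True
            continue
        if in_trace:
            m = FRAME_RE.match(line)
            if not m:
                break  # the first non-frame line ends the first trace
            summary["trace"].append(m["func"])
    return summary
```

Fed the log above, this would report the BUG in mm/vmscan.c raised by kswapd0, which is the kernel's page-reclaim thread, so memory pressure and swap are reasonable things to look at first.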

Edited November 17, 2020 by TRS-80
put long output inside spoiler

barnumbirr · November 17, 2020

I'm also running kernel 5.8.17-rockchip64 and haven't encountered any such issues. Did you start out on 5.8.17 or did you upgrade from previous versions?
Earlier versions of the kernel had severe stability issues, and in such cases (this happened to me as well) a clean install of the system helped.

Chris Bognar · November 18, 2020

I figured out my issue. I set up a 4GB swap partition on the spare drive, and the system was choking on it. To fix my problem, all I had to do was turn off the swap and remove the swap partition mount from fstab and all was good.

As I said, I'm just a hobbyist, and I set up the swap drive out of habit with Linux. The system works fine without it, but I'd be interested in learning why it was Oops-ing on a swap partition. If I find the answer, I'll post it here for others to learn from.
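For others who land here with the same symptom, the fix described above has two parts: turning swap off for the running system (the usual command is "swapoff -a", run as root) and making sure the swap line in /etc/fstab no longer activates it at boot. Here is a minimal Python sketch of the second part. It only transforms text, and the function name disable_swap_entries is an assumption made for this example; back up the real file before changing it.

```python
def disable_swap_entries(fstab_text):
    """Comment out every active fstab line whose filesystem type is 'swap'."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_active = bool(fields) and not fields[0].startswith("#")
        # fstab fields: device, mount point, type, options, dump, pass
        if is_active and len(fields) >= 3 and fields[2] == "swap":
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical fstab contents; the UUIDs are placeholders.
    example = (
        "UUID=aaaa / ext4 defaults 0 1\n"
        "UUID=bbbb none swap sw 0 0\n"
    )
    print(disable_swap_entries(example), end="")
```

Commenting the line out rather than deleting it keeps the original entry around in case swap is wanted again later.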

TRS-80 · November 18, 2020

@Chris Bognar,

Thanks for reporting back.

Armbian / Kobol apply quite a number of optimizations to these boards, as hardware-wise they are not exactly like the "normal" x86 you may be used to. I would recommend leaving them as is for the most part, unless you really know what you are doing.

Of course, the whole point of GNU/Linux is to tinker though, so... Maybe get another cheap board for such tinkering / learning, instead of your (home?) "production" NAS. Just a thought! Cheers! :beer:

ShadowDance · November 19, 2020

If you created a separate partition for swap on the raw disk (not on top of RAID), I see only one reason that it would lead to a kernel panic, and that's a bad disk. A bad disk would lead to memory corruption. Have you run a S.M.A.R.T. self-test (both short and long) on the disk that had the swap?

That said, I would expect swap to work on top of RAID as well, but it's an extra layer that might get congested during high memory pressure. ZFS, for instance, doesn't deal well with swap on top of it; there's at least one long-standing open issue about it.
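As a rough sketch of the S.M.A.R.T. check suggested above, the snippet below builds the smartctl commands (from smartmontools) for a short test, a long test, and reading back the self-test log. The device path "/dev/sdX" is a placeholder and the helper names are assumptions made for this example. The commands need root, and each self-test should be allowed to finish before the next one is started.

```python
import subprocess


def smart_commands(device):
    """Build smartctl invocations for short/long self-tests and the test log."""
    return [
        ["smartctl", "-t", "short", device],     # start a short self-test
        ["smartctl", "-t", "long", device],      # start a long (extended) self-test
        ["smartctl", "-l", "selftest", device],  # show the self-test result log
    ]


def run(cmd):
    """Run one command and return its standard output as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # Placeholder device; replace with the disk that held the swap partition.
    for cmd in smart_commands("/dev/sdX"):
        print(" ".join(cmd))
```

The main block only prints the commands instead of running them, so nothing touches a disk until run() is called deliberately on a real device.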
