Last stability/random reboot issue workaround - zswap and zram bug


prahal

Even though I disabled DVFS by pinning the minimum and maximum CPU frequency to the same value:

 

$ cat /etc/default/cpufrequtils
ENABLE="true"
GOVERNOR=conservative
MAX_SPEED=1416000
MIN_SPEED=1416000


I still had a few random reboots.

 

I disabled zswap, since I get this kind of trace from it:

mars 30 22:15:29 helios64 kernel: ------------[ cut here ]------------
mars 30 22:15:29 helios64 kernel: kernel BUG at mm/zswap.c:1313!
mars 30 22:15:29 helios64 kernel: Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
mars 30 22:15:29 helios64 kernel: Modules linked in: binfmt_misc softdog veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge governor_performance aufs zram quota_v2 quota_tree ftdi_sio r8152 usbserial snd_soc_hdmi_codec leds_pwm snd_soc_rockchip_i2s snd_soc_rockchip_pcm gpio_charger pwm_fan snd_soc_core panfrost gpu_sched snd_pcm_dmaengine snd_pcm snd_timer snd hantro_vpu(C) soundcore rockchip_vdec(C) rockchip_rga rockchip_iep v4l2_h264 fusb302 videobuf2_dma_contig sg videobuf2_vmalloc v4l2_mem2mem videobuf2_dma_sg tcpm videobuf2_memops videobuf2_v4l2 videobuf2_common typec videodev mc gpio_beeper cpufreq_dt nfsd dm_mod auth_rpcgss nfs_acl lockd grace sunrpc ledtrig_netdev lm75 ip_tables x_tables autofs4 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear realtek raid10 uas dwmac_rk stmmac_platform
mars 30 22:15:29 helios64 kernel:  md_mod stmmac pcs_xpcs adc_keys
mars 30 22:15:29 helios64 kernel: CPU: 4 PID: 3214 Comm: containerd-shim Tainted: G         C        5.15.29-rockchip64 #trunk.0010
mars 30 22:15:29 helios64 kernel: Hardware name: Helios64 (DT)
mars 30 22:15:29 helios64 kernel: pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
mars 30 22:15:29 helios64 kernel: pc : zswap_frontswap_load+0x320/0x330
mars 30 22:15:29 helios64 kernel: lr : zswap_frontswap_load+0x218/0x330
mars 30 22:15:29 helios64 kernel: sp : ffff80000e363b10
mars 30 22:15:29 helios64 kernel: x29: ffff80000e363b10 x28: 000000000000094b x27: ffff0000f5526080
mars 30 22:15:29 helios64 kernel: x26: ffff800009b3d700 x25: 0000000000002690 x24: 00000000ffffffea
mars 30 22:15:29 helios64 kernel: x23: ffff00000f2e9980 x22: ffff00000f2e9988 x21: ffff8000094f49c8
mars 30 22:15:29 helios64 kernel: x20: fffffbffeffcb170 x19: ffff00003319e4d0 x18: 00000000000000ff
mars 30 22:15:29 helios64 kernel: x17: 00000000ffffffff x16: 0000000000000001 x15: 5165ca840ace4dcb
mars 30 22:15:29 helios64 kernel: x14: ffff80000b7e257e x13: ffff80000b7e257e x12: 0000000000000001
mars 30 22:15:29 helios64 kernel: x11: ffff80000b08d6ad x10: 000000000000000b x9 : 0000000000000001
mars 30 22:15:29 helios64 kernel: x8 : 0000000000000209 x7 : ffff80000e363a70 x6 : 0000000000000001
mars 30 22:15:29 helios64 kernel: x5 : 0000000000000001 x4 : 0000000000000030 x3 : 0000000000000001
mars 30 22:15:29 helios64 kernel: x2 : 0000000000000000 x1 : ffff000004f50e80 x0 : 0000000100000000
mars 30 22:15:29 helios64 kernel: Call trace:
mars 30 22:15:29 helios64 kernel:  zswap_frontswap_load+0x320/0x330
mars 30 22:15:29 helios64 kernel:  __frontswap_load+0x88/0x168
mars 30 22:15:29 helios64 kernel:  swap_readpage+0x1a8/0x3c0
mars 30 22:15:29 helios64 kernel:  do_swap_page+0x4e0/0x6c8
mars 30 22:15:29 helios64 kernel:  __handle_mm_fault+0x5e4/0xdd0
mars 30 22:15:29 helios64 kernel:  handle_mm_fault+0xe8/0x278
mars 30 22:15:29 helios64 kernel:  do_page_fault+0x1e0/0x448
mars 30 22:15:29 helios64 kernel:  do_translation_fault+0x58/0x68
mars 30 22:15:29 helios64 kernel:  do_mem_abort+0x40/0xb0
mars 30 22:15:29 helios64 kernel:  el0_da+0x24/0x58
mars 30 22:15:29 helios64 kernel:  el0t_64_sync_handler+0x68/0xb8
mars 30 22:15:29 helios64 kernel:  el0t_64_sync+0x180/0x184
mars 30 22:15:29 helios64 kernel: Code: 12800174 a9446bf9 17ffffc4 d4210000 (d4210000) 
mars 30 22:15:29 helios64 kernel: ---[ end trace 240751e909ea5fbc ]---

This looks like https://bugzilla.kernel.org/show_bug.cgi?id=192571.

 

I was able to reproduce the above a few times with:

stress -c 1 -m 16 --vm-keep

but I am unable to reproduce it today.

 

Since zswap seems useless when the swap device is itself zram, maybe it should be turned off either way. I disabled it by adding:

extraargs=zswap.enabled=0

to /boot/armbianEnv.txt
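
After a reboot it is worth checking that the parameter actually took effect. A minimal sketch, assuming the kernel exposes the usual sysfs parameter file /sys/module/zswap/parameters/enabled (it reads Y or N); the function name zswap_state is only an illustrative helper, not something shipped by Armbian:

```shell
#!/bin/sh
# Report whether zswap is enabled by reading the module parameter file.
# The optional argument overrides the sysfs path (assumed default below).
zswap_state() {
    param_file="${1:-/sys/module/zswap/parameters/enabled}"
    if [ ! -r "$param_file" ]; then
        # zswap not built in, or sysfs not available
        echo "unknown"
        return 0
    fi
    case "$(cat "$param_file")" in
        Y|y|1) echo "enabled" ;;
        N|n|0) echo "disabled" ;;
        *)     echo "unknown" ;;
    esac
}

zswap_state
```

With zswap.enabled=0 on the kernel command line this should print "disabled". Running swapon --show alongside it lists the active swap devices, where zram swap appears as /dev/zram*.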

 

 

It turns out that zram could not be made big enough for both a big Jellyfin collection and Borg, so zswap is the answer for me (without zram, though, due to the above issue).

Combining zram with a real swap file or partition is a bad idea because of an LRU issue: once zram is full, newly swapped-out content can only go to the slower swap file. Having both zram and zswap also looks odd: both allocate RAM (zram for its metadata and compressed items, zswap for its cache), so more RAM is lost for no added benefit.

 

To sum up: if swap requirements grow beyond what zram can offer (175% of RAM, even with 3:1 compression using lzo, zstd being a little too CPU-intensive for the Helios64), turn to zswap with a fairly large swap file or partition (12GiB) and disable zram.
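
The swap file part of that setup can be sketched as follows. This is only an outline under assumptions: the path /swapfile is a placeholder, the commands need root, and by default the script only prints what it would do (set APPLY=1 to really run them). Disabling zram and re-enabling zswap (zswap.enabled=1 instead of 0 in the extraargs line) are not shown, since how zram is configured depends on the image.

```shell
#!/bin/sh
# Sketch: create and activate a 12 GiB swap file to back zswap.
# Dry run by default: commands are printed, not executed, unless APPLY=1.
SWAPFILE="/swapfile"          # placeholder path, pick a disk with enough space
SIZE_MIB=$((12 * 1024))       # 12 GiB expressed in MiB

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

setup_swap() {
    # dd rather than a sparse file, so the blocks are really allocated
    run dd if=/dev/zero of="$SWAPFILE" bs=1M count="$SIZE_MIB"
    run chmod 600 "$SWAPFILE"
    run mkswap "$SWAPFILE"
    run swapon "$SWAPFILE"
}

setup_swap
```

To keep the swap file across reboots it would also need an entry in /etc/fstab.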
