mikeakers, posted January 29 (edited)

Hello! I can reproducibly get a kernel oops whenever I start a Docker container on my Helios64.

The SD card I was running my Helios64 from died, so I reinstalled Armbian using the steps to install bookworm from this post. I skipped the steps related to hs400 and the L2 cache, as I'm currently running the system from an SD card. After the install seemed to be working well, I installed ZFS, imported my existing pool, and then installed Docker. My previous setup used Docker's ZFS storage driver, so I configured that as well.

Now, whenever I start a Docker container (e.g. by running "sudo docker run hello-world"), I get the following oops from the kernel:

[ 246.422387] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
[ 246.422400] Mem abort info:
[ 246.422403] ESR = 0x0000000096000005
[ 246.422406] EC = 0x25: DABT (current EL), IL = 32 bits
[ 246.422411] SET = 0, FnV = 0
[ 246.422414] EA = 0, S1PTW = 0
[ 246.422417] FSC = 0x05: level 1 translation fault
[ 246.422421] Data abort info:
[ 246.422423] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[ 246.422426] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 246.422430] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 246.422434] user pgtable: 4k pages, 48-bit VAs, pgdp=000000000942d000
[ 246.422439] [0000000000000038] pgd=0800000016225003, p4d=0800000016225003, pud=0000000000000000
[ 246.422452] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[ 246.423015] Modules linked in: veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge rfkill sunrpc lz4hc lz4 zram binfmt_misc zfs(PO) crct10dif_ce leds_pwm panfrost spl(O) gpio_charger gpu_sched drm_shmem_helper pwm_fan snd_soc_hdmi_codec snd_soc_rockchip_i2s rockchip_vdec(C) hantro_vpu rk_crypto snd_soc_core rockchip_rga snd_compress v4l2_vp9 v4l2_h264 videobuf2_dma_sg videobuf2_dma_contig ac97_bus v4l2_mem2mem snd_pcm_dmaengine snd_pcm videobuf2_memops videobuf2_v4l2 snd_timer videodev nvmem_rockchip_efuse videobuf2_common mc snd soundcore gpio_beeper ledtrig_netdev lm75 ip_tables x_tables autofs4 efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear cdc_ncm cdc_ether usbnet r8152 realtek rockchipdrm dw_mipi_dsi fusb302 dw_hdmi tcpm analogix_dp dwmac_rk typec drm_display_helper stmmac_platform cec
[ 246.423214] drm_dma_helper stmmac drm_kms_helper drm adc_keys pcs_xpcs
[ 246.431612] CPU: 5 PID: 2686 Comm: dockerd Tainted: P C O 6.6.8-edge-rockchip64 #1
[ 246.432383] Hardware name: Helios64 (DT)
[ 246.432731] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 246.433344] pc : get_device_state+0x28/0x88 [ledtrig_netdev]
[ 246.433854] lr : netdev_trig_notify+0x13c/0x1fc [ledtrig_netdev]
[ 246.434387] sp : ffff80008b2ab3e0
[ 246.434680] x29: ffff80008b2ab3e0 x28: ffff0000154d3378 x27: ffff80007a36dc30
[ 246.435313] x26: ffff800081c89a40 x25: ffff800081907008 x24: ffff80008b2ab5b8
[ 246.435944] x23: 000000000000000b x22: ffff000005efb000 x21: ffff0000154d3300
[ 246.436575] x20: ffff0000154d3378 x19: ffff0000154d3300 x18: ffff80008d293c58
[ 246.437206] x17: 000000040044ffff x16: 00500074b5503510 x15: 0000000000000000
[ 246.437837] x14: ffffffffffffffff x13: 0000000000000020 x12: 0101010101010101
[ 246.438468] x11: 7f7f7f7f7f7f7f7f x10: 000000000f5d83a0 x9 : 0000000000000020
[ 246.439099] x8 : 0101010101010101 x7 : 0000000080000000 x6 : 0000000080303663
[ 246.439729] x5 : 6336300000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 246.440360] x2 : ffff000021bdbc00 x1 : ffff000021bdbc00 x0 : 0000000000000000
[ 246.440991] Call trace:
[ 246.441210] get_device_state+0x28/0x88 [ledtrig_netdev]
[ 246.441683] netdev_trig_notify+0x13c/0x1fc [ledtrig_netdev]
[ 246.442185] notifier_call_chain+0x74/0x144
[ 246.442561] raw_notifier_call_chain+0x18/0x24
[ 246.442956] call_netdevice_notifiers_info+0x58/0xa4
[ 246.443399] dev_change_name+0x190/0x318
[ 246.443751] do_setlink+0xb7c/0xde4
[ 246.444064] rtnl_setlink+0xf8/0x194
[ 246.444383] rtnetlink_rcv_msg+0x12c/0x398
[ 246.444748] netlink_rcv_skb+0x5c/0x128
[ 246.445089] rtnetlink_rcv+0x18/0x24
[ 246.445408] netlink_unicast+0x2e8/0x350
[ 246.445755] netlink_sendmsg+0x1d4/0x444
[ 246.446102] __sock_sendmsg+0x5c/0xac
[ 246.446432] __sys_sendto+0x124/0x150
[ 246.446760] __arm64_sys_sendto+0x28/0x38
[ 246.447117] invoke_syscall+0x48/0x114
[ 246.447453] el0_svc_common.constprop.0+0x40/0xe8
[ 246.447871] do_el0_svc+0x20/0x2c
[ 246.448168] el0_svc+0x40/0xf4
[ 246.448442] el0t_64_sync_handler+0x13c/0x158
[ 246.448829] el0t_64_sync+0x190/0x194
[ 246.449158] Code: f9423c20 f90047e0 d2800000 f9404e60 (f9401c01)
[ 246.449695] ---[ end trace 0000000000000000 ]---

TIA for the help and for keeping this unsupported hardware alive!

Edited January 29 by mikeakers: typo fix
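For anyone reading the oops above: on arm64 the word in parentheses on the "Code:" line (f9401c01) is the instruction that faulted. The hypothetical bash helper below is only a sketch, not part of any Armbian or kernel tooling, and it decodes just the one AArch64 form that appears here: a 64-bit LDR with an unsigned immediate offset. Decoding the last two words gives "ldr x0, [x19, #0x98]" followed by "ldr x1, [x0, #0x38]". With x0 = 0 in the register dump, the second load reads address 0x38, which matches the reported fault address. That is consistent with a NULL pointer being loaded and then dereferenced inside get_device_state, reached via dev_change_name, i.e. a deterministic bug rather than random hardware instability.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode the 64-bit "LDR Xt, [Xn, #imm]" (unsigned
# offset) encoding, the only instruction form needed for this oops.
decode_ldr_x_imm() {
  local word=$(( $1 ))
  # Bits [31:22] must be 0b1111100101 for LDR (immediate, 64-bit, unsigned offset).
  if (( (word & 0xFFC00000) != 0xF9400000 )); then
    echo "not a 64-bit LDR with unsigned immediate offset: $1" >&2
    return 1
  fi
  local imm12=$(( (word >> 10) & 0xFFF ))  # bits [21:10]
  local rn=$(( (word >> 5) & 0x1F ))       # base register, bits [9:5] (31 would mean sp; not special-cased)
  local rt=$(( word & 0x1F ))              # destination register, bits [4:0]
  # The immediate is scaled by the 8-byte access size.
  printf 'ldr x%d, [x%d, #0x%x]\n' "$rt" "$rn" $(( imm12 * 8 ))
}

# Example, using the last two words of the "Code:" line:
#   decode_ldr_x_imm 0xf9404e60   # prints: ldr x0, [x19, #0x98]
#   decode_ldr_x_imm 0xf9401c01   # prints: ldr x1, [x0, #0x38]
```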
snakekick, posted January 29 (edited)

Hello mikeakers, you can test your memory with:

for i in $(seq 1 100); do python3 -c "import pkg_resources" || break; done

It is possible that the bootloader was also updated during the upgrade to bookworm, and that can be a problem. If this runs 5-6 times without errors, try limiting the CPU speed with armbian-config. My system only runs "stable" at 600-1200 MHz.

Edited January 29 by snakekick
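snakekick's one-liner stops at the first failing run but does not say how far it got. A small generalization, sketched here as a hypothetical bash function, runs any command up to N times and reports the iteration that failed. One caveat worth checking first: pkg_resources comes from setuptools, so if "python3 -c "import pkg_resources"" fails even once on an idle system, the module is simply missing and the failure says nothing about the RAM.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times, stop at the first
# failure, and report which iteration failed. The command's own output
# is discarded so only the summary line is printed.
repeat_until_failure() {
  local n=$1
  shift
  local i
  for (( i = 1; i <= n; i++ )); do
    if ! "$@" >/dev/null 2>&1; then
      echo "failed on iteration $i of $n"
      return 1
    fi
  done
  echo "all $n iterations passed"
}

# Example, with the same workload as the one-liner above:
#   repeat_until_failure 100 python3 -c "import pkg_resources"
```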
mikeakers, posted January 29 (thread author)

Memory seems good. I was able to run that command 6 times without an error.
BipBip1981, posted January 30

Hi, try these settings in the armbian-config program:
min freq: 1200 MHz
max freq: 1200 MHz
governor: performance
This is the most stable configuration on my Helios64.
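After changing the frequency limits or governor in armbian-config, the values the kernel is actually using can be read back from the standard cpufreq sysfs files, which report frequencies in kHz (so 1200 MHz shows up as 1200000). The hypothetical bash function below is a minimal sketch that just prints them per cpufreq policy. On the RK3399 in the Helios64 there are normally two policies, one for the four Cortex-A53 cores and one for the two Cortex-A72 cores. The optional directory argument exists only so the function can be pointed at a copy of that tree for testing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print governor and min/max scaling frequency (kHz)
# for every cpufreq policy under the given directory.
show_cpufreq() {
  local base=${1:-/sys/devices/system/cpu/cpufreq}
  local policy
  for policy in "$base"/policy*; do
    [ -d "$policy" ] || continue   # skip if the glob matched nothing
    printf '%s: governor=%s min=%s kHz max=%s kHz\n' \
      "${policy##*/}" \
      "$(cat "$policy/scaling_governor")" \
      "$(cat "$policy/scaling_min_freq")" \
      "$(cat "$policy/scaling_max_freq")"
  done
}

# Example on the board itself:
#   show_cpufreq
```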
mikeakers, posted January 30 (thread author)

I'll try that, BipBip1981, but this seems more like a logic error than system instability, especially since I was able to do the exact same thing with my previous OS install. The other thing I'll probably try tonight is reverting the kernel to 5.10.63-rockchip64, which was stable in my old install.
BipBip1981, posted January 30

OK, keep in touch; I'm interested in your feedback.
mikeakers, posted January 31 (thread author, edited)

Some more findings:
linux-6.6.8 still oopses at 1200 MHz...
Downgrading to linux 5.15.93 does seem to fix the issue, though! "sudo docker run hello-world" now works correctly without oopsing the kernel.
It is still stable with the max clock set to 1400 MHz.
So far so good; I'll report back if anything goes sideways once I get all my containers running again.

Edited January 31 by mikeakers
BipBip1981, posted January 31

Hi, OK. On my side, with kernel 6.6.12 at 1200 MHz and the performance governor, "podman run hello-world" runs with no problem, and my Helios64 has been running for 2 days under my freeze test pattern without freezing so far.

Good night (or day)!
steki, posted February 4

Regarding my last message: just remove it from /etc/modules-load.d/modules.conf so it is no longer loaded, leave only lm75 in that file, and that is it.
steki, posted February 5

It looks like my original message did not get through, as it was posted before I created an account, so to make clear what I meant: remove the line "ledtrig_netdev" from /etc/modules-load.d/modules.conf, as that is the kernel module that triggers the kernel oops on our systems.

Best would be to just follow the next steps (as root) so no reboot is needed. If you plan to reboot anyway, just remove that module from loading and that is it; there is no need to run these commands at all.

rmmod ledtrig_netdev
find "/lib/modules/$(uname -r)" -name ledtrig-netdev.ko -exec mv {} {}.backup ';'
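steki's persistent fix amounts to keeping ledtrig_netdev out of /etc/modules-load.d/modules.conf. A minimal sketch of doing that non-interactively, as a hypothetical bash function: it comments the line out (matching both the ledtrig_netdev and ledtrig-netdev spellings used in this thread) and keeps a .bak copy of the original file. On the real file it has to be run as root. After a reboot, "grep ledtrig_netdev /proc/modules" should print nothing if the module is no longer being loaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out any line in a modules-load.d file that
# consists only of ledtrig_netdev or ledtrig-netdev; keeps "<file>.bak".
disable_ledtrig_netdev() {
  local conf=${1:-/etc/modules-load.d/modules.conf}
  sed -i.bak -E 's/^[[:space:]]*ledtrig[-_]netdev[[:space:]]*$/# &/' "$conf"
}

# Example (as root, on the real file):
#   disable_ledtrig_netdev
```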
fri.K, posted March 20

Hi, I had exactly the same issue with kernel 6.6.16-current-rockchip64 on the Helios64, but commenting out ledtrig-netdev in /etc/modules-load.d/modules.conf fixed it. Thanks @steki!
TDCroPower, posted April 7 (edited)

...deleted...
Trillien, posted April 8

For info, this is probably related to this issue: https://github.com/openwrt/openwrt/issues/14377