Helios64: Kernel oops when starting docker container


mikeakers

Hello! I'm able to reproducibly get a kernel oops whenever I start a docker container on my Helios64.

 

The SD card I was running my Helios64 from died and so I reinstalled Armbian using the steps to install bookworm from this post. I did skip the steps related to hs400 and L2 cache as I'm currently running the system from an SD card.

 

After the install seemed to be working well I installed zfs, imported my existing pool, and then installed docker. My previous setup was using docker's ZFS storage driver so I got that configured as well.

 

Now, whenever I start up a docker container (i.e. by running "sudo docker run hello-world"), I get the following oops from the kernel:

 

[  246.422387] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
[  246.422400] Mem abort info:
[  246.422403]   ESR = 0x0000000096000005
[  246.422406]   EC = 0x25: DABT (current EL), IL = 32 bits
[  246.422411]   SET = 0, FnV = 0
[  246.422414]   EA = 0, S1PTW = 0
[  246.422417]   FSC = 0x05: level 1 translation fault
[  246.422421] Data abort info:
[  246.422423]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[  246.422426]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  246.422430]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  246.422434] user pgtable: 4k pages, 48-bit VAs, pgdp=000000000942d000
[  246.422439] [0000000000000038] pgd=0800000016225003, p4d=0800000016225003, pud=0000000000000000
[  246.422452] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[  246.423015] Modules linked in: veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge rfkill sunrpc lz4hc lz4 zram binfmt_misc zfs(PO) crct10dif_ce leds_pwm panfrost spl(O) gpio_charger gpu_sched drm_shmem_helper pwm_fan snd_soc_hdmi_codec snd_soc_rockchip_i2s rockchip_vdec(C) hantro_vpu rk_crypto snd_soc_core rockchip_rga snd_compress v4l2_vp9 v4l2_h264 videobuf2_dma_sg videobuf2_dma_contig ac97_bus v4l2_mem2mem snd_pcm_dmaengine snd_pcm videobuf2_memops videobuf2_v4l2 snd_timer videodev nvmem_rockchip_efuse videobuf2_common mc snd soundcore gpio_beeper ledtrig_netdev lm75 ip_tables x_tables autofs4 efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear cdc_ncm cdc_ether usbnet r8152 realtek rockchipdrm dw_mipi_dsi fusb302 dw_hdmi tcpm analogix_dp dwmac_rk typec drm_display_helper stmmac_platform cec
[  246.423214]  drm_dma_helper stmmac drm_kms_helper drm adc_keys pcs_xpcs
[  246.431612] CPU: 5 PID: 2686 Comm: dockerd Tainted: P         C O       6.6.8-edge-rockchip64 #1
[  246.432383] Hardware name: Helios64 (DT)
[  246.432731] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  246.433344] pc : get_device_state+0x28/0x88 [ledtrig_netdev]
[  246.433854] lr : netdev_trig_notify+0x13c/0x1fc [ledtrig_netdev]
[  246.434387] sp : ffff80008b2ab3e0
[  246.434680] x29: ffff80008b2ab3e0 x28: ffff0000154d3378 x27: ffff80007a36dc30
[  246.435313] x26: ffff800081c89a40 x25: ffff800081907008 x24: ffff80008b2ab5b8
[  246.435944] x23: 000000000000000b x22: ffff000005efb000 x21: ffff0000154d3300
[  246.436575] x20: ffff0000154d3378 x19: ffff0000154d3300 x18: ffff80008d293c58
[  246.437206] x17: 000000040044ffff x16: 00500074b5503510 x15: 0000000000000000
[  246.437837] x14: ffffffffffffffff x13: 0000000000000020 x12: 0101010101010101
[  246.438468] x11: 7f7f7f7f7f7f7f7f x10: 000000000f5d83a0 x9 : 0000000000000020
[  246.439099] x8 : 0101010101010101 x7 : 0000000080000000 x6 : 0000000080303663
[  246.439729] x5 : 6336300000000000 x4 : 0000000000000000 x3 : 0000000000000000
[  246.440360] x2 : ffff000021bdbc00 x1 : ffff000021bdbc00 x0 : 0000000000000000
[  246.440991] Call trace:
[  246.441210]  get_device_state+0x28/0x88 [ledtrig_netdev]
[  246.441683]  netdev_trig_notify+0x13c/0x1fc [ledtrig_netdev]
[  246.442185]  notifier_call_chain+0x74/0x144
[  246.442561]  raw_notifier_call_chain+0x18/0x24
[  246.442956]  call_netdevice_notifiers_info+0x58/0xa4
[  246.443399]  dev_change_name+0x190/0x318
[  246.443751]  do_setlink+0xb7c/0xde4
[  246.444064]  rtnl_setlink+0xf8/0x194
[  246.444383]  rtnetlink_rcv_msg+0x12c/0x398
[  246.444748]  netlink_rcv_skb+0x5c/0x128
[  246.445089]  rtnetlink_rcv+0x18/0x24
[  246.445408]  netlink_unicast+0x2e8/0x350
[  246.445755]  netlink_sendmsg+0x1d4/0x444
[  246.446102]  __sock_sendmsg+0x5c/0xac
[  246.446432]  __sys_sendto+0x124/0x150
[  246.446760]  __arm64_sys_sendto+0x28/0x38
[  246.447117]  invoke_syscall+0x48/0x114
[  246.447453]  el0_svc_common.constprop.0+0x40/0xe8
[  246.447871]  do_el0_svc+0x20/0x2c
[  246.448168]  el0_svc+0x40/0xf4
[  246.448442]  el0t_64_sync_handler+0x13c/0x158
[  246.448829]  el0t_64_sync+0x190/0x194
[  246.449158] Code: f9423c20 f90047e0 d2800000 f9404e60 (f9401c01)
[  246.449695] ---[ end trace 0000000000000000 ]---
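
When reading a trace like this, the "pc :" line shows where the fault happened, and the name in square brackets is the module that contains that function (here ledtrig_netdev). A minimal sketch for pulling that name out of saved kernel log text; the function name oops_module is made up for illustration, and it assumes the log uses the same "pc : symbol+off/len [module]" layout as above:

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): read kernel log text on stdin
# and print the module named in brackets on each "pc :" line.
# Prints nothing if the faulting function is in the core kernel (no brackets).
oops_module() {
    sed -n 's/.*pc : [^ ]* \[\([^]]*\)].*/\1/p'
}

# Possible usage (dmesg may need root on some systems):
#   dmesg | oops_module
```

The "lr :" line is ignored on purpose, since it names the caller rather than the faulting instruction.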

 

TIA for the help and for keeping this unsupported hardware alive! 

Edited by mikeakers
Typo fix

Hello mikeakers,

You can test your memory with:

for i in $(seq 1 100);do python3 -c "import pkg_resources" || break;done

It is possible that the bootloader was also updated during the update to bookworm, and that can be a problem.

If this runs 5-6 times without errors, you can try limiting the CPU speed with armbian-config.

For me, the system only runs "stable" at 600-1200 MHz.
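
The one-liner above stops at the first failure. If you would rather see how often it fails, here is a rough sketch that counts failures over a fixed number of runs. The function name count_failures is made up for illustration, and it assumes the command normally succeeds on a healthy system (if pkg_resources is simply not installed, every run will fail and that says nothing about stability):

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print how many runs failed.
# Usage: count_failures N command [args...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only its exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Possible usage:
#   count_failures 100 python3 -c "import pkg_resources"
```

A count that is above zero but below the number of runs means the same command sometimes works and sometimes does not, which is the kind of result that points at instability.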

Edited by snakekick

I'll try that, BipBip1981, but this seems more like a logic error than system instability, especially since I was able to do the exact same thing with my previous OS install.

 

The other thing I'll probably try tonight is reverting the kernel to 5.10.63-rockchip64. That was stable in my old install.


Some more findings:

 

  1. linux-6.6.8 still oopses at 1200 MHz.
  2. Downgrading to linux 5.15.93 does seem to fix the issue, though!
    1. "sudo docker run hello-world" now works correctly without oopsing the kernel.
    2. Still stable with the max clock set to 1400 MHz.
    3. So far so good. I'll report back if anything goes sideways once I get all my containers running again.
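
For comparing runs like the ones above, it can help to confirm which max clock is actually in effect on each core. A small sketch, assuming the kernel exposes the standard cpufreq sysfs files (scaling_max_freq is reported in kHz); the function names khz_to_mhz and show_max_freq are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: convert a frequency in kHz to whole MHz.
khz_to_mhz() {
    echo $(( $1 / 1000 ))
}

# Hypothetical helper: print the current max frequency limit for each CPU.
# Cores without a readable cpufreq entry are skipped.
show_max_freq() {
    for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_max_freq; do
        [ -r "$f" ] || continue
        cpu=${f#/sys/devices/system/cpu/}
        cpu=${cpu%%/*}
        echo "$cpu: $(khz_to_mhz "$(cat "$f")") MHz"
    done
}

# Possible usage:
#   show_max_freq
```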

 

Edited by mikeakers

It looks like my original message did not get through, as I wrote it before creating an account.

So, to make clear what I meant:

Remove the line that says "ledtrig_netdev" from /etc/modules-load.d/modules.conf, as that is the kernel module that triggers the kernel oops on our systems.

The best approach is to follow the steps below, so that no reboot is needed.

If you plan to reboot anyway, just remove that module from loading and that is it; there is no need to run these commands at all.

 

rmmod ledtrig_netdev

find /lib/modules/$(uname -r) -name ledtrig-netdev.ko -exec mv {} {}.backup ';'
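
For the modules.conf step, here is a minimal sketch of dropping the line without hand-editing. The function name drop_module_line is made up for illustration, and it assumes the module is listed on a line by itself with no trailing spaces. It only prints the filtered file, so you can review the result before replacing anything. Note that the module shows up as ledtrig_netdev (underscore) in modules.conf and lsmod, while the file on disk is ledtrig-netdev.ko (hyphen), which is why the two commands above spell it differently.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with every line that is exactly MODULE removed.
# Comments and all other lines are passed through unchanged.
# Usage: drop_module_line FILE MODULE
drop_module_line() {
    file=$1
    module=$2
    awk -v m="$module" '$0 != m' "$file"
}

# Possible usage (review the output, then copy it into place as root):
#   drop_module_line /etc/modules-load.d/modules.conf ledtrig_netdev > /tmp/modules.conf.new
```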


 

 
