
nanopi neo plus2 weird issues and kernel oops with current kernel


emanuele-f


Sorry for the minimal info provided in this report but, due to the possible data corruption which the bug may cause, I'm not willing to reproduce this again on my board which runs in production.

 

One week ago, I experienced data corruption on the NanoPi. The board had actually been acting weird for some time (more than one month):

 

- long ssh transfers (via https://github.com/dooblem/bsync) would sometimes break

- `apt update` reported invalid keys for the official armbian repository

- on rare occasions, the board would freeze (not respond to ssh, requiring manual power cycle)

 

After experiencing data corruption, I thought the board was broken, but decided to switch to the legacy image via `armbian-config` as a last resort. When I ran `armbian-config` and selected "Other" to switch kernels, the kernel crashed. I tried again and succeeded. With the legacy image `5.4.88-sunxi64 #21.02.3` the board works great, and none of the reported issues occur any more. So it seems there is a bug in the "current" image (it should be version 5.15.25), possibly in a module related to encryption. This is all the info I can provide for this report. Hope it helps.


On 4/30/2022 at 4:39 AM, Werner said:

Sounds more like a degraded sd card

Using internal emmc. After another week of usage no problems to report with the legacy kernel.

 

On 4/30/2022 at 4:39 AM, Werner said:

 

With the information given reproduction is basically impossible since nobody has a second sight about what is going on. So I have my doubts that this gets addressed.

Yeah, if I'm the only one to report this, then it must be something related to my board. I will report here if I have new info.


I happen to have similar problems: ssh sessions are suddenly "gone", and the board is only recoverable by a power cycle, with a 35% chance that it comes back up.
I connected a serial console and captured some kernel errors. During boot, for example, the errors show up after the same [OK] lines each time, so I suspect a kernel problem as well. I can supply more information if guided.

/To


 

[  OK  ] Started udev Kernel Device Manager.
[    7.942988] Unable to handle kernel paging request at virtual address 00000000048666c0
[    7.951039] Mem abort info:
[    7.953872]   ESR = 0x96000004
[    7.957055]   EC = 0x25: DABT (current EL), IL = 32 bits
[    7.962512]   SET = 0, FnV = 0
[    7.965602]   EA = 0, S1PTW = 0
[    7.968851]   FSC = 0x04: level 0 translation fault
[    7.973806] Data abort info:
[    7.976742]   ISV = 0, ISS = 0x00000004
[    7.980631]   CM = 0, WnR = 0
[    7.983647] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000453ea000
[    7.990108] [00000000048666c0] pgd=0000000000000000, p4d=0000000000000000
[    7.998301] Internal error: Oops: 96000004 [#1] SMP
[    8.004556] Modules linked in: cpufreq_dt sch_fq_codel g_serial libcomposite
[    8.012981] CPU: 3 PID: 349 Comm: systemd-udevd Not tainted 5.15.43-sunxi64 #22.05.1
[  OK  ] Stopped udev Kernel Device Manager.
[    8.022114] Hardware name: FriendlyARM NanoPi NEO Plus2 (DT)
[    8.028288] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    8.035250] pc : __handle_mm_fault+0x9c/0xaa8
[    8.036122] systemd[1]: systemd-udevd.service: Found left-over process 349 (systemd-udevd) in control group while starting unit. Ignoring.
[    8.039624] lr : handle_mm_fault+0xc0/0x240
[    8.039632] sp : ffff8000098ebce0
[    8.039635] x29: ffff8000098ebce0 x28: ffff000004d38dc0 x27: ffff0000048666e8
[    8.039648] x26: 0000000000000002 x25: 0000000004866680 x24: 0000000000000255
[    8.039657] x23: 0000000000000155 x22: ffff8000098ebeb0 x21: ffff800009054068
[    8.039666] x20: ffff0000053e1e40 x19: 0000aaaad8ecf250 x18: 0000000000000000
[    8.039675] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[    8.039683] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[    8.052176] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
[    8.056272] 
[    8.056273] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[    8.056283] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff800008d49000
[    8.056291] x5 : ffff800036bb3000 x4 : ffff8000098ebd70 x3 : ffff800036bb3000
[    8.056300] x2 : 0000000000000255
[    8.062460] systemd[1]: Starting udev Kernel Device Manager...
[    8.066730]  x1 : 0000aaaad8ecf000 x0 : 0000000aaaad8ecf
[    8.066769] Call trace:
[    8.066776]  __handle_mm_fault+0x9c/0xaa8
[    8.066790]  handle_mm_fault+0xc0/0x240
[    8.066797]  do_page_fault+0x17c/0x3e0
[    8.164961]  do_mem_abort+0x40/0xb0
[    8.168463]  el0_da+0x24/0x58
[    8.171432]  el0t_64_sync_handler+0x68/0xb8
[    8.175612]  el0t_64_sync+0x180/0x184
[    8.179276] Code: f9402299 9274ce61 d367be77 a90607e0 (f9402336) 
[    8.186741] ---[ end trace 3d252a91c10ca563 ]---
         Starting udev 

...

[    8.273462] Unable to handle kernel paging request at virtual address 00000000054526a0
[    8.281541] Mem abort info:
[    8.284544]   ESR = 0x96000004
[    8.287696]   EC = 0x25: DABT (current EL), IL = 32 bits
[    8.293079]   SET = 0, FnV = 0
[    8.296189]   EA = 0, S1PTW = 0
[    8.299367]   FSC = 0x04: level 0 translation fault
[    8.304307] Data abort info:
[    8.307221]   ISV = 0, ISS = 0x00000004
[    8.311099]   CM = 0, WnR = 0
[    8.314092] user pgtable: 4k pages, 48-bit VAs, pgdp=000000004bb6a000
[    8.320602] [00000000054526a0] pgd=0000000000000000, p4d=0000000000000000
[    8.327451] Internal error: Oops: 96000004 [#2] SMP
[    8.332355] Modules linked in: zram cpufreq_dt sch_fq_codel g_serial libcomposite
[    8.339856] CPU: 3 PID: 427 Comm: systemd-udevd Tainted: G      D           5.15.43-sunxi64 #22.05.1
[    8.348990] Hardware name: FriendlyARM NanoPi NEO Plus2 (DT)
[    8.354648] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    8.361611] pc : __handle_mm_fault+0x4e0/0xaa8
[    8.366091] lr : __handle_mm_fault+0x4d8/0xaa8
[    8.370544] sp : ffff8000098cbce0
[    8.373859] x29: ffff8000098cbce0 x28: ffff000001bb44c0 x27: ffff000002155fe8
[    8.380997] x26: 0000000000000002 x25: ffff000002155f80 x24: 0000000000000255
[    8.388136] x23: ffff800009054000 x22: ffff000005436ff8 x21: ffff800009054068
[    8.392176] systemd[1]: Condition check resulted in Dispatch Password Requests to Console Directory Watch when bootsplash is active being skipped.
[    8.395271] x20: ffff000008cdad80 x19: 0000000000000070 x18: 0000000000000000
[    8.395284] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[    8.395293] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[    8.395302] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[    8.395311] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff800008d49000
[    8.395319] x5 : ffff800036bb3000 x4 : 0000000000000000 x3 : 00e000004f28fbc3
[    8.395327] x2 : 00000000054526a0 x1 : 0000000000000000 x0 : 0000000000000000
[    8.395339] Call trace:
[    8.460799]  __handle_mm_fault+0x4e0/0xaa8
[    8.463355] systemd[1]: Condition check resulted in FUSE Control File System being skipped.
[    8.464934]  handle_mm_fault+0xc0/0x240
[    8.464946]  do_page_fault+0x17c/0x3e0
[    8.474455] systemd[1]: Condition check resulted in Kernel Trace File System being skipped.
[    8.477123]  do_mem_abort+0x40/0xb0
[    8.477137]  el0_da+0x24/0x58
[    8.477150]  el0t_64_sync_handler+0x68/0xb8
[    8.477158]  el0t_64_sync+0x180/0x184
[    8.477174] Code: f90057e0 97ffef3b f94053e2 f94047e3 (f9400040) 
[    8.477183] ---[ end trace 3d252a91c10ca564 ]---

...

[   68.518169] rcu: INFO: rcu_sched self-detected stall on CPU
[   68.523753] rcu:     3-....: (15000 ticks this GP) idle=0a9/1/0x4000000000000002 softirq=2037/2037 fqs=7492 
[   68.533310]  (t=15003 jiffies g=1237 q=4109)
[   68.537575] Task dump for CPU 3:
[   68.540797] task:systemd-udevd   state:R  running task     stack:    0 pid:  427 ppid:   398 flags:0x0000080a
[   68.550705] Call trace:
[   68.553137]  dump_backtrace+0x0/0x200
[   68.556808]  show_stack+0x18/0x60
[   68.560121]  sched_show_task+0x15c/0x198
[   68.564044]  dump_cpu_task+0x44/0x54
[   68.567618]  rcu_dump_cpu_stacks+0xf0/0x138
[   68.571798]  rcu_sched_clock_irq+0x790/0x9e8
[   68.576068]  update_process_times+0x9c/0xe8
[   68.580249]  tick_sched_handle.isra.23+0x40/0x50
[   68.584862]  tick_sched_timer+0x4c/0xa8
[   68.588694]  __hrtimer_run_queues+0xdc/0x208
[   68.592960]  hrtimer_interrupt+0x114/0x300
[   68.597053]  arch_timer_handler_phys+0x30/0x40
[   68.601494]  handle_percpu_devid_irq+0x84/0x138
[   68.606021]  handle_domain_irq+0x60/0x90
[   68.609940]  gic_handle_irq+0x6c/0x98
[   68.613600]  call_on_irq_stack+0x28/0x54
[   68.617517]  do_interrupt_handler+0x58/0x68
[   68.621696]  el1_interrupt+0x30/0x48
[   68.625270]  el1h_64_irq_handler+0x18/0x28
[   68.629362]  el1h_64_irq+0x74/0x78
[   68.632758]  queued_spin_lock_slowpath+0x21c/0x3d0
[   68.637544]  unmap_page_range+0x464/0x8a8
[   68.641551]  unmap_single_vma+0x44/0xa8
[   68.645384]  unmap_vmas+0x5c/0x80
[   68.648695]  exit_mmap+0x6c/0x198
[   68.652006]  mmput+0x74/0x190
[   68.654974]  do_exit+0x30c/0xa08
[   68.658201]  die+0x268/0x2a8
[   68.661079]  die_kernel_fault+0x64/0x78
[   68.664911]  __do_kernel_fault+0x90/0x180
[   68.668915]  do_page_fault+0xa4/0x3e0
[   68.672572]  do_translation_fault+0x58/0x68
[   68.676750]  do_mem_abort+0x40/0xb0
[   68.680234]  el1_abort+0x44/0x68
[   68.683460]  el1h_64_sync_handler+0x64/0xe8
[   68.687639]  el1h_64_sync+0x74/0x78
[   68.691122]  __handle_mm_fault+0x4e0/0xaa8
[   68.695214]  handle_mm_fault+0xc0/0x240
[   68.699046]  do_page_fault+0x17c/0x3e0
[   68.702790]  do_mem_abort+0x40/0xb0
[   68.706274]  el0_da+0x24/0x58
[   68.709238]  el0t_64_sync_handler+0x68/0xb8


...

        Starting Avahi mDNS/DNS-SD Stack...
[  164.910173] Unable to handle kernel paging request at virtual address 000000000808a2b0
[  164.918109] Mem abort info:
[  164.920896]   ESR = 0x86000004
[  164.923945]   EC = 0x21: IABT (current EL), IL = 32 bits
[  164.929247]   SET = 0, FnV = 0
[  164.932294]   EA = 0, S1PTW = 0
[  164.935429]   FSC = 0x04: level 0 translation fault
[  164.940299] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000042840000
[  164.946730] [000000000808a2b0] pgd=0000000000000000, p4d=0000000000000000
[  164.953513] Internal error: Oops: 86000004 [#3] SMP
[  164.958385] Modules linked in: zram cpufreq_dt sch_fq_codel g_serial libcomposite
[  164.965870] CPU: 3 PID: 427 Comm: systemd-udevd Tainted: G      D           5.15.43-sunxi64 #22.05.1
[  164.974990] Hardware name: FriendlyARM NanoPi NEO Plus2 (DT)
[  164.980640] pstate: 000000c5 (nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  164.987592] pc : 0x808a2b0
[  164.990300] lr : 0x808a2b0
[  164.993002] sp : ffff800009543dc0
[  164.996309] x29: ffff800009543dc0 x28: ffff00003fdaef80 x27: 0000000000000000
[  165.003438] x26: 00000000000000c0 x25: 00000026654864ec x24: 0000000000000000
[  165.010566] x23: ffff000001bb44c0 x22: 0000000000000003 x21: ffff8000091ffdc0
[  165.017695] x20: ffff8000093dbd70 x19: ffff00003fdb2dc0 x18: 0000000000036eee
[  165.024823] x17: ffff800036bb3000 x16: ffff800009544000 x15: 0000aa2a46ecde46
[  165.031951] x14: 0000000000000319 x13: 000000000000025b x12: 0000000000000001
[  165.039079] x11: 0000000000000004 x10: ffff8000093f0b70 x9 : ffff800009054068
[  165.046207] x8 : 0000000000000000 x7 : ffff00003fdb2ec0 x6 : ffff8000091ffdc0
[  165.053336] x5 : 000000000112a880 x4 : 0000000000000000 x3 : 0000000000000002
[  165.060463] x2 : ffff000001825000 x1 : ffff00003fdb2dc0 x0 : ffff8000091ffdc0
[  165.067593] Call trace:
[  165.070036]  0x808a2b0
[  165.072392]  update_process_times+0xd0/0xe8
[  165.076579]  tick_sched_handle.isra.23+0x40/0x50
[  165.081191]  tick_sched_timer+0x4c/0xa8
[  165.085023]  __hrtimer_run_queues+0xdc/0x208
[  165.089289]  hrtimer_interrupt+0x114/0x300
[  165.093383]  arch_timer_handler_phys+0x30/0x40
[  165.097823]  handle_percpu_devid_irq+0x84/0x138
[  165.102350]  handle_domain_irq+0x60/0x90
[  165.106266]  gic_handle_irq+0x6c/0x98
[  165.109927]  call_on_irq_stack+0x28/0x54
[  165.113847]  do_interrupt_handler+0x58/0x68
[  165.118026]  el1_interrupt+0x30/0x48
[  165.121600]  el1h_64_irq_handler+0x18/0x28
[  165.125692]  el1h_64_irq+0x74/0x78
[  165.129088]  queued_spin_lock_slowpath+0x21c/0x3d0
[  165.133874]  unmap_page_range+0x464/0x8a8
[  165.137881]  unmap_single_vma+0x44/0xa8
[  165.141713]  unmap_vmas+0x5c/0x80
[  165.145024]  exit_mmap+0x6c/0x198
[  165.148335]  mmput+0x74/0x190
[  165.151303]  do_exit+0x30c/0xa08
[  165.154530]  die+0x268/0x2a8
[  165.157410]  die_kernel_fault+0x64/0x78
[  165.161242]  __do_kernel_fault+0x90/0x180
[  165.165247]  do_page_fault+0xa4/0x3e0
[  165.168904]  do_translation_fault+0x58/0x68
[  165.173082]  do_mem_abort+0x40/0xb0
[  165.176566]  el1_abort+0x44/0x68
[  165.179791]  el1h_64_sync_handler+0x64/0xe8
[  165.183970]  el1h_64_sync+0x74/0x78
[  165.187454]  __handle_mm_fault+0x4e0/0xaa8
[  165.191547]  handle_mm_fault+0xc0/0x240
[  165.195379]  do_page_fault+0x17c/0x3e0
[  165.199123]  do_mem_abort+0x40/0xb0
[  165.202607]  el0_da+0x24/0x58
[  165.205573]  el0t_64_sync_handler+0x68/0xb8
[  165.209751]  el0t_64_sync+0x180/0x184
[  165.213416] Code: bad PC value
[  165.216467] ---[ end trace 3d252a91c10ca565 ]---
[  165.221077] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[  165.227939] SMP: stopping secondary CPUs
[  166.298521] SMP: failed to stop secondary CPUs 0-3
[  166.303307] Kernel Offset: disabled
[  166.306787] CPU features: 0x00002001,00000842
[  166.311136] Memory Limit: none
[  166.314188] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
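
For anyone capturing similar output, a long serial log is easier to compare across boots once it is reduced to the lines that identify each oops. The following is only a minimal sketch, assuming the console output was saved to a plain-text file; the function names `oops_summary` and `oops_count` are hypothetical helpers, not part of any Armbian tool.

```shell
#!/bin/sh
# Sketch: summarise kernel oopses from a saved serial-console capture.
# Assumption: the capture is a plain-text file passed as the first argument.

oops_summary() {
    # Keep only the lines that identify each oops: the error header,
    # the faulting task and kernel version (Comm:), the program counter,
    # and any final kernel panic message.
    grep -E 'Internal error: Oops|Comm: |\] pc : |Kernel panic' "$1"
}

oops_count() {
    # Count oops headers such as "Internal error: Oops: 96000004 [#1] SMP".
    grep -c 'Internal error: Oops' "$1"
}
```

Calling `oops_summary` on a capture file prints one short group of lines per oops, which makes it quicker to see whether the faulting function and task are the same on every boot.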


 

Edited by targa

Unfortunately, I don't seem to be able to change to the above-mentioned kernel?!

image.png.12b8ad39a0c6efba5678c9e3a198ad26.png

 

cat /tmp/switch_kernel.log 
linux-image-legacy-sunxi64=21.02.3 linux-u-boot-nanopineoplus2-legacy linux-dtb-legacy-sunxi64=21.02.3

 


I'm having a similar issue.
I used the most recent image (Armbian_22.11.1_Nanopineoplus2_bullseye_current_5.15.80), flashed it using balena etcher, and get the following error on boot:

 


[  OK  ] Finished OpenVPN service.
[  OK  ] Started /etc/rc.local Compatibility.
         Starting Hostname Service...
[  OK  ] Finished Permit User Sessions.
         Starting Hold until boot process finishes up...
         Starting Terminate Plymouth Boot Screen...
[   15.679114] Internal error: Oops: 96000004 [#1] SMP
[   15.684010] Modules linked in: lz4hc lz4 brcmfmac brcmutil cfg80211 zram sunxi_cedrus(C) rfkill sun8i_mbus videobuf2_dma_contig v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc cpufreq_dt g_serial libcomposite fuse sunrpc realtek dwmac_sun8i mdio_mux
[   15.708370] CPU: 2 PID: 575 Comm: systemd-journal Tainted: G         C        5.15.80-sunxi64 #22.11.1
[   15.717664] Hardware name: FriendlyARM NanoPi NEO Plus2 (DT)
[   15.723314] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   15.730266] pc : update_min_vruntime+0x20/0x60
[   15.734716] lr : update_curr+0x68/0x158
[   15.738547] sp : ffff80000a40b870
[   15.741855] x29: ffff80000a40b870 x28: 0000000000000008 x27: 0000000000000000
[   15.748983] x26: 0000000000000000 x25: 0000000000000002 x24: 0000000000000000
[   15.756111] x23: 0000000000000001 x22: ffff00003fd8ce80 x21: 00000000000881bc
[   15.763239] x20: ffff00003fd8cf00 x19: ffff000004e3c540 x18: 0000000000000000
[   15.770367] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[   15.777495] x14: 0000000000000003 x13: 0000000000000000 x12: 0000000000000002
[   15.784623] x11: 0000000000000003 x10: 0000000000000002 x9 : 000000000001d400
[   15.791750] x8 : 000000000001d400 x7 : 0000000000000917 x6 : 000000000fe98ef1
[   15.798878] x5 : 00ffffffffffffff x4 : 0000000000000001 x3 : f7ff000001890090
[   15.806006] x2 : 000000086be3ec16 x1 : 000000086c5738d9 x0 : ffff00003fd8cf00
[   15.813135] Call trace:
[   15.815577]  update_min_vruntime+0x20/0x60
[   15.819669]  dequeue_entity+0x24/0x268
[   15.823414]  dequeue_task_fair+0x8c/0x290
[   15.827419]  deactivate_task+0x64/0x90
[   15.831162]  load_balance+0x3bc/0x9d0
[   15.834821]  newidle_balance.isra.149+0x2a0/0x410
[   15.839519]  pick_next_task_fair+0x4c/0x300
[   15.843697]  __schedule+0xf4/0x640
[   15.847097]  schedule+0x58/0xc8
[   15.850234]  schedule_hrtimeout_range_clock+0x104/0x118
[   15.855454]  schedule_hrtimeout_range+0x14/0x20
[   15.859979]  do_epoll_wait+0x604/0x770
[   15.863725]  do_compat_epoll_pwait.part.38+0x14/0x98
[   15.868683]  __arm64_sys_epoll_pwait+0x78/0xc8
[   15.873121]  invoke_syscall+0x44/0x108
[   15.876867]  el0_svc_common.constprop.3+0x84/0xf8
[   15.881565]  do_el0_svc+0x24/0x88
[   15.884875]  el0_svc+0x20/0x50
[   15.887926]  el0t_64_sync_handler+0x90/0xb8
[   15.892103]  el0t_64_sync+0x180/0x184
[   15.895766] Code: b9403844 34000184 f9402842 b4000083 (f9402063)
[   15.901851] ---[ end trace 5adba06e7db4ed24 ]---

 

 

I'm using a powered USB hub to power the board (the voltage reads a stable 5.2 V), so I don't think the issue is a lack of electrical power.

I have a second NEO Plus2, and it shows the same problem.

 


Recently the `linux-dtb-legacy-sunxi64` package was updated to 23.8.1, which uses kernel `5.15.127-legacy-sunxi64`, and the freezes started to happen again.

To address the issue, I used armbian-config to downgrade to `linux-image-legacy-sunxi64=21.02.3`, which has kernel 5.4.88, and the board is running fine again now.
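
Since a routine package update brought the freezes back, one possible way to keep the downgraded kernel in place is to put the kernel packages on hold with `apt-mark hold`. This is a sketch under assumptions, not something confirmed in this thread: the package names are the ones mentioned above, the `hold_kernel_pkgs` function and `DRY_RUN` variable are hypothetical, and whether holding is preferable to other ways of freezing the kernel is a judgment call.

```shell
#!/bin/sh
# Sketch: hold the downgraded kernel packages so apt does not upgrade them.
# Assumption: package names are the ones mentioned in this thread;
# adjust them to what is actually installed on your board.
KERNEL_PKGS="linux-image-legacy-sunxi64 linux-dtb-legacy-sunxi64"

hold_kernel_pkgs() {
    # With DRY_RUN=1 the commands are only printed, not executed.
    for pkg in $KERNEL_PKGS; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "apt-mark hold $pkg"
        else
            apt-mark hold "$pkg"
        fi
    done
}
```

Running `DRY_RUN=1 hold_kernel_pkgs` first shows what would be executed; the real run needs root, and `apt-mark showhold` lists the packages currently on hold.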
