Meier

Infrequent RockPro64 freeze (kernel NULL pointer)


On several RockPro64 boards I experience infrequent freezes, mostly directly on boot, but also after some longer (hours, days) uptime. I use the official power adapter and no additional hardware except a PCIe adapter for an SSD, which works flawlessly when operational.

 

Looking at kern.log, there are quite a few errors and warnings, but they are all present in the log of a successful boot as well, so I don't think they cause the freeze directly.
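A rough sketch of how such a comparison could be scripted, assuming two saved kern.log excerpts (one per boot). The function names `normalize` and `only_in_bad` are made up for illustration. The syslog prefix and the kernel timestamp are stripped so that identical messages from different boots compare equal. Lines that contain addresses or PIDs will still differ between boots, so the output needs manual review.

```shell
#!/bin/sh
# normalize LOGFILE
# Drop everything up to and including the "[ seconds.micros] " kernel
# timestamp, then sort and de-duplicate the remaining message text.
normalize() {
    sed -E 's/^.*kernel: \[ *[0-9]+\.[0-9]+\] //' "$1" | sort -u
}

# only_in_bad GOOD_LOG BAD_LOG
# Print the messages that appear in BAD_LOG but not in GOOD_LOG.
only_in_bad() {
    good=$(mktemp)
    bad=$(mktemp)
    normalize "$1" > "$good"
    normalize "$2" > "$bad"
    # comm -13 suppresses lines unique to the first file and lines common
    # to both, leaving only lines unique to the second file.
    comm -13 "$good" "$bad"
    rm -f "$good" "$bad"
}
```

Usage would be something like `only_in_bad good-boot.log bad-boot.log`, where both file names are placeholders.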

 

I recorded this freeze on the latest Armbian Bionic image, which was released just recently. FWIW, I think the same issue also occurs on the previous Debian Stretch image (I had various freezes there, but have not recorded any details yet).

 

On an unsuccessful boot, the error occurs after about 8 seconds. A reboot (or two) usually fixes the issue, until the next time...

Jul  8 15:18:23 carol kernel: [    8.528275] Unable to handle kernel NULL pointer dereference at virtual address 00000000
Jul  8 15:18:23 carol kernel: [    8.530625] pgd = ffffffc0ead3f000
Jul  8 15:18:23 carol kernel: [    8.532515] [00000000] *pgd=0000000000000000, *pud=0000000000000000
Jul  8 15:18:23 carol kernel: [    8.534710] Internal error: Oops: 96000005 [#1] SMP
Jul  8 15:18:24 carol kernel: [    8.536753] Modules linked in: af_packet iptable_nat nf_nat_ipv4 nf_nat nf_log_ipv4 nf_log_common xt_LOG xt_limit nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_conntrack nf_conntrack iptable_filter snd_soc_rockchip_hdmi_dp rk_vcodec ip_tables x_tables autofs4 phy_rockchip_pcie
Jul  8 15:18:24 carol kernel: [    8.542598] CPU: 5 PID: 1044 Comm: find Not tainted 4.4.182-rockchip64 #1
Jul  8 15:18:24 carol kernel: [    8.544959] Hardware name: Pine64 RockPro64 (DT)
Jul  8 15:18:24 carol kernel: [    8.547155] task: ffffffc0eb247000 task.stack: ffffffc0e1c78000
Jul  8 15:18:24 carol kernel: [    8.549491] PC is at do_dentry_open+0x234/0x2e4
Jul  8 15:18:24 carol kernel: [    8.551687] LR is at do_dentry_open+0x288/0x2e4
Jul  8 15:18:24 carol kernel: [    8.553852] pc : [<ffffff80081f2738>] lr : [<ffffff80081f278c>] pstate: a0000145
Jul  8 15:18:24 carol kernel: [    8.556284] sp : ffffffc0e1c7bbc0
Jul  8 15:18:24 carol kernel: [    8.558403] x29: ffffffc0e1c7bbc0 x28: ffffffc0eb247000 
Jul  8 15:18:24 carol kernel: [    8.560742] x27: 0000000000000000 x26: ffffffc0f26eb000 
Jul  8 15:18:24 carol kernel: [    8.563064] x25: 000000000000011d x24: ffffffc0e1cff690 
Jul  8 15:18:24 carol kernel: [    8.565376] x23: ffffff8008219cb8 x22: 0000000000000000 
Jul  8 15:18:24 carol kernel: [    8.567670] x21: 0000000000000000 x20: ffffffc0f27882b0 
Jul  8 15:18:24 carol kernel: [    8.569954] x19: ffffffc0e1cff680 x18: 0000007fb4979a70 
Jul  8 15:18:24 carol kernel: [    8.572191] x17: 0000007fb48e8848 x16: ffffff80081f3ea4 
Jul  8 15:18:24 carol kernel: [    8.574416] x15: 0000000000000000 x14: ffffffffffffffff 
Jul  8 15:18:24 carol kernel: [    8.576663] x13: 0000000000000000 x12: 0101010101010101 
Jul  8 15:18:24 carol kernel: [    8.578896] x11: 7f7f7f7f7f7f7f7f x10: 0000007fb4a8a140 
Jul  8 15:18:24 carol kernel: [    8.581115] x9 : 0000000000000000 x8 : ffffffc0e1cff7b8 
Jul  8 15:18:24 carol kernel: [    8.583390] x7 : 0000000000000000 x6 : ffffffc0f061d1e9 
Jul  8 15:18:24 carol kernel: [    8.585641] x5 : 0000000000000000 x4 : 00000000000055b1 
Jul  8 15:18:24 carol kernel: [    8.587874] x3 : 00000040eee4a000 x2 : ffffff8008219a20 
Jul  8 15:18:24 carol kernel: [    8.590114] x1 : ffffff8008c02140 x0 : 0000000000000000 
Jul  8 15:18:24 carol kernel: [    8.592341] 
Jul  8 15:18:24 carol kernel: [    8.592341] PC: 0xffffff80081f26b8:
Jul  8 15:18:24 carol kernel: [    8.596219] 26b8  54fffd60 f940c680 f9001660 b4fffd20 aa1603e1 aa1303e0 940c0a2c 2a0003f6
Jul  8 15:18:24 carol kernel: [    8.598764] 26d8  35000700 b9405261 d5033bbf f940ca80 b5000320 b50004b7 f9401660 f9402c17
Jul  8 15:18:24 carol kernel: [    8.601305] 26f8  b5000457 b9405660 370004c0 b9405660 36080100 f9401661 f9400c22 b5000062
Jul  8 15:18:24 carol kernel: [    8.603883] 2718  f9401421 b4000061 320e0000 b9005660 b9405260 12166c00 b9005260 f9409a60
...

Full kern.log boot log:

https://pastebin.com/zcpxB1HQ

 

Please find attached the full armbianmonitor output here:

https://pastebin.com/NkVAejC6

 

Any help is greatly appreciated!

Board: Not on the list


Additional info: after a short uptime of ~1h the board started to fault repeatedly, but without crashing completely. SSH connections were closed, but a login was possible again later.

 

Three specific errors occurred within a short interval, all logged in full here: https://pastebin.com/SAcUAGb2

Jul  8 16:59:31 carol kernel: [ 3752.234046] Unhandled fault: synchronous external abort (0x96000210) at 0xffffff8009d5401c
Jul  8 16:59:31 carol kernel: [ 3752.240736] Internal error: : 96000210 [#1] SMP
...

Jul  8 17:00:12 carol kernel: [ 3759.996389] BUG: spinlock lockup suspected on CPU#3, nvme/296
Jul  8 17:00:12 carol kernel: [ 3760.001966]  lock: 0xffffff8009141870, .magic: dead4ead, .owner: nvme/296, .owner_cpu: 3
...

Jul  8 17:00:12 carol kernel: [ 3792.419942] Watchdog detected hard LOCKUP on cpu 3
Jul  8 17:00:12 carol kernel: [ 3792.420464] ------------[ cut here ]------------
Jul  8 17:00:12 carol kernel: [ 3792.430494] WARNING: at kernel/watchdog.c:352

 


Thanks Igor! Will try that today and let you know how it works out.

 

Update: works fine so far, after 3+ hours uptime, also with the self-compiled image, but intervals between freezes can be quite long.


Good. I hope this will be it! I took one RK3399 board (NanoPC T4) with me and it is serving as a real-world test -> KODI media center / web browser / VPN gateway / AP / file server. I am hoping to get three weeks of uptime ;)

 

I have also moved this topic to the RK3399 sub-forum since it fits better here.


Unfortunately, I still keep getting the freezes from time to time. Two things I noticed:

  • When running `stress -i 4 -d 4` I can crash the board very reliably in ~3 minutes. This happens only on this one board, not on the others, and even without any additional peripherals such as the SSD plugged in. As it's running from eMMC, this particular eMMC module might be what causes the crash.
  • This led me to build the latest Armbian Ubuntu 18.04 image with `overlayroot` to eliminate all I/O to the eMMC. The board that had been crashing has now been running for ~2 days.

I'll try to gather more data in case the boards crash with the build from the current master branch.
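For gathering that data, a minimal sketch along these lines might help. It counts the log lines that match the fault signatures quoted earlier in this thread. The function name `count_faults` is hypothetical, and the pattern list covers only the messages seen here, not every kind of kernel fault. It counts matching lines rather than incidents, so a single oops that prints both "Unable to handle ..." and "Internal error: ..." is counted twice.

```shell
#!/bin/sh
# count_faults LOGFILE
# Print the number of lines in LOGFILE that match one of the fault
# signatures from this thread. grep -c prints 0 and returns a non-zero
# exit status when nothing matches.
count_faults() {
    grep -c -E 'Unable to handle kernel NULL pointer dereference|Internal error:|spinlock lockup suspected|Watchdog detected hard LOCKUP' "$1"
}
```

For example, `count_faults /var/log/kern.log` after each stress run would give a quick per-board number to compare.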

 

Just FYI in case you're curious: this is the project I'm working on https://github.com/digitalbitbox/bitbox-base.


The first error *might* be related to a missing firmware file, as reported just below the oops message.

For this one, you could try this:
 

Quote

cd /tmp
wget https://raw.githubusercontent.com/wkennington/linux-firmware/master/rockchip/dptx.bin
cp /tmp/dptx.bin /lib/firmware/rockchip/dptx.bin


The methodology is taken from here: https://forum.pine64.org/showthread.php?tid=6510

Now the spinlock seems to be NVMe related... When the board boots correctly, does something like find / trigger a freeze?

EDIT: I didn't read the whole thread properly...

 

With overlayroot enabled, are you also testing with stress -i 4 -d 4?

 


Thanks Myy for the pointers. I'll check whether the dptx.bin firmware helps prevent the boot oops. Is there a way to tell how that binary file has been compiled, or to make sure it is legit?

 

Regarding the stress testing in overlayroot, this command immediately aborts as it fills up the available tmpfs within seconds. Good thought about find /, I'll try that.

3 hours ago, Meier said:

Is there a way to tell how that binary file has been compiled, or to make sure it is legit?

 

That firmware seems to be part of the Closed McBlobby family: https://patchwork.kernel.org/patch/9225567/

 

However, a more legit source for this firmware would be: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/rockchip
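One way to cross-check the blob, as a rough sketch: download it from both places and compare the SHA-256 digests. The helper name `same_digest` and the file names in the comments are made up for illustration. A match only shows that the two copies are identical. It says nothing about how the closed-source binary was built.

```shell
#!/bin/sh
# same_digest FILE_A FILE_B
# Succeeds (exit status 0) only if both files exist and have the same
# SHA-256 digest. Assumes sha256sum from coreutils is available.
same_digest() {
    a=$(sha256sum "$1" | cut -d' ' -f1)
    b=$(sha256sum "$2" | cut -d' ' -f1)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example (not executed here): save one copy from each source under
# placeholder names, then compare them.
#   same_digest /tmp/dptx-github.bin /tmp/dptx-kernelorg.bin && echo "identical"
```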

 

Give the find / command a try and, if possible, try it on an NVMe drive.
