
NVME I/O QID Timeout & Disabled Controller with Bullseye on NanoPi M4v2



Posted
Armbianmonitor:

NanoPi M4v2 with the official NVMe hat kit, booting from eMMC.
OS: Armbian with Debian Bullseye, Linux 5.10.12-rockchip64
SSD: Kioxia XG6 (ThinkPad part)

This also happened with the 5.9 kernel. I have not tested other OSes.

 

On a cold boot, I always get kernel messages like:

Feb 11 01:14:34 localhost kernel: [    3.102606] nvme 0000:01:00.0: enabling device (0000 -> 0002)
Feb 11 01:14:34 localhost kernel: [    3.106554] input: gpio-keys as /devices/platform/gpio-keys/input/input0
Feb 11 01:14:34 localhost kernel: [    3.107121] of_cfs_init
Feb 11 01:14:34 localhost kernel: [    3.107161] of_cfs_init: OK
Feb 11 01:14:34 localhost kernel: [    3.129048] dwmmc_rockchip fe310000.mmc: Successfully tuned phase to 193
Feb 11 01:14:34 localhost kernel: [    3.135392] mmc0: new ultra high speed SDR104 SDIO card at address 0001
Feb 11 01:14:34 localhost kernel: [    3.144242] usb 7-1: new high-speed USB device number 2 using xhci-hcd
Feb 11 01:14:34 localhost kernel: [    3.295224] usb 7-1: New USB device found, idVendor=2109, idProduct=2817, bcdDevice= 0.50
Feb 11 01:14:34 localhost kernel: [    3.295236] usb 7-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Feb 11 01:14:34 localhost kernel: [    3.295244] usb 7-1: Product: USB2.0 Hub
Feb 11 01:14:34 localhost kernel: [    3.295253] usb 7-1: Manufacturer: VIA Labs, Inc.
Feb 11 01:14:34 localhost kernel: [    3.355660] hub 7-1:1.0: USB hub found
Feb 11 01:14:34 localhost kernel: [    3.355851] hub 7-1:1.0: 4 ports detected
Feb 11 01:14:34 localhost kernel: [    3.421534] random: crng init done
Feb 11 01:14:34 localhost kernel: [    3.450423] usb 8-1: new SuperSpeed Gen 1 USB device number 2 using xhci-hcd
Feb 11 01:14:34 localhost kernel: [    3.542849] usb 8-1: New USB device found, idVendor=2109, idProduct=0817, bcdDevice= 0.50
Feb 11 01:14:34 localhost kernel: [    3.542860] usb 8-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Feb 11 01:14:34 localhost kernel: [    3.542869] usb 8-1: Product: USB3.0 Hub
Feb 11 01:14:34 localhost kernel: [    3.542877] usb 8-1: Manufacturer: VIA Labs, Inc.
Feb 11 01:14:34 localhost kernel: [    3.563553] hub 8-1:1.0: USB hub found
Feb 11 01:14:34 localhost kernel: [    3.563793] hub 8-1:1.0: 4 ports detected
Feb 11 01:14:34 localhost kernel: [   64.480292] nvme nvme0: I/O 12 QID 0 timeout, disable controller
Feb 11 01:14:34 localhost kernel: [   64.588242] nvme nvme0: Device shutdown incomplete; abort shutdown
Feb 11 01:14:34 localhost kernel: [   64.588624] nvme nvme0: Identify Controller failed (-4)
Feb 11 01:14:34 localhost kernel: [   64.588636] nvme nvme0: Removing after probe failure status: -5
Feb 11 01:14:34 localhost kernel: [   64.607118] Freeing unused kernel memory: 4352K
Feb 11 01:14:34 localhost kernel: [   64.620362] Run /init as init process


As shown, the NVMe disk is removed after the 60-second timeout.
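
For anyone who wants to count how often this happens across boots, here is a minimal Python sketch that pulls the timeout events out of a saved kernel log. The regular expression is written against the exact message format shown above (QID 0 is the admin queue) and may need adjusting for other kernel versions; the names `NvmeTimeout`, `parse_nvme_timeout` and `find_nvme_timeouts` are made up for this illustration.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Matches lines such as:
#   [   64.480292] nvme nvme0: I/O 12 QID 0 timeout, disable controller
# Any syslog prefix before the bracketed uptime is ignored.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvme (?P<ctrl>nvme\d+): "
    r"I/O (?P<tag>\d+) QID (?P<qid>\d+) timeout, (?P<action>.+)$"
)


class NvmeTimeout(NamedTuple):
    uptime_s: float   # seconds since boot, from the bracketed timestamp
    controller: str   # e.g. "nvme0"
    tag: int          # command tag reported after "I/O"
    qid: int          # queue id; 0 is the admin queue
    action: str       # e.g. "disable controller" or "reset controller"


def parse_nvme_timeout(line: str) -> Optional[NvmeTimeout]:
    """Return the timeout event described by one log line, or None."""
    m = TIMEOUT_RE.search(line.rstrip())
    if m is None:
        return None
    return NvmeTimeout(
        float(m["ts"]), m["ctrl"], int(m["tag"]), int(m["qid"]), m["action"]
    )


def find_nvme_timeouts(lines: Iterable[str]) -> List[NvmeTimeout]:
    """Collect every NVMe timeout event found in an iterable of log lines."""
    return [ev for ev in map(parse_nvme_timeout, lines) if ev is not None]
```

Feeding it the lines of a saved `dmesg` or syslog file returns one entry per timeout, so a cold boot that fails and a later idle drop can be told apart by `uptime_s` and `action`.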

 

My current way to recover is to reboot (the attached link contains the dmesg from a reboot where it works); after that the SSD works without problems. I've also tried adding bootdelay to armbianEnv.txt in /boot, but it doesn't seem to help.

 

Is this a bug or a compatibility issue? Is it possible to add the nvme0 drive back without a reboot? (I've tried the PCI rescan sysfs file, but that doesn't seem to help either.)
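
For reference, the remove-and-rescan attempt can be sketched in Python as below. It uses the standard Linux sysfs nodes `/sys/bus/pci/devices/<address>/remove` and `/sys/bus/pci/rescan`, with the device address `0000:01:00.0` taken from the log above. The function name and the `sysfs_root` parameter are made up for this illustration, it has to run as root, and, as noted above, it did not bring the drive back in this case.

```python
from pathlib import Path


def pci_remove_and_rescan(bdf: str, sysfs_root: str = "/sys") -> None:
    """Detach one PCI device (if still present) and rescan the PCI bus.

    bdf is the PCI address, e.g. "0000:01:00.0".
    sysfs_root is only a parameter so the function can be tried against
    a scratch directory instead of the real /sys.
    """
    root = Path(sysfs_root)
    remove_node = root / "bus" / "pci" / "devices" / bdf / "remove"

    # After a failed probe the device may already be gone from sysfs,
    # so only write to "remove" when the node is still there.
    if remove_node.exists():
        remove_node.write_text("1")

    # Ask the kernel to re-enumerate all PCI buses.
    (root / "bus" / "pci" / "rescan").write_text("1")


if __name__ == "__main__":
    pci_remove_and_rescan("0000:01:00.0")
```

Whether the controller answers after the rescan can then be checked in `dmesg`.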

 

BTW, the unsafe_shutdowns counter in the NVMe smart-log also keeps increasing.
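
A small Python sketch for tracking that counter between boots, assuming the smart-log text has been saved to a string. The exact field label differs between nvme-cli versions, so the pattern accepts both `unsafe_shutdowns` and `Unsafe Shutdowns`; that tolerance, the function name, and the sample strings in the usage note are assumptions for illustration, not output captured from this drive.

```python
import re

# Accepts "unsafe_shutdowns : 17" as well as "Unsafe Shutdowns: 1,024".
_UNSAFE_RE = re.compile(r"unsafe[_ ]shutdowns\s*:\s*([\d,]+)", re.IGNORECASE)


def unsafe_shutdowns(smart_log_text: str) -> int:
    """Extract the unsafe shutdown count from saved smart-log text."""
    m = _UNSAFE_RE.search(smart_log_text)
    if m is None:
        raise ValueError("no unsafe_shutdowns field found")
    # Some versions print thousands separators; strip them before converting.
    return int(m.group(1).replace(",", ""))
```

Comparing the value saved before a cold boot with the value after it, for example `unsafe_shutdowns(after) - unsafe_shutdowns(before)`, shows whether a failed probe was counted as an unsafe shutdown.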

  • LiX changed the title to NVME Can't be Enabled when Cold Booting on NanoPi M4v2
Posted
23 hours ago, pkfox said:

What SSD are you using? I've been using Samsung 970 EVO for a couple of years without any issues

 

It's a KIOXIA XG6... I now guess it may be a compatibility or driver issue, since the system dropped it after it had been idle for hours yesterday...

Quote

[  244.836170] sd 0:0:0:0: [sda] Attached SCSI disk
[10169.406639] nvme nvme0: I/O 29 QID 0 timeout, reset controller
[10329.152079] nvme nvme0: I/O 13 QID 0 timeout, disable controller
[10334.244085] nvme nvme0: Device shutdown incomplete; abort shutdown
[10334.260337] nvme nvme0: could not set timestamp (-4)
[10334.260346] nvme nvme0: Removing after probe failure status: -4
[10334.320623] nvme nvme0: failed to set APST feature (-19)

 

Posted
On 2/12/2021 at 11:23 PM, pkfox said:

What SSD are you using? I've been using Samsung 970 EVO for a couple of years without any issues

 

Which kernel are you using? Because my board is rebooting unexpectedly almost every day...

Posted
On 2/14/2021 at 12:18 AM, i5Js said:

Which kernel are you using? Because my board is rebooting unexpectedly almost every day...

Hi, it's 5.10.12-rockchip64 #21.02.1, and it seems my situation has improved since I increased bootdelay in armbianEnv.txt from 3 to 10: I've had a 0/2 fail rate over the past 2 days.

Posted
On 2/14/2021 at 10:18 AM, i5Js said:

Which kernel are you using? Because my board is rebooting unexpectedly almost every day...

I have a few NanoPi M4 v1 and v2 boards. Most of them use the 5.10.16-rockchip64 kernel, but one runs 4.4.213-rk3399 because I haven't got around to updating it. All of the boards have the NVMe hat and a Samsung 970 SSD fitted.

 

Posted
On 2/16/2021 at 12:23 AM, LiX said:

Hi, it's 5.10.12-rockchip64 #21.02.1, and it seems my situation has improved since I increased bootdelay in armbianEnv.txt from 3 to 10: I've had a 0/2 fail rate over the past 2 days.

 

Can you share your armbianEnv.txt? I've just checked mine and it doesn't have that setting.

Posted
On 2/21/2021 at 3:26 AM, i5Js said:

 

Can you share your armbianEnv.txt? I've just checked mine and it doesn't have that setting.

Hi, I added that myself without checking the documentation... Moreover, I'm running headless, so I can't confirm whether it actually works. I've also removed it, since I still had about a 1/3 fail rate with it.
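
To check quickly whether a given key is present in an armbianEnv.txt, a minimal Python sketch follows. It treats the file as plain `key=value` lines; the function name and the sample content in the usage note are made up for illustration, and nothing in this thread confirms that the boot script actually reads a `bootdelay` entry from that file.

```python
from typing import Dict


def read_env(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blanks, comments and malformed lines."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings
```

With the file content read into `text`, `read_env(text).get("bootdelay")` returns the configured value as a string, or `None` when the key is absent, which is the situation described in the question above.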

Posted
On 2/14/2021 at 11:18 AM, i5Js said:

Which kernel are you using? Because my board is rebooting unexpectedly almost every day...

 

Try the latest. I also used to have lots of crashes, but not anymore. You can download this kernel through armbian-config.

 

Armbian 21.02.2 Buster with Linux 5.10.16-rockchip64

  • LiX changed the title to NVME I/O QID Timeout & Disabled Controller with Bullseye on NanoPi M4v2
Posted
On 2/25/2021 at 3:39 AM, Lennyz1988 said:

 

Try the latest. I also used to have lots of crashes, but not anymore. You can download this kernel through armbian-config.

 

Armbian 21.02.2 Buster with Linux 5.10.16-rockchip64

Actually I think it got worse with 5.10.16: I had an almost 100% fail rate yesterday. It froze my $HOME, since it's on the NVMe, with the following messages:

Feb 27 13:14:48 localhost lightdm[2532]: Error getting user list from org.freedesktop.Accounts: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.Accounts was not provided by any .service files
Feb 27 13:14:51 localhost kernel: [   21.178369] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
Feb 27 13:18:55 localhost kernel: [  265.927366] Modules linked in: governor_performance algif_hash algif_skcipher af_alg bnep zram zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) snd_soc_hdmi_codec snd_soc_simple_card rockchip_rga hci_uart hantro_vpu(C) snd_soc_simple_card_utils rc_cec rockchip_vdec(C) btqca dw_hdmi_i2s_audio snd_soc_rt5651 dw_hdmi_cec videobuf2_dma_sg panfrost btrtl btsdio v4l2_h264 snd_soc_rockchip_i2s snd_soc_rl6231 videobuf2_dma_contig snd_soc_rockchip_spdif gpu_sched v4l2_mem2mem videobuf2_vmalloc btbcm fusb302 btintel snd_soc_core bluetooth videobuf2_memops videobuf2_v4l2 snd_pcm_dmaengine videobuf2_common tcpm snd_pcm videodev snd_timer brcmfmac snd mc soundcore brcmutil typec cfg80211 rfkill cpufreq_dt nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 rockchipdrm realtek dw_hdmi dw_mipi_dsi analogix_dp drm_kms_helper dwmac_rk stmmac_platform cec stmmac rc_core pcs_xpcs drm drm_panel_orientation_quirks
Feb 27 13:18:55 localhost kernel: [  265.935437] CPU: 3 PID: 312 Comm: kworker/3:1H Tainted: P         C OE     5.10.16-rockchip64 #21.02.2
Feb 27 13:18:55 localhost kernel: [  265.936262] Hardware name: FriendlyElec NanoPi M4 Ver2.0 (DT)
Feb 27 13:18:55 localhost kernel: [  265.936798] Workqueue: kblockd blk_mq_timeout_work
Feb 27 13:18:55 localhost kernel: [  265.937246] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
Feb 27 13:18:55 localhost kernel: [  265.937794] pc : nvme_timeout+0x48/0x370
Feb 27 13:18:55 localhost kernel: [  265.938159] lr : blk_mq_check_expired+0x210/0x230
Feb 27 13:18:55 localhost kernel: [  265.938584] sp : ffff800012603bd0
Feb 27 13:18:55 localhost kernel: [  265.938890] x29: ffff800012603bd0 x28: ffff000040816a20
Feb 27 13:18:55 localhost kernel: [  265.939388] x27: ffff000043f8fec8 x26: 00000000000000c0
Feb 27 13:18:55 localhost kernel: [  265.939884] x25: ffff000044b7e810 x24: ffff000044b7e700
Feb 27 13:18:55 localhost kernel: [  265.940379] x23: ffff0000f1bc5a40 x22: ffff0000449bac00
Feb 27 13:18:55 localhost kernel: [  265.940875] x21: ffff8000118b9948 x20: ffff00004454f000
Feb 27 13:18:55 localhost kernel: [  265.941370] x19: ffff800012a4201c x18: 0000000000000000
Feb 27 13:18:55 localhost kernel: [  265.941865] x17: 0000000000000000 x16: 0000000000000000
Feb 27 13:18:55 localhost kernel: [  265.942360] x15: 0000000000000001 x14: 00000000000000e5
Feb 27 13:18:55 localhost kernel: [  265.942855] x13: 000000000006e4a8 x12: 0000000000000000
Feb 27 13:18:55 localhost kernel: [  265.943349] x11: 0000000000000000 x10: 0000000000000a40
Feb 27 13:18:55 localhost kernel: [  265.943844] x9 : ffff800012603d20 x8 : fefefefefefefeff
Feb 27 13:18:55 localhost kernel: [  265.944339] x7 : ffff800012603da0 x6 : ffff000044ae07e0
Feb 27 13:18:55 localhost kernel: [  265.944834] x5 : 0000000000000002 x4 : 0000000000000001
Feb 27 13:18:55 localhost kernel: [  265.945328] x3 : 0000000000000000 x2 : ffff800010987d90
Feb 27 13:18:55 localhost kernel: [  265.945822] x1 : 0000000000000000 x0 : 0000000000000000
Feb 27 13:18:55 localhost kernel: [  265.946315] Call trace:
Feb 27 13:18:55 localhost kernel: [  265.946554]  nvme_timeout+0x48/0x370
Feb 27 13:18:55 localhost kernel: [  265.946889]  blk_mq_check_expired+0x210/0x230
Feb 27 13:18:55 localhost kernel: [  265.947293]  bt_iter+0x60/0x70
Feb 27 13:18:55 localhost kernel: [  265.947585]  blk_mq_queue_tag_busy_iter+0x1e4/0x338
Feb 27 13:18:55 localhost kernel: [  265.948030]  blk_mq_timeout_work+0x17c/0x1a8
Feb 27 13:18:55 localhost kernel: [  265.948428]  process_one_work+0x1ec/0x4d0
Feb 27 13:18:55 localhost kernel: [  265.948801]  worker_thread+0x48/0x478
Feb 27 13:18:55 localhost kernel: [  265.949143]  kthread+0x140/0x150
Feb 27 13:18:55 localhost kernel: [  265.949449]  ret_from_fork+0x10/0x34
Feb 27 13:18:55 localhost kernel: [  265.950347] ---[ end trace b4d79516f26f1c5d ]---
Feb 27 13:17:08 localhost kernel: [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]

and I can't shut down the (headless) system properly; I can only unplug the power.

 

I am going to try linux-image-legacy-rockchip64 (4.4.213-rockchip64) next. 

Posted

Sorry guys, I just realized this issue is most probably related to OPAL SED (which I enabled shortly after I got the SSD). I will try to file a bug at kernel.org. Thanks for your time.
