LiX Posted February 12, 2021
Armbianmonitor: http://ix.io/2P8l
NanoPi M4v2 with the official NVMe HAT kit, booting from eMMC
OS: Armbian with Debian Bullseye, Linux 5.10.12-rockchip64
SSD: Kioxia XG6 (ThinkPad part)
The same thing happened with the 5.9 kernel. I have not tested any other OS.
On a cold boot I always get kernel messages like these:
Feb 11 01:14:34 localhost kernel: [ 3.102606] nvme 0000:01:00.0: enabling device (0000 -> 0002)
Feb 11 01:14:34 localhost kernel: [ 3.106554] input: gpio-keys as /devices/platform/gpio-keys/input/input0
Feb 11 01:14:34 localhost kernel: [ 3.107121] of_cfs_init
Feb 11 01:14:34 localhost kernel: [ 3.107161] of_cfs_init: OK
Feb 11 01:14:34 localhost kernel: [ 3.129048] dwmmc_rockchip fe310000.mmc: Successfully tuned phase to 193
Feb 11 01:14:34 localhost kernel: [ 3.135392] mmc0: new ultra high speed SDR104 SDIO card at address 0001
Feb 11 01:14:34 localhost kernel: [ 3.144242] usb 7-1: new high-speed USB device number 2 using xhci-hcd
Feb 11 01:14:34 localhost kernel: [ 3.295224] usb 7-1: New USB device found, idVendor=2109, idProduct=2817, bcdDevice= 0.50
Feb 11 01:14:34 localhost kernel: [ 3.295236] usb 7-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Feb 11 01:14:34 localhost kernel: [ 3.295244] usb 7-1: Product: USB2.0 Hub
Feb 11 01:14:34 localhost kernel: [ 3.295253] usb 7-1: Manufacturer: VIA Labs, Inc.
Feb 11 01:14:34 localhost kernel: [ 3.355660] hub 7-1:1.0: USB hub found
Feb 11 01:14:34 localhost kernel: [ 3.355851] hub 7-1:1.0: 4 ports detected
Feb 11 01:14:34 localhost kernel: [ 3.421534] random: crng init done
Feb 11 01:14:34 localhost kernel: [ 3.450423] usb 8-1: new SuperSpeed Gen 1 USB device number 2 using xhci-hcd
Feb 11 01:14:34 localhost kernel: [ 3.542849] usb 8-1: New USB device found, idVendor=2109, idProduct=0817, bcdDevice= 0.50
Feb 11 01:14:34 localhost kernel: [ 3.542860] usb 8-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Feb 11 01:14:34 localhost kernel: [ 3.542869] usb 8-1: Product: USB3.0 Hub
Feb 11 01:14:34 localhost kernel: [ 3.542877] usb 8-1: Manufacturer: VIA Labs, Inc.
Feb 11 01:14:34 localhost kernel: [ 3.563553] hub 8-1:1.0: USB hub found
Feb 11 01:14:34 localhost kernel: [ 3.563793] hub 8-1:1.0: 4 ports detected
Feb 11 01:14:34 localhost kernel: [ 64.480292] nvme nvme0: I/O 12 QID 0 timeout, disable controller
Feb 11 01:14:34 localhost kernel: [ 64.588242] nvme nvme0: Device shutdown incomplete; abort shutdown
Feb 11 01:14:34 localhost kernel: [ 64.588624] nvme nvme0: Identify Controller failed (-4)
Feb 11 01:14:34 localhost kernel: [ 64.588636] nvme nvme0: Removing after probe failure status: -5
Feb 11 01:14:34 localhost kernel: [ 64.607118] Freeing unused kernel memory: 4352K
Feb 11 01:14:34 localhost kernel: [ 64.620362] Run /init as init process
As the log shows, the NVMe disk is removed after a 60-second timeout. My current way to recover is to reboot (the attached link contains the dmesg from a reboot where it works); after that the SSD can be used without problems. I have also tried adding bootdelay to armbianEnv.txt in /boot, but it does not seem to help.
Is this a bug or a compatibility issue? Is there a way to bring the nvme0 drive back without rebooting? (I have tried the PCI rescan sysfs file, but that did not help either.)
BTW, the unsafe_shutdowns counter in the NVMe smart-log also keeps increasing.
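For readers who want to retry the PCI rescan mentioned in the post above, here is a minimal Python sketch of the usual remove-then-rescan sequence. It assumes the standard sysfs files /sys/bus/pci/devices/<address>/remove and /sys/bus/pci/rescan; the function name and the sysfs_root parameter are made up for illustration (the parameter only exists so the function can be tried against a scratch directory). It is not a confirmed fix: LiX reports that a rescan did not bring the drive back in this case.

```python
from pathlib import Path


def pci_remove_and_rescan(address, sysfs_root="/sys"):
    """Detach one PCI device, then ask the kernel to re-enumerate the bus.

    Returns the list of sysfs files that were written, in order.
    Must run as root on a real system.
    """
    root = Path(sysfs_root)
    written = []

    # Step 1: drop the stale device node, if the kernel still lists it.
    remove = root / "bus" / "pci" / "devices" / address / "remove"
    if remove.exists():
        remove.write_text("1")
        written.append(str(remove))

    # Step 2: trigger a rescan of all PCI buses.
    rescan = root / "bus" / "pci" / "rescan"
    rescan.write_text("1")
    written.append(str(rescan))
    return written
```

On the board from this thread the call would be pci_remove_and_rescan("0000:01:00.0"), using the address from the "enabling device" line in the log. Whether the controller answers after the rescan depends on the drive itself.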
pkfox Posted February 12, 2021
What SSD are you using? I've been using a Samsung 970 EVO for a couple of years without any issues.
LiX (Author) Posted February 13, 2021
23 hours ago, pkfox said: What SSD are you using? I've been using Samsung 970 EVO for a couple of years without any issues
It's a KIOXIA XG6. I now suspect a compatibility or driver issue, since the system dropped the drive yesterday after it had been idle for hours:
[ 244.836170] sd 0:0:0:0: [sda] Attached SCSI disk
[10169.406639] nvme nvme0: I/O 29 QID 0 timeout, reset controller
[10329.152079] nvme nvme0: I/O 13 QID 0 timeout, disable controller
[10334.244085] nvme nvme0: Device shutdown incomplete; abort shutdown
[10334.260337] nvme nvme0: could not set timestamp (-4)
[10334.260346] nvme nvme0: Removing after probe failure status: -4
[10334.320623] nvme nvme0: failed to set APST feature (-19)
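To track how often the drive is dropped, the nvme lines can be pulled out of saved dmesg or syslog text. The following is a small Python sketch that assumes the line layout shown in the logs in this thread; the function names and the regular expression are illustrative, not part of any existing tool.

```python
import re

# Matches e.g. "[10169.406639] nvme nvme0: I/O 29 QID 0 timeout, reset controller".
# Lines for the raw PCI device ("nvme 0000:01:00.0: ...") are skipped on purpose.
NVME_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvme (?P<dev>nvme\d+): (?P<msg>.+)"
)


def nvme_events(log_text):
    """Return (timestamp, device, message) tuples for every nvmeN log line."""
    events = []
    for line in log_text.splitlines():
        match = NVME_LINE.search(line)
        if match:
            events.append(
                (float(match["ts"]), match["dev"], match["msg"].strip())
            )
    return events


def was_removed(events):
    """True if the kernel reported that it removed the controller."""
    return any(
        msg.startswith("Removing after probe failure") for _, _, msg in events
    )
```

Feeding it the output of dmesg (or the kernel lines from /var/log/syslog) after each boot gives a quick yes/no on whether that boot lost the drive, which makes fail rates like the ones discussed below easier to count.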
i5Js Posted February 14, 2021
On 2/12/2021 at 11:23 PM, pkfox said: What SSD are you using? I've been using Samsung 970 EVO for a couple of years without any issues
Which kernel are you using? I ask because my board is rebooting unexpectedly almost every day...
LiX (Author) Posted February 15, 2021
On 2/14/2021 at 12:18 AM, i5Js said: Which kernel are you using? Because my board is rebooting unexpectedly almos every day...
Hi, it's 5.10.12-rockchip64 #21.02.1. My situation seems to have improved since I increased bootdelay in armbianEnv.txt from 3 to 10: I have had 0 failures out of 2 boots in the past 2 days.
pkfox Posted February 20, 2021
On 2/14/2021 at 10:18 AM, i5Js said: Which kernel are you using? Because my board is rebooting unexpectedly almos every day...
I have a few NanoPi M4 v1 and v2 boards. Most of them run the 5.10.16-rockchip64 kernel, but one still runs 4.4.213-rk3399 because I haven't got around to updating it. All of the boards have the NVMe HAT and a Samsung 970 SSD fitted.
i5Js Posted February 21, 2021
On 2/16/2021 at 12:23 AM, LiX said: Hi, it's 5.10.12-rockchip64 #21.02.1, and seems my situation has been improved since I increased bootdelay in armbianEnv.txt from 3 to 10, I got 0/2 fail rate in the past 2 days.
Can you share your armbianEnv.txt? I've just checked mine and it doesn't have that setting.
LiX (Author) Posted February 25, 2021
On 2/21/2021 at 3:26 AM, i5Js said: Can you share your armbianEnv.txt? I've just checked mine and it has not that setting.
Hi, I added that setting myself without checking the documentation. Moreover, I run headless, so I can't confirm whether it actually works. I have also removed it again, since I still had about a 1/3 fail rate with it.
Lennyz1988 Posted February 25, 2021
On 2/14/2021 at 11:18 AM, i5Js said: Which kernel are you using? Because my board is rebooting unexpectedly almos every day...
Try the latest. I also used to have lots of crashes, but not anymore. You can download this kernel through armbian-config:
Armbian 21.02.2 Buster with Linux 5.10.16-rockchip64
LiX (Author) Posted March 1, 2021
On 2/25/2021 at 3:39 AM, Lennyz1988 said: Try the latest. I also used to have lot's of crashes. But not anymore. You can download this kernel through armbian-config. Armbian 21.02.2 Buster with Linux 5.10.16-rockchip64
Actually I think it got worse with 5.10.16: I had an almost 100% fail rate yesterday. It froze my home directory, since that is on the NVMe, with the following messages:
Feb 27 13:14:48 localhost lightdm[2532]: Error getting user list from org.freedesktop.Accounts: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.Accounts was not provided by any .service files
Feb 27 13:14:51 localhost kernel: [ 21.178369] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
Feb 27 13:18:55 localhost kernel: [ 265.927366] Modules linked in: governor_performance algif_hash algif_skcipher af_alg bnep zram zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) snd_soc_hdmi_codec snd_soc_simple_card rockchip_rga hci_uart hantro_vpu(C) snd_soc_simple_card_utils rc_cec rockchip_vdec(C) btqca dw_hdmi_i2s_audio snd_soc_rt5651 dw_hdmi_cec videobuf2_dma_sg panfrost btrtl btsdio v4l2_h264 snd_soc_rockchip_i2s snd_soc_rl6231 videobuf2_dma_contig snd_soc_rockchip_spdif gpu_sched v4l2_mem2mem videobuf2_vmalloc btbcm fusb302 btintel snd_soc_core bluetooth videobuf2_memops videobuf2_v4l2 snd_pcm_dmaengine videobuf2_common tcpm snd_pcm videodev snd_timer brcmfmac snd mc soundcore brcmutil typec cfg80211 rfkill cpufreq_dt nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 rockchipdrm realtek dw_hdmi dw_mipi_dsi analogix_dp drm_kms_helper dwmac_rk stmmac_platform cec stmmac rc_core pcs_xpcs drm drm_panel_orientation_quirks
Feb 27 13:18:55 localhost kernel: [ 265.935437] CPU: 3 PID: 312 Comm: kworker/3:1H Tainted: P C OE 5.10.16-rockchip64 #21.02.2
Feb 27 13:18:55 localhost kernel: [ 265.936262] Hardware name: FriendlyElec NanoPi M4 Ver2.0 (DT)
Feb 27 13:18:55 localhost kernel: [ 265.936798] Workqueue: kblockd blk_mq_timeout_work
Feb 27 13:18:55 localhost kernel: [ 265.937246] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
Feb 27 13:18:55 localhost kernel: [ 265.937794] pc : nvme_timeout+0x48/0x370
Feb 27 13:18:55 localhost kernel: [ 265.938159] lr : blk_mq_check_expired+0x210/0x230
Feb 27 13:18:55 localhost kernel: [ 265.938584] sp : ffff800012603bd0
Feb 27 13:18:55 localhost kernel: [ 265.938890] x29: ffff800012603bd0 x28: ffff000040816a20
Feb 27 13:18:55 localhost kernel: [ 265.939388] x27: ffff000043f8fec8 x26: 00000000000000c0
Feb 27 13:18:55 localhost kernel: [ 265.939884] x25: ffff000044b7e810 x24: ffff000044b7e700
Feb 27 13:18:55 localhost kernel: [ 265.940379] x23: ffff0000f1bc5a40 x22: ffff0000449bac00
Feb 27 13:18:55 localhost kernel: [ 265.940875] x21: ffff8000118b9948 x20: ffff00004454f000
Feb 27 13:18:55 localhost kernel: [ 265.941370] x19: ffff800012a4201c x18: 0000000000000000
Feb 27 13:18:55 localhost kernel: [ 265.941865] x17: 0000000000000000 x16: 0000000000000000
Feb 27 13:18:55 localhost kernel: [ 265.942360] x15: 0000000000000001 x14: 00000000000000e5
Feb 27 13:18:55 localhost kernel: [ 265.942855] x13: 000000000006e4a8 x12: 0000000000000000
Feb 27 13:18:55 localhost kernel: [ 265.943349] x11: 0000000000000000 x10: 0000000000000a40
Feb 27 13:18:55 localhost kernel: [ 265.943844] x9 : ffff800012603d20 x8 : fefefefefefefeff
Feb 27 13:18:55 localhost kernel: [ 265.944339] x7 : ffff800012603da0 x6 : ffff000044ae07e0
Feb 27 13:18:55 localhost kernel: [ 265.944834] x5 : 0000000000000002 x4 : 0000000000000001
Feb 27 13:18:55 localhost kernel: [ 265.945328] x3 : 0000000000000000 x2 : ffff800010987d90
Feb 27 13:18:55 localhost kernel: [ 265.945822] x1 : 0000000000000000 x0 : 0000000000000000
Feb 27 13:18:55 localhost kernel: [ 265.946315] Call trace:
Feb 27 13:18:55 localhost kernel: [ 265.946554] nvme_timeout+0x48/0x370
Feb 27 13:18:55 localhost kernel: [ 265.946889] blk_mq_check_expired+0x210/0x230
Feb 27 13:18:55 localhost kernel: [ 265.947293] bt_iter+0x60/0x70
Feb 27 13:18:55 localhost kernel: [ 265.947585] blk_mq_queue_tag_busy_iter+0x1e4/0x338
Feb 27 13:18:55 localhost kernel: [ 265.948030] blk_mq_timeout_work+0x17c/0x1a8
Feb 27 13:18:55 localhost kernel: [ 265.948428] process_one_work+0x1ec/0x4d0
Feb 27 13:18:55 localhost kernel: [ 265.948801] worker_thread+0x48/0x478
Feb 27 13:18:55 localhost kernel: [ 265.949143] kthread+0x140/0x150
Feb 27 13:18:55 localhost kernel: [ 265.949449] ret_from_fork+0x10/0x34
Feb 27 13:18:55 localhost kernel: [ 265.950347] ---[ end trace b4d79516f26f1c5d ]---
Feb 27 13:17:08 localhost kernel: [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
After that I can't shut down the (headless) system properly; I can only unplug the power. I am going to try linux-image-legacy-rockchip64 (4.4.213-rockchip64) next.
LiX (Author) Posted March 2, 2021
Sorry guys, I just realized this issue is most probably related to OPAL SED (which I enabled shortly after I got the SSD). I will try to file a bug at kernel.org. Thanks for your time.