SymbiosisSystems Posted November 16, 2020 (edited)
Still experiencing random kernel panics with the latest build. Sometimes it boots, sometimes it doesn't, which is very frustrating when operating from a remote location.
```
DDR Version 1.24 20191016
In
channel 0
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
change freq to 416MHz 0,1
Channel 0: LPDDR4,416MHz
Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB
Channel 1: LPDDR4,416MHz
Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB
256B stride
channel 0
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
channel 0, cs 0, advanced training done
channel 1, cs 0, advanced training done
change freq to 856MHz 1,0
ch 0 ddrconfig = 0x101, ddrsize = 0x40
ch 1 ddrconfig = 0x101, ddrsize = 0x40
pmugrf_os_reg[2] = 0x32C1F2C1, stride = 0xD
ddr_set_rate to 328MHZ
ddr_set_rate to 666MHZ
ddr_set_rate to 928MHZ
channel 0, cs 0, advanced training done
channel 1, cs 0, advanced training done
ddr_set_rate to 416MHZ, ctl_index 0
ddr_set_rate to 856MHZ, ctl_index 1
support 416 856 328 666 928 MHz, current 856MHz
OUT
Boot1: 2019-03-14, version: 1.19
CPUId = 0x0
ChipType = 0x10, 254
SdmmcInit=2 0
BootCapSize=100000
UserCapSize=14910MB
FwPartOffset=2000 , 100000
mmc0:cmd5,20
SdmmcInit=0 0
BootCapSize=0
UserCapSize=30436MB
FwPartOffset=2000 , 0
StorageInit ok = 79856
SecureMode = 0
SecureInit read PBA: 0x4
SecureInit read PBA: 0x404
SecureInit read PBA: 0x804
SecureInit read PBA: 0xc04
SecureInit read PBA: 0x1004
SecureInit read PBA: 0x1404
SecureInit read PBA: 0x1804
SecureInit read PBA: 0x1c04
SecureInit ret = 0, SecureMode = 0
atags_set_bootdev: ret:(0)
GPT 0x3380ec0 signature is wrong
recovery gpt...
GPT 0x3380ec0 signature is wrong
recovery gpt fail!
LoadTrust Addr:0x4000
No find bl30.bin
No find bl32.bin
Load uboot, ReadLba = 2000
Load OK, addr=0x200000, size=0xdd6b0
RunBL31 0x40000
NOTICE: BL31: v1.3(debug):42583b6
NOTICE: BL31: Built : 07:55:13, Oct 15 2019
NOTICE: BL31: Rockchip release version: v1.1
INFO: GICv3 with legacy support detected. ARM GICV3 driver initialized in EL3
INFO: Using opteed sec cpu_context!
INFO: boot cpu mask: 0
INFO: plat_rockchip_pmu_init(1190): pd status 3e
INFO: BL31: Initializing runtime services
WARNING: No OPTEE provided by BL2 boot loader, Booting device without OPTEE initialization. SMC`s destined for OPTEE will return SMC_UNK
ERROR: Error initializing runtime service opteed_fast
INFO: BL31: Preparing for EL3 exit to normal world
INFO: Entry point address = 0x200000
INFO: SPSR = 0x3c9

U-Boot 2020.07-armbian (Oct 31 2020 - 08:21:38 +0100)

SoC: Rockchip rk3399
Reset cause: POR
DRAM: 3.9 GiB
PMIC: RK808
SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB
MMC: mmc@fe320000: 1, sdhci@fe330000: 0
Loading Environment from MMC... *** Warning - bad CRC, using default environment
In: serial
Out: serial
Err: serial
Model: Helios64
Revision: 1.2 - 4GB non ECC
Net: eth0: ethernet@fe300000
scanning bus for devices...
Hit any key to stop autoboot: 0
switch to partitions #0, OK
mmc1 is current device
Scanning mmc 1:1...
Found U-Boot script /boot/boot.scr
3185 bytes read in 6 ms (517.6 KiB/s)
## Executing script at 00500000
Boot script loaded from mmc 1
166 bytes read in 5 ms (32.2 KiB/s)
16003186 bytes read in 683 ms (22.3 MiB/s)
27331072 bytes read in 1160 ms (22.5 MiB/s)
79946 bytes read in 12 ms (6.4 MiB/s)
2698 bytes read in 8 ms (329.1 KiB/s)
Applying kernel provided DT fixup script (rockchip-fixup.scr)
## Executing script at 09000000
## Loading init Ramdisk from Legacy Image at 06000000 ...
Image Name: uInitrd
Image Type: AArch64 Linux RAMDisk Image (gzip compressed)
Data Size: 16003122 Bytes = 15.3 MiB
Load Address: 00000000
Entry Point: 00000000
Verifying Checksum ... OK
## Flattened Device Tree blob at 01f00000
Booting using the fdt blob at 0x1f00000
Loading Ramdisk to f4fa3000, end f5ee6032 ... OK
Loading Device Tree to 00000000f4f27000, end 00000000f4fa2fff ... OK

Starting kernel ...

[ 26.392050] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 26.392551] Modules linked in: r8152 snd_soc_hdmi_codec snd_soc_rockchip_i2s snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer snd panfrost leds_pwm gpio_charger pwm_fan gpu_sched rockchipdrm soundcore rockchip_vdec(C) dw_mipi_dsi hantro_vpu(C) dw_hdmi v4l2_h264 rockchip_rga videobuf2_dma_contig analogix_dp videobuf2_vmalloc videobuf2_dma_sg v4l2_mem2mem videobuf2_memops videobuf2_v4l2 drm_kms_helper videobuf2_common videodev sg cec rc_core mc fusb30x(C) drm drm_panel_orientation_quirks gpio_beeper cpufreq_dt zfs(POE) zunicode(POE) zavl(POE) icp(POE) zlua(POE) nfsd auth_rpcgss zcommon(POE) nfs_acl znvpair(POE) lockd spl(OE) grace lm75 sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear md_mod realtek dwmac_rk stmmac_platform stmmac mdio_xpcs adc_keys
[ 26.399102] CPU: 4 PID: 0 Comm: swapper/4 Tainted: P C OE 5.8.17-rockchip64 #20.08.21
[ 26.399884] Hardware name: Helios64 (DT)
[ 26.400237] pstate: 00000085 (nzcv daIf -PAN -UAO BTYPE=--)
[ 26.400740] pc : do_undefinstr+0x2ec/0x310
[ 26.401107] lr : do_undefinstr+0x1e0/0x310
[ 26.401472] sp : ffff800011adbcd0
[ 26.401769] x29: ffff800011adbcd0 x28: ffff0000f6ea5700
[ 26.402242] x27: ffff0000f6ea5700 x26: ffff800011adc000
[ 26.402714] x25: ffff800011501d20 x24: 0000000000000000
[ 26.403186] x23: 0000000060000085 x22: ffff800010df9fc0
[ 26.403658] x21: ffff800011adbe80 x20: ffff0000f6ea5700
[ 26.404130] x19: ffff800011adbd40 x18: 0000000000000000
[ 26.404601] x17: 00018f3ebe947358 x16: 000142208873be78
[ 26.405073] x15: 0000000000000006 x14: 000000000000021c
[ 26.405544] x13: 000000000000029a x12: 00000000000002a4
[ 26.406015] x11: 0000000000000001 x10: 0000000000000a20
[ 26.406487] x9 : ffff800011ba3e70 x8 : ffff0000f6ea6180
[ 26.406958] x7 : 00000000ffffffff x6 : 000000000000001f
[ 26.407430] x5 : 0000000000000000 x4 : ffff800011816118
[ 26.407901] x3 : 0000000000000005 x2 : 0000000000010002
[ 26.408373] x1 : ffff0000f6ea5700 x0 : 0000000060000085
[ 26.408845] Call trace:
[ 26.409068] do_undefinstr+0x2ec/0x310
[ 26.409406] el1_sync_handler+0x88/0x110
[ 26.409757] el1_sync+0x7c/0x100
[ 26.410051] check_preemption_disabled+0x18/0x108
[ 26.410470] debug_smp_processor_id+0x20/0x30
[ 26.410862] sched_ttwu_pending+0x34/0x168
[ 26.411230] flush_smp_call_function_queue+0xec/0x258
[ 26.411682] generic_smp_call_function_single_interrupt+0x14/0x20
[ 26.412223] handle_IPI+0x258/0x3e8
[ 26.412539] gic_handle_irq+0x154/0x158
[ 26.412882] el1_irq+0xb8/0x180
[ 26.413166] arch_cpu_idle+0x28/0x218
[ 26.413496] default_idle_call+0x1c/0x44
[ 26.413847] do_idle+0x210/0x288
[ 26.414137] cpu_startup_entry+0x28/0x68
[ 26.414490] secondary_start_kernel+0x140/0x178
[ 26.414898] Code: f9401bf7 17ffff7d a9025bf5 f9001bf7 (d4210000)
[ 26.415448] ---[ end trace 720e27ef39d9569d ]---
[ 26.415860] Kernel panic - not syncing: Fatal exception in interrupt
[ 26.416425] SMP: stopping secondary CPUs
[ 26.416779] Kernel Offset: disabled
[ 26.417091] CPU features: 0x240022,2000600c
[ 26.417461] Memory Limit: none
[ 26.417746] Rebooting in 90 seconds..
```
Edited November 16, 2020 by TRS-80: put long output inside code block inside spoiler
SymbiosisSystems (Author) Posted November 17, 2020
Another crash:
```
Starting kernel ...

[ 24.230268] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 24.230776] Modules linked in: snd_soc_hdmi_codec r8152 snd_soc_rockchip_i2s snd_soc_core rockchip_vdec(C) hantro_vpu(C) snd_pcm_dmaengine pwm_fan leds_pwm gpio_charger rockchip_rga snd_pcm v4l2_h264 videobuf2_dma_contig rockchipdrm snd_timer v4l2_mem2mem videobuf2_dma_sg videobuf2_vmalloc snd panfrost dw_mipi_dsi videobuf2_memops videobuf2_v4l2 dw_hdmi soundcore gpu_sched videobuf2_common analogix_dp videodev drm_kms_helper fusb30x(C) cec mc rc_core sg drm drm_panel_orientation_quirks zfs(POE) gpio_beeper cpufreq_dt zunicode(POE) zavl(POE) icp(POE) zlua(POE) nfsd auth_rpcgss zcommon(POE) nfs_acl znvpair(POE) lockd grace spl(OE) lm75 sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear md_mod realtek dwmac_rk stmmac_platform stmmac mdio_xpcs adc_keys
[ 24.237322] CPU: 5 PID: 0 Comm: swapper/5 Tainted: P C OE 5.8.17-rockchip64 #20.08.21
[ 24.238105] Hardware name: Helios64 (DT)
[ 24.238459] pstate: 80000085 (Nzcv daIf -PAN -UAO BTYPE=--)
[ 24.238963] pc : sched_clock+0x3c/0x90
[ 24.239301] lr : sched_clock_cpu+0x14/0x28
[ 24.239664] sp : ffff800011ae3e60
[ 24.239961] x29: 0000000000000100 x28: 0000000000000001
[ 24.240434] x27: ffff0000f6ea6580 x26: ffff800011ae4000
[ 24.240907] x25: ffff800011501d20 x24: 0000000000000168
[ 24.241378] x23: ffff800011836f08 x22: 0000000000000004
[ 24.241850] x21: ffff815c18a74108 x20: ffff800011836f00
[ 24.242322] x19: ffff015c2a2ab010 x18: 0000000000000000
[ 24.242793] x17: 00022f77f6ad6250 x16: ffff0000e0453fa8
[ 24.243265] x15: 0000000000000006 x14: 00001401d89ead14
[ 24.243736] x13: 0000000000000339 x12: 000000000000033f
[ 24.244207] x11: 0000000000000001 x10: 0000000000000a20
[ 24.244679] x9 : ffff800011babe70 x8 : ffff0000f6ea7000
[ 24.245150] x7 : 00000000ffffffff x6 : 00000000388e7b38
[ 24.245622] x5 : 00ffffffffffffff x4 : 003305860a2eef00
[ 24.246093] x3 : 0000000000000000 x2 : 0000000000000080
[ 24.246565] x1 : 0000000000000004 x0 : 0000000000000005
[ 24.247036] Call trace:
[ 24.247259] sched_clock+0x3c/0x90
[ 24.247570] Code: d50339bf 120002d5 9bb87eb5 8b1502f3 (f9400e60)
[ 24.248120] ---[ end trace af056133bccc9297 ]---
[ 24.248531] Kernel panic - not syncing: Fatal exception in interrupt
[ 24.249096] SMP: stopping secondary CPUs
[ 25.416116] SMP: failed to stop secondary CPUs 0-1,5
[ 25.416556] Kernel Offset: disabled
[ 25.416869] CPU features: 0x240022,2000600c
[ 25.417241] Memory Limit: none
[ 25.417527] Rebooting in 90 seconds..
```
SIGSEGV Posted November 17, 2020
@SymbiosisSystems Since your system is crashing so often, my guess is that you're not using it in a production environment yet. Have you tried the test builds with the newer kernel, at the bottom of the downloads page? You might have better luck with those. I'm using the test build from Nov. 13 and it has been very stable. I'm not using OMV, just the OS and a few packages that I've configured manually to provide SMB, DLNA & iSCSI services.
gprovost Posted November 18, 2020
Yes, as SIGSEGV wisely advised, could you give the latest test build (aka DEV images) a try? They are based on Linux kernel 5.9, and these DEV images are essentially what will soon become the new Armbian 20.11 release.
SymbiosisSystems (Author) Posted November 18, 2020
SIGSEGV, Gauthier, thank you for the suggestion. I've had to power off the Helios64 for the moment, but I should be able to try the 5.9 kernel at the weekend. I'm using it with ZFS and OMV, and I hope it will be more stable than all the 5.8 builds I've tried so far!
gprovost Posted November 19, 2020
Thanks for keeping us updated with some data. We are still having difficulty finding a root cause for some of the reported instability. If you still face the same instability with LK5.9.y, please experiment with the CPU frequency governor set to performance mode.
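For trying the performance governor, here is a minimal shell sketch that uses the kernel's standard cpufreq sysfs files (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`). The function name `set_cpu_governor` and its optional second argument (an alternate sysfs root, added only so the sketch can be tried against a scratch directory) are hypothetical, not part of Armbian's tooling. On a real board it has to run as root, and a value written this way does not survive a reboot.

```shell
#!/usr/bin/env bash
# Sketch: switch every CPU's cpufreq governor via the standard sysfs files.
# Needs root on a real system. The optional second argument is a hypothetical
# alternate sysfs root so the function can be exercised on a scratch directory.
# Settings written this way are lost on reboot.

# set_cpu_governor GOVERNOR [SYSFS_CPU_ROOT]
set_cpu_governor() {
  local gov="$1" root="${2:-/sys/devices/system/cpu}" f dir count=0
  for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -e "$f" ] || continue              # no match: glob stays literal
    dir=${f%/scaling_governor}
    # Refuse a governor the driver does not list as available
    if ! grep -qsw -- "$gov" "$dir/scaling_available_governors"; then
      echo "governor '$gov' not available in $dir" >&2
      return 1
    fi
    printf '%s\n' "$gov" > "$f" || return 1
    count=$((count + 1))
  done
  if [ "$count" -eq 0 ]; then
    echo "no cpufreq entries found under $root" >&2
    return 1
  fi
  echo "set $count CPU(s) to $gov"
}
```

For example, from a root shell, source the file, run `set_cpu_governor performance`, then confirm with `cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor`. CPUs that share a cpufreq policy (on the RK3399, the little and big clusters each share one) simply get the same value written more than once.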
SymbiosisSystems (Author) Posted November 24, 2020
So, good news / bad news. The good news is that the 5.9.x kernel does indeed seem to have resolved the boot-time kernel panics I was experiencing, and I can now reboot the board consistently without getting stuck in any of the previous kernel panic / retry boot loops. The bad news is that I can't get ZFS working: 0.8.4 in the backports repo doesn't support kernels newer than 5.6, and 0.8.5 isn't compatible with the zfsutils-linux package!
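The kernel ceiling mentioned here can be checked before installing a ZFS build. This is a minimal sketch, assuming GNU coreutils `sort -V` (version sort) is available; the 5.6 limit for 0.8.4 is taken from the post, and the function names `major_minor` and `kernel_supported` are hypothetical. Only major.minor is compared, so any 5.6.x release counts as within a 5.6 limit.

```shell
#!/usr/bin/env bash
# Sketch: is a kernel release within a module's maximum supported major.minor?
# Assumes GNU coreutils `sort -V`. Function names are illustrative only.

# Reduce a release string such as "5.9.10-rockchip64" to "5.9".
major_minor() {
  printf '%s\n' "$1" | cut -d- -f1 | cut -d. -f1,2
}

# kernel_supported RELEASE MAX
# Returns 0 if RELEASE's major.minor is <= MAX's major.minor.
kernel_supported() {
  local kver max lowest
  kver=$(major_minor "$1")
  max=$(major_minor "$2")
  # sort -V lists the lower version first (5.9 before 5.10); if that is the
  # kernel's version, the kernel is within the limit.
  lowest=$(printf '%s\n%s\n' "$kver" "$max" | sort -V | head -n 1)
  [ "$lowest" = "$kver" ]
}

# Example (not run here):
#   kernel_supported "$(uname -r)" 5.6 || echo "ZFS 0.8.4 does not support $(uname -r)"
```

Version sort matters here because a plain string comparison would treat 5.10 as lower than 5.9.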
ShadowDance Posted November 25, 2020
@SymbiosisSystems I take it you have a set of 0.8.5 modules built? They work fine with the 0.8.4 zfsutils-linux package, but that package requires zfs-dkms, which will fail to build. We can work around this by installing a dummy package that provides zfs-dkms, so that we can then go ahead and install zfsutils-linux / zfs-zed / etc. from backports. Here's how you can create a dummy package:
```shell
# Install equivs, which builds placeholder .deb packages from a control file
apt-get install --yes equivs
mkdir zfs-dkms-dummy; cd $_
# Control file for a package that only claims to provide zfs-dkms
cat <<EOF >zfs-dkms
Section: misc
Priority: optional
Standards-Version: 3.9.2
Package: zfs-dkms-dummy
Version: 0.8.4
Maintainer: Me <me@localhost>
Provides: zfs-dkms
Architecture: all
Description: Dummy zfs-dkms package for when using built kmod
EOF
# Build zfs-dkms-dummy_0.8.4_all.deb and install it
equivs-build zfs-dkms
dpkg -i zfs-dkms-dummy_0.8.4_all.deb
```
After this, you can go ahead and install (if not already installed) the 0.8.5 modules (kmod-zfs-5.*-rockchip64_0.8.5-1_arm64.deb) and zfsutils-linux.
SymbiosisSystems (Author) Posted November 25, 2020
Thanks ShadowDance, yes, I've pulled the ZFS 0.8.5 branch from GitHub and built the modules. Even with the dummy 0.8.4 package installed, though, installing zfsutils-linux uninstalls the zfs-dkms 0.8.5 package.
ShadowDance Posted November 25, 2020
Ah sorry, I left out that instead of installing zfs-dkms, you should install the kmod-zfs-5.*-rockchip64_0.8.5-1_arm64.deb package. The only dkms package should be the dummy. (I've updated my earlier post to reflect this.)
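The intended end state (the kmod package supplying the modules, with the dummy as the only package providing zfs-dkms) can be sanity-checked from a list of installed package names. This is a minimal sketch: the function name `check_zfs_packages` is hypothetical, and the `kmod-zfs-` package-name prefix is inferred from the .deb filename quoted in the thread.

```shell
#!/usr/bin/env bash
# Sketch: read installed package names on stdin (one per line) and check that
# the real zfs-dkms is absent, the dummy is present, and a kmod-zfs-* package
# supplies the modules. Names are inferred from this thread, not an official tool.
check_zfs_packages() {
  local pkgs status=0
  pkgs=$(cat)
  if printf '%s\n' "$pkgs" | grep -qx 'zfs-dkms'; then
    echo "problem: the real zfs-dkms package is installed"
    status=1
  fi
  if ! printf '%s\n' "$pkgs" | grep -qx 'zfs-dkms-dummy'; then
    echo "problem: zfs-dkms-dummy is not installed"
    status=1
  fi
  if ! printf '%s\n' "$pkgs" | grep -q '^kmod-zfs-'; then
    echo "problem: no kmod-zfs-* module package is installed"
    status=1
  fi
  [ "$status" -eq 0 ] && echo "ok: zfs-dkms is provided only by the dummy package"
  return "$status"
}
```

On the board, the list of fully installed packages can be fed in with `dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' | awk '$1 == "ii" {print $2}' | check_zfs_packages`.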
SymbiosisSystems (Author) Posted November 25, 2020
Many thanks for your help there, ShadowDance. That's fixed it, and I've now got ZFS running under OMV with the 5.9.10 kernel.
gprovost Posted November 26, 2020
@SymbiosisSystems How is the stability so far with LK5.9?
SymbiosisSystems (Author) Posted November 26, 2020
@gprovost Gauthier, it's much better. With the 5.8.x kernel, more often than not it would reboot into a kernel panic and then a boot loop, and it would need to be physically power-cycled to recover. With the 5.9.x kernel, however, I haven't experienced any such problems so far, and I have much more confidence in rebooting it.