Krita Posted July 26, 2021 (edited)
Hi all, I've been trying to sort out a stability problem with what was a rock-solid Helios64. I moved from running Armbian on the eMMC to a 256GB SanDisk Extreme SD card because Plex kept filling the eMMC up. I'm currently running Armbian 21.05.6 Buster with Linux 5.10.43-rockchip64, with OMV, Plex and ZFS and not much more. It appears this is where things go pear-shaped:
Spoiler
[ 6636.457621] ata1.00: exception Emask 0x2 SAct 0x3800000 SErr 0x1000400 action 0x6
[ 6636.458306] ata1.00: irq_stat 0x08000000
[ 6636.458659] ata1: SError: { Proto TrStaTrns }
[ 6636.459051] ata1.00: failed command: READ FPDMA QUEUED
[ 6636.459516] ata1.00: cmd 60/58:b8:e0:b3:73/05:00:01:00:00/40 tag 23 ncq dma 700416 in
[ 6636.459516] res 40/00:bc:e0:b3:73/00:00:01:00:00/40 Emask 0x2 (HSM violation)
[ 6636.460898] ata1.00: status: { DRDY }
[ 6636.461227] ata1.00: failed command: READ FPDMA QUEUED
[ 6636.461743] ata1.00: cmd 60/40:c0:80:ae:73/05:00:01:00:00/40 tag 24 ncq dma 688128 in
[ 6636.461743] res 40/00:bc:e0:b3:73/00:00:01:00:00/40 Emask 0x2 (HSM violation)
[ 6636.463130] ata1.00: status: { DRDY }
[ 6636.463942] ata1.00: failed command: READ FPDMA QUEUED
[ 6636.464874] ata1.00: cmd 60/20:c8:c0:b3:73/00:00:01:00:00/40 tag 25 ncq dma 16384 in
[ 6636.464874] res 40/00:bc:e0:b3:73/00:00:01:00:00/40 Emask 0x2 (HSM violation)
[ 6636.466733] ata1.00: status: { DRDY }
[ 6636.467511] ata1: hard resetting link
[ 6636.941590] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 6636.942889] ata1.00: configured for UDMA/133
[ 6636.943169] sd 0:0:0:0: [sda] tag#23 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=5s
[ 6636.943179] sd 0:0:0:0: [sda] tag#23 Sense Key : 0x5 [current]
[ 6636.943186] sd 0:0:0:0: [sda] tag#23 ASC=0x21 ASCQ=0x4
[ 6636.943196] sd 0:0:0:0: [sda] tag#23 CDB: opcode=0x88 88 00 00 00 00 00 01 73 b3 e0 00 00 05 58 00 00
[ 6636.943205] blk_update_request: I/O error, dev sda, sector 24359904 op 0x0:(READ) flags 0x700 phys_seg 12 prio class 0
[ 6636.944176] zio pool=helios vdev=/dev/disk/by-id/ata-WDC_WD40PURX-64GVNY0_WD-WCC4E2LRTU90-part1 error=5 type=1 offset=12471222272 size=700416 flags=40080ca8
[ 6636.945496] sd 0:0:0:0: [sda] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=5s
[ 6636.945504] sd 0:0:0:0: [sda] tag#24 Sense Key : 0x5 [current]
[ 6636.945511] sd 0:0:0:0: [sda] tag#24 ASC=0x21 ASCQ=0x4
[ 6636.945518] sd 0:0:0:0: [sda] tag#24 CDB: opcode=0x88 88 00 00 00 00 00 01 73 ae 80 00 00 05 40 00 00
[ 6636.945526] blk_update_request: I/O error, dev sda, sector 24358528 op 0x0:(READ) flags 0x4700 phys_seg 168 prio class 0
[ 6636.946645] sd 0:0:0:0: [sda] tag#25 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=5s
[ 6636.946659] sd 0:0:0:0: [sda] tag#25 Sense Key : 0x5 [current]
[ 6636.946666] sd 0:0:0:0: [sda] tag#25 ASC=0x21 ASCQ=0x4
[ 6636.946676] sd 0:0:0:0: [sda] tag#25 CDB: opcode=0x88 88 00 00 00 00 00 01 73 b3 c0 00 00 00 20 00 00
[ 6636.946685] blk_update_request: I/O error, dev sda, sector 24359872 op 0x0:(READ) flags 0x700 phys_seg 4 prio class 0
[ 6636.947962] zio pool=helios vdev=/dev/disk/by-id/ata-WDC_WD40PURX-64GVNY0_WD-WCC4E2LRTU90-part1 error=5 type=1 offset=12470517760 size=704512 flags=40080ca8
[ 6636.949804] ata1: EH complete

DDR Version 1.24 20191016
In
soft reset
SRX
channel 0
CS = 0
MR0=0x18 MR4=0x2 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0
channel 1
CS = 0
MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0
channel 0 training pass!
channel 1 training pass!
change freq to 416MHz 0,1
Channel 0: LPDDR4,416MHz
Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB
Channel 1: LPDDR4,416MHz
Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB
256B stride
channel 0
CS = 0
MR0=0x18 MR4=0x2 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0
channel 1
CS = 0
MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0
channel 0 training pass!
channel 1 training pass!
channel 0, cs 0, advanced training done
channel 1, cs 0, advanced training done
change freq to 856MHz 1,0
ch 0 ddrconfig = 0x101, ddrsize = 0x40
ch 1 ddrconfig = 0x101, ddrsize = 0x40
pmugrf_os_reg[2] = 0x32C1F2C1, stride = 0xD
ddr_set_rate to 328MHZ
ddr_set_rate to 666MHZ
ddr_set_rate to 928MHZ
channel 0, cs 0, advanced training done
channel 1, cs 0, advanced training done
ddr_set_rate to 416MHZ, ctl_index 0
ddr_set_rate to 856MHZ, ctl_index 1
support 416 856 328 666 928 MHz, current 856MHz
OUT
Boot1: 2019-03-14, version: 1.19
CPUId = 0x0
ChipType = 0x10, 323
mmc: ERROR: SDHCI ERR:cmd:0x102,stat:0x18000
mmc: ERROR: Card did not respond to voltage select!
emmc reinit
mmc: ERROR: SDHCI ERR:cmd:0x102,stat:0x18000
mmc: ERROR: Card did not respond to voltage select!
emmc reinit
mmc: ERROR: SDHCI ERR:cmd:0x102,stat:0x18000
mmc: ERROR: Card did not respond to voltage select!
SdmmcInit=2 1
mmc0:cmd5,20
SdmmcInit=0 0
BootCapSize=0
UserCapSize=244016MB
FwPartOffset=2000 , 0
StorageInit ok = 56917
SecureMode = 0
SecureInit read PBA: 0x4
SecureInit read PBA: 0x404
SecureInit read PBA: 0x804
SecureInit read PBA: 0xc04
SecureInit read PBA: 0x1004
SecureInit read PBA: 0x1404
SecureInit read PBA: 0x1804
SecureInit read PBA: 0x1c04
SecureInit ret = 0, SecureMode = 0
atags_set_bootdev: ret:(0)
GPT 0x3380ec0 signature is wrong
recovery gpt...
GPT 0x3380ec0 signature is wrong
recovery gpt fail!
LoadTrust Addr:0x4000
No find bl30.bin
No find bl32.bin
Load uboot, ReadLba = 2000
hdr 0000000003380880 + 0x0:0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
Load OK, addr=0x200000, size=0xe5b60
RunBL31 0x40000
NOTICE: BL31: v1.3(debug):42583b6
NOTICE: BL31: Built : 07:55:13, Oct 15 2019
NOTICE: BL31: Rockchip release version: v1.1
INFO: GICv3 with legacy support detected. ARM GICV3 driver initialized in EL3
INFO: Using opteed sec cpu_context!
INFO: boot cpu mask: 0
INFO: plat_rockchip_pmu_init(1190): pd status 3e
INFO: BL31: Initializing runtime services
WARNING: No OPTEE provided by BL2 boot loader, Booting device without OPTEE initialization. SMC`s destined for OPTEE will return SMC_UNK
ERROR: Error initializing runtime service opteed_fast
INFO: BL31: Preparing for EL3 exit to normal world
INFO: Entry point address = 0x200000
INFO: SPSR = 0x3c9

U-Boot 2020.10-armbian (May 06 2021 - 17:13:15 +0000)
SoC: Rockchip rk3399
Reset cause: WDOG
DRAM: 3.9 GiB
PMIC: RK808
SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB
MMC: mmc@fe320000: 1, sdhci@fe330000: 0
Loading Environment from MMC... Card did not respond to voltage select!
*** Warning - No block device, using default environment
In: serial
Out: serial
Err: serial
Model: Helios64
Revision: 1.2 - 4GB non ECC
Net: eth0: ethernet@fe300000
scanning bus for devices...
starting USB...
Bus usb@fe380000: USB EHCI 1.00
Bus dwc3: usb maximum-speed not found
Register 2000140 NbrPorts 2
Starting the controller
USB XHCI 1.10
scanning bus usb@fe380000 for devices... 1 USB Device(s) found
scanning bus dwc3 for devices... cannot reset port 4!?
5 USB Device(s) found
scanning usb for storage devices... 1 Storage Device(s) found
Hit any key to stop autoboot: 0
switch to partitions #0, OK
mmc1 is current device
** No partition table - mmc 1 **
Card did not respond to voltage select!
Device 0: Vendor: Seagate Rev: 070B Prod: Expansion Desk
Type: Hard Disk
Capacity: 2861588.4 MB = 2794.5 GB (732566645 x 4096)
... is now current device
** Unrecognized filesystem type **
scanning bus for devices...
Device 0: unknown device
Speed: 1000, full duplex
BOOTP broadcast 1
DHCP client bound to address 192.168.0.254 (3 ms)
*** Warning: no boot file name; using 'C0A800FE.img'
Using ethernet@fe300000 device
TFTP from server 192.168.0.1; our IP address is 192.168.0.254
Filename 'C0A800FE.img'.
Load address: 0x800800
Loading: T T T T T T T T T T
Retry count exceeded; starting again
missing environment variable: pxeuuid
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/01-64-62-66-d0-01-58

I've already given it a bump in voltage as recommended in the config file, but I can't tell where to check whether the setting took. The CPU frequency governor is set to ondemand in armbian-config. Does anyone have any ideas?
Thanks in advance,
Krita
Edited July 28, 2021 by Krita formatting
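On the question of where to check whether the governor and voltage settings took: a minimal sketch, assuming the kernel exposes the standard Linux cpufreq and regulator sysfs trees (`/sys/devices/system/cpu/cpufreq/policy*` and `/sys/class/regulator/regulator.*`). Which regulator corresponds to the CPU rails on the Helios64 is not stated in this thread, so the script simply lists every regulator that reports a voltage. The function names and the `sysfs_root` parameter are made up for illustration.

```python
from pathlib import Path

# Standard cpufreq policy attributes; values are read as plain strings.
CPUFREQ_FILES = (
    "scaling_governor",
    "scaling_cur_freq",
    "scaling_min_freq",
    "scaling_max_freq",
)


def _read(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def read_cpufreq(sysfs_root="/sys"):
    """Map each cpufreq policy (e.g. 'policy0') to its governor and frequencies."""
    base = Path(sysfs_root) / "devices/system/cpu/cpufreq"
    return {
        policy.name: {name: _read(policy / name) for name in CPUFREQ_FILES}
        for policy in sorted(base.glob("policy*"))
    }


def read_regulators(sysfs_root="/sys"):
    """Map each regulator directory to (name, microvolts) when both are readable."""
    base = Path(sysfs_root) / "class/regulator"
    found = {}
    for reg in sorted(base.glob("regulator.*")):
        name = _read(reg / "name")
        microvolts = _read(reg / "microvolts")
        if name is not None and microvolts is not None and microvolts.isdigit():
            found[reg.name] = (name, int(microvolts))
    return found


if __name__ == "__main__":
    for policy, values in read_cpufreq().items():
        print(policy, values)
    for reg, (name, microvolts) in read_regulators().items():
        print(f"{reg}: {name} = {microvolts / 1_000_000:.3f} V")
```

Running it once at idle and once under load should show whether the governor really is ondemand and whether any reported voltage differs from what it was before the change. Frequencies are in kHz and voltages in microvolts, as the kernel reports them.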
Krita Posted July 27, 2021 Author
Apparently my searching skills are rubbish. I found a thread about the FPDMA QUEUED fault and will investigate that, but it seems that's a separate issue from the reboot.
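For anyone comparing the READ FPDMA QUEUED failures in the first post against other threads, the `CDB:` bytes the kernel printed can be decoded by hand. The sketch below assumes the standard SCSI READ(16) layout (opcode 0x88, 8-byte big-endian LBA at bytes 2 to 9, 4-byte transfer length at bytes 10 to 13) and 512-byte sectors; `decode_read16` is a helper name made up for this example.

```python
def decode_read16(cdb_hex, sector_size=512):
    """Decode a READ(16) CDB given as hex bytes, e.g. '88 00 ... 00'.

    Returns (lba, blocks, byte_count). sector_size=512 is an assumption.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("expected a 16-byte CDB with opcode 0x88 (READ(16))")
    lba = int.from_bytes(cdb[2:10], "big")      # starting logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # number of blocks requested
    return lba, blocks, blocks * sector_size


if __name__ == "__main__":
    # CDB logged for tag#23 in the first post.
    print(decode_read16("88 00 00 00 00 00 01 73 b3 e0 00 00 05 58 00 00"))
    # (24359904, 1368, 700416)
```

The decoded values line up with the rest of the log: sector 24359904 is the one reported by `blk_update_request`, and 700416 bytes matches both the `ncq dma` size and the `size=` in the first zio line. That suggests the three failed commands are ordinary large sequential reads rather than anything unusual in the requests themselves.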
Krita Posted July 28, 2021 Author
So I've had the crash happen again after almost exactly 24 hours, this time with logging verbosity set to 5, so I actually got a bit of data. This was under no load at all: no one was logged on, and there was just a single PC on the network running USB-C serial logging. I have no idea how to interpret the data, but I'm investigating what it means to the best of my ability. Any pointers would be great.
Spoiler
[88215.370992] Unable to handle kernel paging request at virtual address 003f8000118b99f0
[88215.371700] Mem abort info:
[88215.371948] ESR = 0x96000004
[88215.372220] EC = 0x25: DABT (current EL), IL = 32 bits
[88215.372686] SET = 0, FnV = 0
[88215.372956] EA = 0, S1PTW = 0
[88215.373234] Data abort info:
[88215.373488] ISV = 0, ISS = 0x00000004
[88215.373825] CM = 0, WnR = 0
[88215.374087] [003f8000118b99f0] address between user and kernel address ranges
[88215.374713] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[88215.375203] Modules linked in: softdog governor_performance snd_soc_hdmi_codec r8152 rockchipdrm snd_soc_rockchip_i2s dw_mipi_dsi rockchip_vdec(C) hantro_vpu(C) dw_hdmi rockchip_rga snd_soc_core analogix_dp v4l2_h264 snd_pcm_dmaengine videobuf2_dma_contig videobuf2_dma_sg videobuf2_vmalloc v4l2_mem2mem gpio_charger drm_kms_helper videobuf2_memops fusb302 snd_pcm cec videobuf2_v4l2 pwm_fan panfrost leds_pwm snd_timer tcpm rc_core videobuf2_common gpu_sched snd typec videodev sg drm soundcore mc drm_panel_orientation_quirks gpio_beeper cpufreq_dt zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) nfsd zcommon(POE) auth_rpcgss znvpair(POE) nfs_acl zavl(POE) icp(POE) lockd spl(OE) grace ledtrig_netdev lm75 sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear md_mod uas realtek dwmac_rk stmmac_platform stmmac pcs_xpcs adc_keys
[88215.382244] CPU: 5 PID: 0 Comm: swapper/5 Tainted: P C OE 5.10.43-rockchip64 #21.05.4
[88215.383019] Hardware name: Helios64 (DT)
[88215.383366] pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
[88215.383904] pc : scheduler_tick+0xc4/0x140
[88215.384266] lr : scheduler_tick+0xc4/0x140
[88215.384626] sp : ffff800011c13d90
[88215.384918] x29: ffff800011c13d90 x28: 0000503bdc75d080
[88215.385387] x27: ffff0000f77bb6c0 x26: 0000000000000002
[88215.385856] x25: 0000000000000080 x24: ffff80001156a000
[88215.386325] x23: ffff000000711d00 x22: ffff80001157fd00
[88215.386793] x21: 0000000000000005 x20: ffff8000118b99c8
[88215.387261] x19: ffff0000f77c7d00 x18: 0000000000000000
[88215.387729] x17: 0000000000000000 x16: 00000000000073c0
[88215.388197] x15: 000028eb9d29edfa x14: 0000000000000000
[88215.388666] x13: 0000000000000270 x12: 0000000000000000
[88215.389134] x11: 0000000000000000 x10: 0000000000000004
[88215.389603] x9 : 0000000000000270 x8 : 0000000000000000
[88215.390071] x7 : ffff0000f77c7d00 x6 : ffff0000f77c8800
[88215.390540] x5 : 0000000000001270 x4 : ffff8000e6248000
[88215.391007] x3 : 0000000000010001 x2 : ffff80001156a000
[88215.391475] x1 : ffff8000112a1f58 x0 : 0000000000000005
[88215.391944] Call trace:
[88215.392163] scheduler_tick+0xc4/0x140
[88215.392497] update_process_times+0x8c/0xa0
[88215.392867] tick_sched_handle.isra.19+0x40/0x58
[88215.393273] tick_sched_timer+0x58/0xb0
[88215.393613] __hrtimer_run_queues+0x104/0x388
[88215.393997] hrtimer_interrupt+0xf4/0x250
[88215.394353] arch_timer_handler_phys+0x30/0x40
[88215.394746] handle_percpu_devid_irq+0xa0/0x298
[88215.395144] generic_handle_irq+0x30/0x48
[88215.395498] __handle_domain_irq+0x94/0x108
[88215.395869] gic_handle_irq+0xc0/0x140
[88215.396202] el1_irq+0xc8/0x180
[88215.396482] arch_cpu_idle+0x18/0x28
[88215.396799] default_idle_call+0x44/0x1bc
[88215.397152] do_idle+0x204/0x278
[88215.397437] cpu_startup_entry+0x28/0x60
[88215.397789] secondary_start_kernel+0x170/0x180
[88215.398191] Code: 94000cfb aa1303e0 94369ab7 94051e08 (f8757a82)
[88215.398733] ---[ end trace 57a1570b3962ce1b ]---
[88215.399139] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[88215.399743] SMP: stopping secondary CPUs
[88215.400093] Kernel Offset: disabled
[88215.400402] CPU features: 0x0240022,6100200c
[88215.400777] Memory Limit: none
[88215.401056] Rebooting in 90 seconds..
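As a starting point for reading the `Mem abort info` lines in the oops above, here is a small sketch that splits the logged ESR value into its fields. It assumes the Armv8-A ESR_EL1 layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0) and the data-abort ISS encoding; `decode_esr` and the partial fault-status table are illustrative only, not a complete decoder.

```python
# Only the translation-fault status codes are listed; other codes print as hex.
TRANSLATION_FAULTS = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr):
    """Split an AArch64 ESR value into the fields the kernel prints."""
    ec = (esr >> 26) & 0x3F   # exception class
    il = (esr >> 25) & 0x1    # 1 = 32-bit instruction
    iss = esr & 0x1FFFFFF     # instruction specific syndrome
    fields = {"EC": ec, "IL": il, "ISS": iss}
    if ec in (0x24, 0x25):    # data abort from lower / current exception level
        fields.update(
            ISV=(iss >> 24) & 1,
            SET=(iss >> 11) & 3,
            FnV=(iss >> 10) & 1,
            EA=(iss >> 9) & 1,
            CM=(iss >> 8) & 1,
            S1PTW=(iss >> 7) & 1,
            WnR=(iss >> 6) & 1,   # 0 = read, 1 = write
            DFSC=iss & 0x3F,      # data fault status code
        )
    return fields


if __name__ == "__main__":
    f = decode_esr(0x96000004)
    for key, value in f.items():
        print(f"{key} = {value:#x}")
    print(TRANSLATION_FAULTS.get(f["DFSC"], "other fault"))
```

For `0x96000004` this gives EC = 0x25, IL = 1, ISS = 0x4 and WnR = 0, matching what the kernel printed: a read from the kernel's own exception level hit an address with no valid translation. That is consistent with the line saying `003f8000118b99f0` lies between the user and kernel address ranges.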