dieKatze88

  • Posts

    12

dieKatze88's Achievements

  1. It really won't if properly designed. Server-grade hardware has been doing this for years; they just cut holes in the PCB to allow for airflow around the drive.
  2. My system is still unstable and I have no idea what to do. I've given every suggestion I've seen on the internet a try and I think the only thing I can do is move on.
  3. I did manage to catch the output for the 3rd crash yesterday.
     [22793.372295] Internal error: Oops: 96000004 [#1] PREEMPT SMP
     [22793.372795] Modules linked in: governor_performance rfkill zram snd_soc_hdmi_codec r8152 leds_pwm gpio_charger pwm_fan snd_soc_rockchip_i2s snd_soc_core snd_pcm_dmaengine hantro_vpu(C) snd_pcm rockchip_vdec(C) rockchip_rga snd_timer videobuf2_dma_sg v4l2_h264 videobuf2_dma_contig videobuf2_vmalloc panfrost v4l2_mem2mem gpu_sched videobuf2_memops snd videobuf2_v4l2 videobuf2_common fusb302 soundcore tcpm rockchipdrm videodev typec mc dw_mipi_dsi dw_hdmi analogix_dp drm_kms_helper cec sg rc_core drm drm_panel_orientation_quirks gpio_beeper cpufreq_dt ledtrig_netdev lm75 ip_tables x_tables autofs4 raid10 raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx realtek dm_mod md_mod dwmac_rk stmmac_platform stmmac pcs_xpcs adc_keys
     [22793.379068] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G C 5.10.21-rockchip64 #21.02.3
     [22793.379844] Hardware name: Helios64 (DT)
     [22793.380191] pstate: 40000085 (nZcv daIf -PAN -UAO -TCO BTYPE=--)
     [22793.380728] pc : rcu_sched_clock_irq+0x208/0xce0
     [22793.381134] lr : rcu_sched_clock_irq+0x1f8/0xce0
     [22793.381539] sp : ffff800011c13cd0
     [22793.381832] x29: ffff800011c13cd0 x28: ffff800011952440
     [22793.382301] x27: ffff8000118ba000 x26: ffff0000f77c8980
     [22793.382769] x25: ffff800011580980 x24: ffff8000e6248000
     [22793.383237] x23: 0000000000000000 x22: ffff8000118b9948
     [22793.383705] x21: ffff800011b27ad8 x20: ffff0000f77c89f0
     [22793.384173] x19: 0000000000000001 x18: 0000000000000000
     [22793.384641] x17: 0000000000000000 x16: 0000000000000000
     [22793.385109] x15: 0000002d01e1f6ac x14: 000000000000006a
     [22793.385577] x13: 000000010055ce03 x12: 00000000000ab681
     [22793.386045] x11: ffff8000118b7000 x10: ffff80001194ef28
     [22793.386513] x9 : ffff80001194ef20 x8 : ffff800011b72320
     [22793.386981] x7 : ffff800011952000 x6 : 0000007f7ced25ad
     [22793.387449] x5 : 7ab3901a5062db37 x4 : ffff8000e6248000
     [22793.387917] x3 : 0000000000010001 x2 : ffff8000e6248000
     [22793.388385] x1 : ffff0000f77c89f0 x0 : fffe800011952440
     [22793.388854] Call trace:
     [22793.389076] rcu_sched_clock_irq+0x208/0xce0
     [22793.389454] update_process_times+0x60/0xa0
     [22793.389825] tick_sched_handle.isra.19+0x40/0x58
     [22793.390231] tick_sched_timer+0x58/0xb0
     [22793.390572] __hrtimer_run_queues+0x104/0x388
     [22793.390956] hrtimer_interrupt+0xf4/0x250
     [22793.391311] arch_timer_handler_phys+0x30/0x40
     [22793.391704] handle_percpu_devid_irq+0xa0/0x298
     [22793.392103] generic_handle_irq+0x30/0x48
     [22793.392456] __handle_domain_irq+0x94/0x108
     [22793.392827] gic_handle_irq+0xc0/0x140
     [22793.393159] el1_irq+0xc0/0x180
     [22793.393440] arch_cpu_idle+0x18/0x28
     [22793.393757] default_idle_call+0x44/0x1bc
     [22793.394111] do_idle+0x204/0x278
     [22793.394397] cpu_startup_entry+0x24/0x60
     [22793.394745] secondary_start_kernel+0x170/0x180
     [22793.395147] Code: 72001c1f 54fffda1 34fffcd3 f94033e0 (f9400401)
     [22793.395690] ---[ end trace a14f0598db2feff1 ]---
     [22793.396097] Kernel panic - not syncing: Oops: Fatal exception in interrupt
     [22793.396700] SMP: stopping secondary CPUs
     [22793.397053] Kernel Offset: disabled
     [22793.397361] CPU features: 0x0240022,6100200c
     [22793.397736] Memory Limit: none
     [22793.398014] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
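One detail in the register dump above may be worth a second look: x0 (fffe800011952440) and x28 (ffff800011952440) are almost the same value. The sketch below only does the arithmetic to show how far apart they are. The helper name `differing_bits` is made up for illustration, and whether the difference points at a hardware fault is a guess that the log alone cannot confirm.

```python
def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Register values copied from the oops above.
x0 = 0xFFFE800011952440
x28 = 0xFFFF800011952440

print(hex(x0 ^ x28))            # 0x1000000000000
print(differing_bits(x0, x28))  # [48]
```

The two values differ in exactly one bit (bit 48). That is only an observation about the numbers in the dump, not a diagnosis.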
  4. It didn't stay as stable as we thought. Unfortunately the serial console failed at some point. I'll reconnect with it and try to keep it up again to see if I can catch it crashing again. At least it lasted about 30 hours this time.
  5. I've gone ahead and reinstalled to the internal flash, set up a more minimal system (not using OMV), and am monitoring it for failures. Any reason why some units are only stable at 1.2 GHz?
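For anyone checking whether a frequency cap like the 1.2 GHz mentioned above is actually in effect, here is a minimal sketch that reads the standard Linux cpufreq sysfs files. It assumes the kernel exposes `/sys/devices/system/cpu/cpufreq/policy*`; the function name `read_cpufreq` is hypothetical. It only reports the current settings and says nothing about why some units need the cap.

```python
from pathlib import Path


def read_cpufreq(root="/sys/devices/system/cpu/cpufreq"):
    """Return {policy_name: {"governor", "min_khz", "max_khz"}} for each policy."""
    result = {}
    for policy in sorted(Path(root).glob("policy*")):
        try:
            result[policy.name] = {
                "governor": (policy / "scaling_governor").read_text().strip(),
                # sysfs reports frequencies in kHz
                "min_khz": int((policy / "scaling_min_freq").read_text()),
                "max_khz": int((policy / "scaling_max_freq").read_text()),
            }
        except (OSError, ValueError):
            # Skip policies whose files are missing or unreadable.
            continue
    return result


if __name__ == "__main__":
    for name, info in read_cpufreq().items():
        print(f"{name}: {info['governor']}, "
              f"{info['min_khz'] / 1000:.0f}-{info['max_khz'] / 1000:.0f} MHz")
```

A 1.2 GHz cap would show up as `max_khz` equal to 1200000.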
  6. OK, after 13 hours we're still up (even with a light load of sending massive pings on the 2.5G interface to my desktop). I'm going to give it one more day before I call it good and try reinstalling to the internal flash again.
  7. I have set those settings tonight and we'll see if it crashes by morning.
  8. Honestly this is fine and was what I was hoping for. Makes it a lot easier to account for different board revisions/hardware changes.
  9. It crashed again last night. I have two blocks for you. Both of the kernel panics posted here have been from an SD boot with the eMMC wiped.
     [21004.776415] Internal error: Oops: 96000004 [#1] PREEMPT SMP
     [21004.776915] Modules linked in: rfkill governor_performance snd_soc_hdmi_codec r8152 hantro_vpu(C) rockchip_vdec(C) rockchip_rga v4l2_h264 videobuf2_dma_contig v4l2_mem2mem snd_soc_rockchip_i2s videobuf2_dma_sg videobuf2_vmalloc panfrost videobuf2_memops rockchipdrm dw_mipi_dsi dw_hdmi leds_pwm analogix_dp pwm_fan snd_soc_core gpu_sched videobuf2_v4l2 gpio_charger videobuf2_common snd_pcm_dmaengine snd_pcm drm_kms_helper fusb302 snd_timer cec tcpm snd videodev rc_core soundcore typec mc drm drm_panel_orientation_quirks gpio_beeper cpufreq_dt ledtrig_netdev lm75 ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear md_mod realtek dwmac_rk stmmac_platform stmmac pcs_xpcs adc_keys
     [21004.782817] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G C 5.10.21-rockchip64 #21.02.3
     [21004.783592] Hardware name: Helios64 (DT)
     [21004.783938] pstate: 40000085 (nZcv daIf -PAN -UAO -TCO BTYPE=--)
     [21004.784476] pc : rcu_sched_clock_irq+0x208/0xce0
     [21004.784883] lr : rcu_sched_clock_irq+0x1f8/0xce0
     [21004.785288] sp : ffff800011c13cd0
     [21004.785580] x29: ffff800011c13cd0 x28: ffff800011952440
     [21004.786049] x27: ffff8000118ba000 x26: ffff0000f77c8980
     [21004.786518] x25: ffff800011580980 x24: ffff8000e6248000
     [21004.786986] x23: 0000000000000000 x22: ffff8000118b9948
     [21004.787454] x21: ffff800011b27ad8 x20: ffff0000f77c89f0
     [21004.787921] x19: 0000000000000001 x18: 0000000000000000
     [21004.788390] x17: 0000000000000000 x16: 0000000000000000
     [21004.788858] x15: 0000000000000001 x14: 00000000000002d8
     [21004.789326] x13: 00000001004efb7a x12: 000000000010e229
     [21004.789794] x11: ffff8000118b7000 x10: ffff80001194ef28
     [21004.790262] x9 : ffff80001194ef20 x8 : ffff800011b72320
     [21004.790730] x7 : ffff800011952000 x6 : 000000757b1fbc62
     [21004.791197] x5 : d29eb8946b701b4f x4 : ffff8000e6248000
     [21004.791665] x3 : 0000000000010001 x2 : ffff8000e6248000
     [21004.792133] x1 : ffff0000f77c89f0 x0 : fffe800011952440
     [21004.792602] Call trace:
     [21004.792822] rcu_sched_clock_irq+0x208/0xce0
     [21004.793200] update_process_times+0x60/0xa0
     [21004.793569] tick_sched_handle.isra.19+0x40/0x58
     [21004.793974] tick_sched_timer+0x58/0xb0
     [21004.794313] __hrtimer_run_queues+0x104/0x388
     [21004.794697] hrtimer_interrupt+0xf4/0x250
     [21004.795054] arch_timer_handler_phys+0x30/0x40
     [21004.795447] handle_percpu_devid_irq+0xa0/0x298
     [21004.795845] generic_handle_irq+0x30/0x48
     [21004.796199] __handle_domain_irq+0x94/0x108
     [21004.796570] gic_handle_irq+0xc0/0x140
     [21004.796902] el1_irq+0xc0/0x180
     [21004.797182] arch_cpu_idle+0x18/0x28
     [21004.797498] default_idle_call+0x44/0x1bc
     [21004.797853] do_idle+0x204/0x278
     [21004.798138] cpu_startup_entry+0x24/0x60
     [21004.798486] secondary_start_kernel+0x170/0x180
     [21004.798887] Code: 72001c1f 54fffda1 34fffcd3 f94033e0 (f9400401)
     [21004.799427] ---[ end trace 730e9802b6c79383 ]---
     [21004.799833] Kernel panic - not syncing: Oops: Fatal exception in interrupt
     [21004.800436] SMP: stopping secondary CPUs
     [21004.800793] Kernel Offset: disabled
     [21004.801103] CPU features: 0x0240022,6100200c
     [21004.801477] Memory Limit: none
     [21004.801756] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
     DDR Version 1.24 20191016
     In
     channel 0
     CS = 0
     MR0=0x18 MR4=0x2 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0
     channel 1
     CS = 0
     MR0=0x18 MR4=0x2 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0
     channel 0 training pass!
     channel 1 training pass!
     change freq to 416MHz 0,1
     Channel 0: LPDDR4,416MHz
     Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB
     Channel 1: LPDDR4,416MHz
     Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB
     256B stride
     channel 0
     CS = 0
     MR0=0x18 MR4=0x2 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0
     channel 1
     CS = 0
     MR0=0x18 MR4=0x2 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0
     channel 0 training pass!
     channel 1 training pass!
     channel 0, cs 0, advanced training done
     channel 1, cs 0, advanced training done
     change freq to 856MHz 1,0
     ch 0 ddrconfig = 0x101, ddrsize = 0x40
     ch 1 ddrconfig = 0x101, ddrsize = 0x40
     pmugrf_os_reg[2] = 0x32C1F2C1, stride = 0xD
     ddr_set_rate to 328MHZ
     ddr_set_rate to 666MHZ
     ddr_set_rate to 928MHZ
     channel 0, cs 0, advanced training done
     channel 1, cs 0, advanced training done
     ddr_set_rate to 416MHZ, ctl_index 0
     ddr_set_rate to 856MHZ, ctl_index 1
     support 416 856 328 666 928 MHz, current 856MHz
     OUT
     Boot1: 2019-03-14, version: 1.19
     CPUId = 0x0
     ChipType = 0x10, 254
     SdmmcInit=2 0
     BootCapSize=100000
     UserCapSize=14910MB
     FwPartOffset=2000 , 100000
     mmc0:cmd5,20
     SdmmcInit=0 0
     BootCapSize=0
     UserCapSize=30436MB
     FwPartOffset=2000 , 0
     StorageInit ok = 83460
     SecureMode = 0
     SecureInit read PBA: 0x4
     SecureInit read PBA: 0x404
     SecureInit read PBA: 0x804
     SecureInit read PBA: 0xc04
     SecureInit read PBA: 0x1004
     SecureInit read PBA: 0x1404
     SecureInit read PBA: 0x1804
     SecureInit read PBA: 0x1c04
     SecureInit ret = 0, SecureMode = 0
     atags_set_bootdev: ret:(0)
     GPT 0x3380ec0 signature is wrong
     recovery gpt...
     GPT 0x3380ec0 signature is wrong
     recovery gpt fail!
     LoadTrust Addr:0x4000
     No find bl30.bin
     No find bl32.bin
     Load uboot, ReadLba = 2000
     Load OK, addr=0x200000, size=0xe5b60
     RunBL31 0x40000
     NOTICE: BL31: v1.3(debug):42583b6
     NOTICE: BL31: Built : 07:55:13, Oct 15 2019
     NOTICE: BL31: Rockchip release version: v1.1
     INFO: GICv3 with legacy support detected. ARM GICV3 driver initialized in EL3
     INFO: Using opteed sec cpu_context!
     INFO: boot cpu mask: 0
     INFO: plat_rockchip_pmu_init(1190): pd status 3e
     INFO: BL31: Initializing runtime services
     WARNING: No OPTEE provided by BL2 boot loader, Booting device without OPTEE initialization. SMC`s destined for OPTEE will return SMC_UNK
     ERROR: Error initializing runtime service opteed_fast
     INFO: BL31: Preparing for EL3 exit to normal world
     INFO: Entry point address = 0x200000
     INFO: SPSR = 0x3c9
     U-Boot 2020.10-armbian (Mar 08 2021 - 14:54:58 +0000)
     SoC: Rockchip rk3399
     Reset cause: POR
     DRAM: 3.9 GiB
     PMIC: RK808
     SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB
     MMC: mmc@fe320000: 1, sdhci@fe330000: 0
     Loading Environment from MMC... *** Warning - bad CRC, using default environment
     In: serial
     Out: serial
     Err: serial
     Model: Helios64
     Revision: 1.2 - 4GB non ECC
     Net: eth0: ethernet@fe300000
     scanning bus for devices...
     starting USB...
     Bus usb@fe380000: USB EHCI 1.00
     Bus dwc3: usb maximum-speed not found
     Register 2000140 NbrPorts 2
     Starting the controller
     USB XHCI 1.10
     scanning bus usb@fe380000 for devices... 1 USB Device(s) found
     scanning bus dwc3 for devices... cannot reset port 4!?
     4 USB Device(s) found
     scanning usb for storage devices... 0 Storage Device(s) found
     Hit any key to stop autoboot: 0
     switch to partitions #0, OK
     mmc1 is current device
     Scanning mmc 1:1...
     Found U-Boot script /boot/boot.scr
     3185 bytes read in 9 ms (344.7 KiB/s)
     ## Executing script at 00500000
     Boot script loaded from mmc 1
     166 bytes read in 12 ms (12.7 KiB/s)
     13851809 bytes read in 606 ms (21.8 MiB/s)
     28582400 bytes read in 1214 ms (22.5 MiB/s)
     81913 bytes read in 16 ms (4.9 MiB/s)
     2698 bytes read in 13 ms (202.1 KiB/s)
     Applying kernel provided DT fixup script (rockchip-fixup.scr)
     ## Executing script at 09000000
     Moving Image from 0x2080000 to 0x2200000, end=3de0000
     ## Loading init Ramdisk from Legacy Image at 06000000 ...
     Image Name: uInitrd
     Image Type: AArch64 Linux RAMDisk Image (gzip compressed)
     Data Size: 13851745 Bytes = 13.2 MiB
     Load Address: 00000000
     Entry Point: 00000000
     Verifying Checksum ... OK
     ## Flattened Device Tree blob at 01f00000
     Booting using the fdt blob at 0x1f00000
     Loading Ramdisk to f51b9000, end f5eeec61 ... OK
     Loading Device Tree to 00000000f513c000, end 00000000f51b8fff ... OK
     Starting kernel ...
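To check whether separate crashes die in the same place, the oops text can be reduced to a small "signature" (the `pc` value plus the call-trace entries) and compared. The following is a rough sketch, not a real oops parser: the function name `crash_signature` and the regular expressions are my own, and the two sample strings are shortened excerpts of the Oops: 96000004 reports posted in this thread.

```python
import re

PC_RE = re.compile(r"\bpc : (\S+)")
# A call-trace entry looks like "[22793.389076] rcu_sched_clock_irq+0x208/0xce0".
FRAME_RE = re.compile(r"\[\s*\d+\.\d+\]\s+([\w.]+\+0x[0-9a-f]+/0x[0-9a-f]+)")


def crash_signature(oops_text):
    """Return (pc, frames): the pc symbol and the entries after 'Call trace:'."""
    pc_match = PC_RE.search(oops_text)
    pc = pc_match.group(1) if pc_match else None
    _, _, after = oops_text.partition("Call trace:")
    # Stop at the first line that is clearly past the trace.
    trace_part = re.split(r"Code:|---\[ end|SMP: stopping", after)[0]
    return pc, tuple(FRAME_RE.findall(trace_part))


# Shortened excerpts of the two oops reports (timestamps differ, symbols do not).
crash_a = (
    "[22793.380728] pc : rcu_sched_clock_irq+0x208/0xce0 "
    "[22793.388854] Call trace: "
    "[22793.389076] rcu_sched_clock_irq+0x208/0xce0 "
    "[22793.389454] update_process_times+0x60/0xa0 "
    "[22793.395147] Code: 72001c1f"
)
crash_b = (
    "[21004.784476] pc : rcu_sched_clock_irq+0x208/0xce0 "
    "[21004.792602] Call trace: "
    "[21004.792822] rcu_sched_clock_irq+0x208/0xce0 "
    "[21004.793200] update_process_times+0x60/0xa0 "
    "[21004.798887] Code: 72001c1f"
)

print(crash_signature(crash_a) == crash_signature(crash_b))  # True
```

Because the patterns ignore timestamps and whitespace layout, the same function should work on a log pasted as one long line or as one entry per line.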
  10. I have disabled zram, as someone on Reddit suggested. I am now running the latest kernel, but no kernel I have ever run on this thing has been stable. I got the following serial console output the last time it crashed (but could not edit my post due to limits):
     [10105.431800] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: rcu_sched_clock_irq+0x7a4/0xce0
     [10105.432752] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G C 5.10.21-rockchip64 #21.02.3
     [10105.433526] Hardware name: Helios64 (DT)
     [10105.433872] Call trace:
     [10105.434093] dump_backtrace+0x0/0x200
     [10105.434418] show_stack+0x18/0x68
     [10105.434714] dump_stack+0xcc/0x124
     [10105.435016] panic+0x174/0x374
     [10105.435288] __stack_chk_fail+0x3c/0x40
     [10105.435626] rcu_sched_clock_irq+0x7a4/0xce0
     [10105.436004] update_process_times+0x60/0xa0
     [10105.436373] tick_sched_handle.isra.19+0x40/0x58
     [10105.436778] tick_sched_timer+0x58/0xb0
     [10105.437118] __hrtimer_run_queues+0x104/0x388
     [10105.437502] hrtimer_interrupt+0xf4/0x250
     [10105.437861] arch_timer_handler_phys+0x30/0x40
     [10105.438258] handle_percpu_devid_irq+0xa0/0x298
     [10105.438659] generic_handle_irq+0x30/0x48
     [10105.439012] __handle_domain_irq+0x94/0x108
     [10105.439384] gic_handle_irq+0xc0/0x140
     [10105.439715] el1_irq+0xc0/0x180
     [10105.439995] arch_cpu_idle+0x18/0x28
     [10105.440310] default_idle_call+0x44/0x1bc
     [10105.440665] do_idle+0x204/0x278
     [10105.440950] cpu_startup_entry+0x28/0x60
     [10105.441298] secondary_start_kernel+0x170/0x180
     [10105.441700] SMP: stopping secondary CPUs
     [10105.442057] Kernel Offset: disabled
     [10105.442365] CPU features: 0x0240022,6100200c
     [10105.442740] Memory Limit: none
     [10105.443021] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: rcu_sched_clock_irq+0x7a4/0xce0 ]---
     root@helios64:~# uname -a
     Linux helios64 5.10.21-rockchip64 #21.02.3 SMP PREEMPT Mon Mar 8 01:05:08 UTC 2021 aarch64 GNU/Linux
     root@helios64:~#
     My armbian monitor: http://ix.io/2U0J
  11. I backed this early, and I have had nothing but stability problems since I built the thing. Sometimes my machine runs for as few as 6 minutes before crashing, and, unhelpfully, it clears the systemd journal every time it starts up, so I can't even see what happened just before it crashed. It crashes running OMV and Syncthing under high load. It crashes doing absolutely nothing but watching the systemd journal. It crashes doing nothing at all. When it crashes, it corrupts my files, and often the OMV database, requiring me to CONSTANTLY reset the GUI password for OMV and then find that half of OMV isn't working. It does this on both uSD cards and on the built-in eMMC. I'm nearing my end with this thing. How can I do a full hardware test on it in a way that will say "Yes, this is working as expected" or "No, this is defective"?
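This is not the full hardware test asked about above, but for the narrower problem of files being corrupted across crashes, a checksum manifest at least shows which files changed. Below is a minimal sketch using only the Python standard library; the function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical and not part of any Armbian or OMV tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """List files that were added, removed, or altered since the manifest was built."""
    current = build_manifest(root)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

The idea would be to build a manifest of a directory that should not change, keep a copy of it on another machine, and run `changed_files` after the next crash. Files that were modified on purpose show up as well, so it only makes sense for data that is meant to stay static.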