Chiefahol Posted January 20, 2021

Hi guys,

My Helios64 keeps going offline. A red light comes on on the front panel, but I am not sure what it means exactly. This happens once every few weeks, and the system is fine after a reboot.

What does this red light mean exactly? Is there anything I can do to prevent these breakdowns?

Cheers,

clostro Posted January 20, 2021

Just had the same thing happen to my unit. I think this was the first time I had gone about 10 days without manually rebooting the system for some reason.

snakekick Posted January 20, 2021

My Helios has the same problem after 5-7 days. Not usable as a real NAS ;(

ShadowDance Posted January 20, 2021

It's probably indicating a kernel panic. You could try limiting the CPU speed and setting the governor as suggested in the linked topic.

You could also try hooking up the serial port to another machine and monitoring it for any output indicating why the crash happened.

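A minimal sketch of the CPU speed and governor suggestion, assuming the stock cpufrequtils service that reads /etc/default/cpufrequtils. The helper name `render_cpufreq_config` is made up for illustration, and the values (performance governor, locked at 1.8 GHz) are the ones mentioned by other posters in this thread, not necessarily those of the linked topic.

```shell
#!/bin/sh
# Print a cpufrequtils config that pins the governor and the frequency range.
# render_cpufreq_config is a hypothetical helper, not part of Armbian.
# Arguments: governor, min speed in kHz, max speed in kHz.
render_cpufreq_config() {
    printf 'ENABLE=true\n'
    printf 'MIN_SPEED=%s\n' "$2"
    printf 'MAX_SPEED=%s\n' "$3"
    printf 'GOVERNOR=%s\n' "$1"
}

# Assumed values: "performance" locked at 1.8 GHz, as reported in this thread.
CPUFREQ_CONFIG=$(render_cpufreq_config performance 1800000 1800000)
printf '%s\n' "$CPUFREQ_CONFIG"

# To apply on the NAS (needs root; review the output first):
#   render_cpufreq_config performance 1800000 1800000 | sudo tee /etc/default/cpufrequtils
#   sudo systemctl restart cpufrequtils
```

The function only prints the config, so it can be checked on any machine before anything is written to the NAS.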
silver0480 Posted January 21, 2021

@Chiefahol @clostro @snakekick

Check dmesg (via the serial console) at the moment the red LED starts blinking.

If the kernel message contains "[nnnnn.nnnnn] Unable to handle kernel paging request at virtual address", try the CPU speed limitation.

In my case, however, limiting the CPU speed solved nothing. The kernel message contained "[nnnnn.nnnnn] Insufficient stack space to handle exception!". I then rolled back to kernel 5.8.14 (via armbian-config) and got stable operation.

On my Helios64, any 5.9.x / 5.10.x kernel causes the same critical issue (including 5.10.9 with the Armbian nightly build).

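The check described above can be sketched as a small filter over a saved console log. `classify_panic` is a hypothetical helper name; the two message strings are the ones quoted in the post, and the advice it prints only restates what the post reports.

```shell
#!/bin/sh
# classify_panic is a hypothetical helper: it reads a saved console log on
# stdin and prints which of the two signatures quoted above it contains.
classify_panic() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'Insufficient stack space to handle exception'; then
        echo 'stack-overflow: CPU limit did not help here; kernel 5.8.14 was stable'
    elif printf '%s\n' "$log" | grep -q 'Unable to handle kernel paging request'; then
        echo 'paging-request: try the CPU speed limitation'
    else
        echo 'unknown: no known signature found'
    fi
}

# Example with a made-up timestamp and address in the format quoted above.
printf '%s\n' '[12345.678901] Unable to handle kernel paging request at virtual address 0000000000000000' | classify_panic
```

The stack-space signature is tested first because, in the report above, it was the case that the CPU speed limit did not fix.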
clostro Posted January 23, 2021

Just had another one right now. The previous one was 3 days ago.

I plugged in the serial cable, but the device was completely gone: no login prompt and no output at all. There are also no logs or anything else about the previous boot after rebooting. journalctl --list-boots lists only the current boot, and dmesg -E is empty.

Anyhow, I'm trying the CPU speed thing now. I hope it works; having to clean the entire SSD cache, which gets dirty at every crash, worries me.

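Since nothing survives on the device after such a crash, one option is to keep a second machine attached to the serial port and log everything it prints. A minimal sketch, assuming the USB serial adapter appears as /dev/ttyUSB0 on the capture machine and that the console runs at 1500000 baud (the rate in the console=ttyS2,1500000 boot argument of the Helios64 boot script). `stamp_lines` and the log file name are hypothetical.

```shell
#!/bin/sh
# stamp_lines is a hypothetical helper: prefix every line read from stdin
# with the capture host's local time, so a panic can be dated afterwards.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# On the capture machine (device name and baud rate are assumptions):
#   stty -F /dev/ttyUSB0 1500000 raw -echo
#   stamp_lines < /dev/ttyUSB0 >> helios64-console.log

# Dry run without hardware:
printf 'Kernel panic - not syncing: Fatal exception in interrupt\n' | stamp_lines
```

The capture has to be running before the crash; connecting the cable afterwards, as described above, shows nothing because the kernel has already stopped.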
ShadowDance Posted January 23, 2021

I had this for the second time in a week now and captured the kernel panic via serial this time, so I thought I'd post it. (Note that I only have the governor set to performance; I have not lowered the CPU speed limit.)

    [895638.308515] Unable to handle kernel paging request at virtual address fff70000f77d1ab8
    [895638.309217] Mem abort info:
    [895638.309470]   ESR = 0x96000004
    [895638.309745]   EC = 0x25: DABT (current EL), IL = 32 bits
    [895638.310216]   SET = 0, FnV = 0
    [895638.310491]   EA = 0, S1PTW = 0
    [895638.310773] Data abort info:
    [895638.311033]   ISV = 0, ISS = 0x00000004
    [895638.311375]   CM = 0, WnR = 0
    [895638.311642] [fff70000f77d1ab8] address between user and kernel address ranges
    [895638.312271] Internal error: Oops: 96000004 [#1] PREEMPT SMP
    [895638.312765] Modules linked in: tcp_diag inet_diag veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink br_netfilter bridge governor_performance rfkill ledtrig_netdev snd_soc_hdmi_codec hantro_vpu(C) rockchip_vdec(C) rockchipdrm dw_mipi_dsi v4l2_h264 rockchip_rga dw_hdmi videobuf2_dma_sg videobuf2_dma_contig snd_soc_rockchip_i2s analogix_dp videobuf2_vmalloc drm_kms_helper v4l2_mem2mem videobuf2_memops snd_soc_core videobuf2_v4l2 r8152 videobuf2_common videodev cec snd_pcm_dmaengine panfrost snd_pcm rc_core gpu_sched fusb302 drm mc snd_timer leds_pwm snd tcpm typec gpio_charger soundcore sg drm_panel_orientation_quirks gpio_beeper cpufreq_dt nfsd auth_rpcgss nfs_acl lockd grace sunrpc lm75 ip_tables x_tables autofs4 zfs(POE) zunicode(POE) zavl(POE) icp(POE) zlua(POE) zcommon(POE) znvpair(POE) spl(OE) algif_skcipher af_alg dm_crypt dm_mod
    [895638.312835]  realtek dwmac_rk stmmac_platform stmmac mdio_xpcs adc_keys pwm_fan
    [895638.321067] CPU: 5 PID: 0 Comm: swapper/5 Tainted: P C OE 5.9.14-rockchip64 #20.11.4
    [895638.321837] Hardware name: Helios64 (DT)
    [895638.322190] pstate: 80000085 (Nzcv daIf -PAN -UAO BTYPE=--)
    [895638.322690] pc : perf_event_task_tick+0x8c/0x330
    [895638.323101] lr : perf_event_task_tick+0x84/0x330
    [895638.323511] sp : ffff800011b03d30
    [895638.323809] x29: ffff800011b03d30 x28: 00000000003d0900
    [895638.324281] x27: 0000000000000000 x26: ffff0000f77c4aa0
    [895638.324753] x25: ffff800011839988 x24: ffff80001150f018
    [895638.325224] x23: ffff0000f6eae580 x22: ffff800011524c40
    [895638.325695] x21: ffff0000f77d1ab8 x20: ffff800011521aa0
    [895638.326166] x19: ffff800011521a00 x18: 0000000000000000
    [895638.326637] x17: 0000000000000000 x16: 0000000000000000
    [895638.327108] x15: 0000000000000000 x14: 0000000000000000
    [895638.327579] x13: 0000000000000000 x12: 000000000000026b
    [895638.328050] x11: 0000000000000000 x10: 0000000000000000
    [895638.328521] x9 : 0000000000000004 x8 : 000000000000026b
    [895638.328993] x7 : 0000000000000000 x6 : ffff0000f77c8740
    [895638.329464] x5 : 000000000000126b x4 : ffff8000e62a3000
    [895638.329935] x3 : 0000000000010001 x2 : ffff80001150f018
    [895638.330406] x1 : ffff80001122bf80 x0 : 0000000000000005
    [895638.330876] Call trace:
    [895638.331099]  perf_event_task_tick+0x8c/0x330
    [895638.331482]  scheduler_tick+0xc4/0x140
    [895638.331821]  update_process_times+0x5c/0x70
    [895638.332195]  tick_sched_handle.isra.19+0x40/0x58
    [895638.332605]  tick_sched_timer+0x58/0xb0
    [895638.332948]  __hrtimer_run_queues+0x104/0x3b0
    [895638.333337]  hrtimer_interrupt+0xf4/0x250
    [895638.333697]  arch_timer_handler_phys+0x30/0x40
    [895638.334093]  handle_percpu_devid_irq+0xa0/0x2b8
    [895638.334497]  generic_handle_irq+0x30/0x48
    [895638.334855]  __handle_domain_irq+0x94/0x108
    [895638.335230]  gic_handle_irq+0xc8/0x170
    [895638.335566]  el1_irq+0xb8/0x180
    [895638.335849]  arch_cpu_idle+0x14/0x20
    [895638.336171]  do_idle+0x1fc/0x270
    [895638.336462]  cpu_startup_entry+0x24/0x60
    [895638.336815]  secondary_start_kernel+0x16c/0x180
    [895638.337221] Code: b821681f 94303ac2 f8756a95 eb15035f (f85d06b6)
    [895638.337767] ---[ end trace f36c5bd12d8d7b39 ]---
    [895638.338178] Kernel panic - not syncing: Fatal exception in interrupt
    [895638.338741] SMP: stopping secondary CPUs
    [895638.339095] Kernel Offset: disabled
    [895638.339408] CPU features: 0x0240022,2000200c
    [895638.339789] Memory Limit: none
    [895638.340071] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

aprayoga Posted February 1, 2021 (edited)

Hi @snakekick, @clostro, @silver0480, @ShadowDance

Could you add the following lines to the beginning of /boot/boot.cmd:

    regulator dev vdd_log
    regulator value 930000
    regulator dev vdd_center
    regulator value 950000

then run

    mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr

and reboot. Verify whether it improves stability.

Modified boot.cmd:

    # DO NOT EDIT THIS FILE
    #
    # Please edit /boot/armbianEnv.txt to set supported parameters
    #

    regulator dev vdd_log
    regulator value 930000
    regulator dev vdd_center
    regulator value 950000

    setenv load_addr "0x9000000"
    setenv overlay_error "false"
    # default values
    setenv rootdev "/dev/mmcblk0p1"
    setenv verbosity "1"
    setenv console "both"
    setenv bootlogo "false"
    setenv rootfstype "ext4"
    setenv docker_optimizations "on"
    setenv earlycon "off"

    echo "Boot script loaded from ${devtype} ${devnum}"

    if test -e ${devtype} ${devnum} ${prefix}armbianEnv.txt; then
        load ${devtype} ${devnum} ${load_addr} ${prefix}armbianEnv.txt
        env import -t ${load_addr} ${filesize}
    fi

    if test "${logo}" = "disabled"; then setenv logo "logo.nologo"; fi

    if test "${console}" = "display" || test "${console}" = "both"; then setenv consoleargs "console=tty1"; fi
    if test "${console}" = "serial" || test "${console}" = "both"; then setenv consoleargs "console=ttyS2,1500000 ${consoleargs}"; fi
    if test "${earlycon}" = "on"; then setenv consoleargs "earlycon ${consoleargs}"; fi
    if test "${bootlogo}" = "true"; then setenv consoleargs "bootsplash.bootfile=bootsplash.armbian ${consoleargs}"; fi

    # get PARTUUID of first partition on SD/eMMC the boot script was loaded from
    if test "${devtype}" = "mmc"; then part uuid mmc ${devnum}:1 partuuid; fi

    setenv bootargs "root=${rootdev} rootwait rootfstype=${rootfstype} ${consoleargs} consoleblank=0 loglevel=${verbosity} ubootpart=${partuuid} usb-storage.quirks=${usbstoragequirks} ${extraargs} ${extraboardargs}"

    if test "${docker_optimizations}" = "on"; then setenv bootargs "${bootargs} cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory swapaccount=1"; fi

    load ${devtype} ${devnum} ${ramdisk_addr_r} ${prefix}uInitrd
    load ${devtype} ${devnum} ${kernel_addr_r} ${prefix}Image

    load ${devtype} ${devnum} ${fdt_addr_r} ${prefix}dtb/${fdtfile}
    fdt addr ${fdt_addr_r}
    fdt resize 65536
    for overlay_file in ${overlays}; do
        if load ${devtype} ${devnum} ${load_addr} ${prefix}dtb/rockchip/overlay/${overlay_prefix}-${overlay_file}.dtbo; then
            echo "Applying kernel provided DT overlay ${overlay_prefix}-${overlay_file}.dtbo"
            fdt apply ${load_addr} || setenv overlay_error "true"
        fi
    done
    for overlay_file in ${user_overlays}; do
        if load ${devtype} ${devnum} ${load_addr} ${prefix}overlay-user/${overlay_file}.dtbo; then
            echo "Applying user provided DT overlay ${overlay_file}.dtbo"
            fdt apply ${load_addr} || setenv overlay_error "true"
        fi
    done
    if test "${overlay_error}" = "true"; then
        echo "Error applying DT overlays, restoring original DT"
        load ${devtype} ${devnum} ${fdt_addr_r} ${prefix}dtb/${fdtfile}
    else
        if load ${devtype} ${devnum} ${load_addr} ${prefix}dtb/rockchip/overlay/${overlay_prefix}-fixup.scr; then
            echo "Applying kernel provided DT fixup script (${overlay_prefix}-fixup.scr)"
            source ${load_addr}
        fi
        if test -e ${devtype} ${devnum} ${prefix}fixup.scr; then
            load ${devtype} ${devnum} ${load_addr} ${prefix}fixup.scr
            echo "Applying user provided fixup script (fixup.scr)"
            source ${load_addr}
        fi
    fi

    booti ${kernel_addr_r} ${ramdisk_addr_r} ${fdt_addr_r}

    # Recompile with:
    # mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr

You could also use the attached boot.scr.

Edited February 1, 2021 by aprayoga: Added boot.scr

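The manual edit described in this post can be sketched as a small script that keeps a backup of the original file. `prepend_regulator_lines` is a hypothetical helper; the four regulator commands and the mkimage invocation are taken from the post itself. Note that it puts the lines at the very top of the file, above the header comments, which is one reading of "the beginning of /boot/boot.cmd".

```shell
#!/bin/sh
# prepend_regulator_lines is a hypothetical helper: it rewrites the given
# boot.cmd with the four regulator commands from the post above at the top,
# keeping the original file as <file>.bak. It does nothing if the commands
# are already present.
prepend_regulator_lines() {
    cmd_file=$1
    if grep -q '^regulator dev vdd_log$' "$cmd_file"; then
        echo "already patched: $cmd_file"
        return 0
    fi
    cp "$cmd_file" "$cmd_file.bak"
    {
        printf 'regulator dev vdd_log\n'
        printf 'regulator value 930000\n'
        printf 'regulator dev vdd_center\n'
        printf 'regulator value 950000\n'
        cat "$cmd_file.bak"
    } > "$cmd_file"
    echo "patched: $cmd_file (backup in $cmd_file.bak)"
}

# On the NAS, as root, followed by the mkimage step from the post:
#   prepend_regulator_lines /boot/boot.cmd
#   mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr
```

Keeping the .bak copy makes it easy to restore the original boot.cmd and rebuild boot.scr if the change does not help.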
clostro Posted February 1, 2021

Hi @aprayoga

I have a question before I try modifying boot.scr. I tried @gprovost's suggestion about the CPU speeds, and the device has been running stable for nearly 9 days now. I was planning to wait 14 days to test its stability and then update Armbian.

The current Armbian running on the device is Linux helios64 5.9.14-rockchip64 #20.11.4 SMP PREEMPT Tue Dec 15 08:52:20 CET 2020 aarch64 GNU/Linux, but I have been meaning to update to Armbian_20.11.10_Helios64_buster_current_5.9.14.img.xz.

- Should I try this mod on top of the current setup, with the old Armbian and the CPU speed mod? Or can I update to the newest image?
- If I update to the latest image, can I just update in place, or do you suggest a fresh install?
- Should I also redo the CPU speed mod after a fresh install?

Thanks

tionebrr Posted February 1, 2021

Thanks @aprayoga

I'm testing this right now with the 'ondemand' governor and throttling across the full frequency range.

Edit 7h later: Just crashed. Going back to 1800MHz locked.

snakekick Posted February 1, 2021

@aprayoga From what I found on Google, these settings change the CPU voltage. Is this correct? A mistake with these settings can damage the hardware, so a word or two of explanation would be helpful.

gprovost Posted February 2, 2021

@tionebrr With the CPU governor set to performance and the frequency set to 1.8GHz, do you have a fully stable system, or do you still encounter crashes?

aprayoga Posted February 2, 2021

VDD_LOG powers internal logic circuitry such as SRAM, the clock & reset unit, the PLL, and the internal MCU.

VDD_CENTER mostly powers the VPU and encoder/decoder, but it also powers the DDR controller.

The recommended operating condition for those supplies is 0.9V, but we have found that another RK3399 board, the RockPro64, increases this to 0.95V to improve stability:

https://github.com/u-boot/u-boot/commit/f210cbc1f3d4f84baa595d83933369b4d6a529ea
https://github.com/u-boot/u-boot/commit/5a6d960b222203e60ab57e19b3eb7b41c24b564b

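To put the suggested values in perspective, here is a quick calculation of how far each one sits above the 0.9V recommended condition. It assumes that U-Boot's `regulator value` takes microvolts, so 930000 means 0.93V and 950000 means 0.95V. `percent_over_nominal` is a hypothetical helper.

```shell
#!/bin/sh
# percent_over_nominal is a hypothetical helper: given a nominal and a set
# voltage in microvolts, print the increase as a percentage with one decimal,
# using integer arithmetic only (the result is truncated, not rounded).
percent_over_nominal() {
    nominal_uv=$1
    set_uv=$2
    tenths=$(( (set_uv - nominal_uv) * 1000 / nominal_uv ))
    printf '%d.%d%%\n' $(( tenths / 10 )) $(( tenths % 10 ))
}

percent_over_nominal 900000 930000   # vdd_log
percent_over_nominal 900000 950000   # vdd_center
```

This prints 3.3% for vdd_log and 5.5% for vdd_center (truncated from 5.55...), so both settings are a few percent above the recommended 0.9V, and the vdd_center value matches the 0.95V used on the RockPro64.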
tionebrr Posted February 2, 2021

@gprovost It still crashes, but only after a lot more time (days). It may be a separate issue, because those crashes do not end in a blinking red LED. I even suspect that the system is still partially up when it happens, but I haven't verified that. I found an old RPi 1 in a drawer yesterday, so I will be able to tell more the next time it happens.

silver0480 Posted February 4, 2021

@aprayoga I tested the vdd change on a new system (using the nightly build). My issue (a kernel panic triggered by "sysctl -w vm.drop_caches=3") isn't solved.
( https://forum.armbian.com/topic/16664-system-crashes-after-sync-command/?do=findComment&comment=117238 )

clostro Posted February 7, 2021

Just wanted to report that the CPU frequency mod has been running stable under normal use for 15 days now (on a 1GbE connection). I haven't tried the voltage mod.

I'll switch to the February 4th Buster 5.10 build soon.

Edit: 23 days, and I shut it down for an SD card backup and system update. The CPU frequency mod is rock solid.