Mystery red light on helios64


Chiefahol

Hi guys,

 

My Helios64 keeps going offline. A red light comes on on the front panel, but I am not sure what it means exactly.

agk287Y.jpg

This happens once every few weeks, the system is fine after a reboot.

What does this red light mean exactly? Is there anything I can do to prevent these breakdowns?

Cheers,


It's probably indicating a kernel panic. You could try limiting the CPU speed and setting the governor as suggested in the linked topic. You could also try hooking up the serial port to another machine and monitoring it for any output that indicates why the crash happened.
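A minimal sketch of what limiting the CPU speed and setting the governor could look like in practice. It assumes the standard Linux cpufreq sysfs layout and a cpufrequtils-style defaults file (/etc/default/cpufrequtils on Armbian, as far as I know). The helper names and the example frequencies are placeholders, not values taken from the linked topic.

```shell
#!/bin/sh
# Sketch: inspect the current cpufreq settings and print a candidate defaults file.
# Assumptions (not from this thread): the helper names, the example
# frequencies, and /etc/default/cpufrequtils as the file to edit.

# Print governor and frequency limits for every cpufreq policy found in sysfs.
show_cpufreq() {
    for policy in /sys/devices/system/cpu/cpufreq/policy*; do
        [ -d "$policy" ] || continue   # no cpufreq support on this machine
        printf '%s: governor=%s min=%s max=%s\n' \
            "${policy##*/}" \
            "$(cat "$policy/scaling_governor")" \
            "$(cat "$policy/scaling_min_freq")" \
            "$(cat "$policy/scaling_max_freq")"
    done
}

# Emit a cpufrequtils-style defaults file; frequencies are in kHz.
cpufreq_defaults() {
    printf 'ENABLE=true\nMIN_SPEED=%s\nMAX_SPEED=%s\nGOVERNOR=%s\n' "$1" "$2" "$3"
}

show_cpufreq
# Placeholder values only: use the limits recommended in the linked topic.
cpufreq_defaults 408000 1200000 performance
```

Review the printed settings before copying them into the defaults file, then restart the cpufrequtils service or reboot so they take effect.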

 


@Chiefahol @clostro @snakekick 

Check dmesg (via the serial console) at the moment the red LED starts blinking.

If the kernel messages contain "[nnnnn.nnnnn] Unable to handle kernel paging request at virtual address", try limiting the CPU speed.

In my case, however, limiting the CPU speed solved nothing. The kernel messages contained "[nnnnn.nnnnn] Insufficient stack space to handle exception!"

I then rolled back to kernel 5.8.14 (via armbian-config) and got stable operation.

On my Helios64, every 5.9.x / 5.10.x kernel causes the same critical issue (including 5.10.9 with the Armbian nightly build).
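The triage above can be sketched as a small shell helper that scans a saved serial log for the two messages quoted in this post. The function name classify_panic and its output labels are made up for illustration; the suggested actions are only the ones reported in this thread, not a confirmed diagnosis.

```shell
#!/bin/sh
# Sketch: map a captured kernel log to the workaround reported in this thread.
# classify_panic is a hypothetical helper; it reads the log on stdin.
classify_panic() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'Insufficient stack space to handle exception'; then
        echo 'stack-overflow: try rolling back to kernel 5.8.14'
    elif printf '%s\n' "$log" | grep -q 'Unable to handle kernel paging request'; then
        echo 'paging-request: try limiting CPU speed'
    else
        echo 'unknown: no known signature found'
    fi
}
```

Usage, assuming the serial output was saved to a file called serial.log: `classify_panic < serial.log`.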


Just had another one right now. Previous one was 3 days ago.

So I plugged in the serial cable, but the device was completely gone: no login prompt and no output at all.

After rebooting, there are also no logs of the previous boot.

journalctl --list-boots lists only the current boot, and dmesg -E is empty.

 

Anyhow, I'm trying the CPU speed thing now. I hope it works; having to clean the entire SSD cache, which gets dirty at every crash, is worrying me.


I had this for the second time in a week and captured the kernel panic via serial this time, so I thought I'd post it. (Note: I only have the governor set to performance; I have not lowered the CPU speed limit.)

[895638.308515] Unable to handle kernel paging request at virtual address fff70000f77d1ab8
[895638.309217] Mem abort info:
[895638.309470]   ESR = 0x96000004
[895638.309745]   EC = 0x25: DABT (current EL), IL = 32 bits
[895638.310216]   SET = 0, FnV = 0
[895638.310491]   EA = 0, S1PTW = 0
[895638.310773] Data abort info:
[895638.311033]   ISV = 0, ISS = 0x00000004
[895638.311375]   CM = 0, WnR = 0
[895638.311642] [fff70000f77d1ab8] address between user and kernel address ranges
[895638.312271] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[895638.312765] Modules linked in: tcp_diag inet_diag veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink br_netfilter bridge governor_performance rfkill ledtrig_netdev snd_soc_hdmi_codec hantro_vpu(C) rockchip_vdec(C) rockchipdrm dw_mipi_dsi v4l2_h264 rockchip_rga dw_hdmi videobuf2_dma_sg videobuf2_dma_contig snd_soc_rockchip_i2s analogix_dp videobuf2_vmalloc drm_kms_helper v4l2_mem2mem videobuf2_memops snd_soc_core videobuf2_v4l2 r8152 videobuf2_common videodev cec snd_pcm_dmaengine panfrost snd_pcm rc_core gpu_sched fusb302 drm mc snd_timer leds_pwm snd tcpm typec gpio_charger soundcore sg drm_panel_orientation_quirks gpio_beeper cpufreq_dt nfsd auth_rpcgss nfs_acl lockd grace sunrpc lm75 ip_tables x_tables autofs4 zfs(POE) zunicode(POE) zavl(POE) icp(POE) zlua(POE) zcommon(POE) znvpair(POE) spl(OE) algif_skcipher af_alg dm_crypt dm_mod
[895638.312835]  realtek dwmac_rk stmmac_platform stmmac mdio_xpcs adc_keys pwm_fan
[895638.321067] CPU: 5 PID: 0 Comm: swapper/5 Tainted: P         C OE     5.9.14-rockchip64 #20.11.4
[895638.321837] Hardware name: Helios64 (DT)
[895638.322190] pstate: 80000085 (Nzcv daIf -PAN -UAO BTYPE=--)
[895638.322690] pc : perf_event_task_tick+0x8c/0x330
[895638.323101] lr : perf_event_task_tick+0x84/0x330
[895638.323511] sp : ffff800011b03d30
[895638.323809] x29: ffff800011b03d30 x28: 00000000003d0900
[895638.324281] x27: 0000000000000000 x26: ffff0000f77c4aa0
[895638.324753] x25: ffff800011839988 x24: ffff80001150f018
[895638.325224] x23: ffff0000f6eae580 x22: ffff800011524c40
[895638.325695] x21: ffff0000f77d1ab8 x20: ffff800011521aa0
[895638.326166] x19: ffff800011521a00 x18: 0000000000000000
[895638.326637] x17: 0000000000000000 x16: 0000000000000000
[895638.327108] x15: 0000000000000000 x14: 0000000000000000
[895638.327579] x13: 0000000000000000 x12: 000000000000026b
[895638.328050] x11: 0000000000000000 x10: 0000000000000000
[895638.328521] x9 : 0000000000000004 x8 : 000000000000026b
[895638.328993] x7 : 0000000000000000 x6 : ffff0000f77c8740
[895638.329464] x5 : 000000000000126b x4 : ffff8000e62a3000
[895638.329935] x3 : 0000000000010001 x2 : ffff80001150f018
[895638.330406] x1 : ffff80001122bf80 x0 : 0000000000000005
[895638.330876] Call trace:
[895638.331099]  perf_event_task_tick+0x8c/0x330
[895638.331482]  scheduler_tick+0xc4/0x140
[895638.331821]  update_process_times+0x5c/0x70
[895638.332195]  tick_sched_handle.isra.19+0x40/0x58
[895638.332605]  tick_sched_timer+0x58/0xb0
[895638.332948]  __hrtimer_run_queues+0x104/0x3b0
[895638.333337]  hrtimer_interrupt+0xf4/0x250
[895638.333697]  arch_timer_handler_phys+0x30/0x40
[895638.334093]  handle_percpu_devid_irq+0xa0/0x2b8
[895638.334497]  generic_handle_irq+0x30/0x48
[895638.334855]  __handle_domain_irq+0x94/0x108
[895638.335230]  gic_handle_irq+0xc8/0x170
[895638.335566]  el1_irq+0xb8/0x180
[895638.335849]  arch_cpu_idle+0x14/0x20
[895638.336171]  do_idle+0x1fc/0x270
[895638.336462]  cpu_startup_entry+0x24/0x60
[895638.336815]  secondary_start_kernel+0x16c/0x180
[895638.337221] Code: b821681f 94303ac2 f8756a95 eb15035f (f85d06b6)
[895638.337767] ---[ end trace f36c5bd12d8d7b39 ]---
[895638.338178] Kernel panic - not syncing: Fatal exception in interrupt
[895638.338741] SMP: stopping secondary CPUs
[895638.339095] Kernel Offset: disabled
[895638.339408] CPU features: 0x0240022,2000200c
[895638.339789] Memory Limit: none
[895638.340071] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
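When comparing several captures like the one above, it can help to reduce each log to the kernel release, the faulting function and the panic reason. The sketch below does that with awk. The helper name summarize_oops is hypothetical, and it assumes the `CPU:` line ends with the kernel release followed by a single build-tag token, as it does in this capture.

```shell
#!/bin/sh
# Sketch: condense a captured oops (read from stdin) to three key facts.
# summarize_oops is a hypothetical helper name.
summarize_oops() {
    awk '
        # "CPU: ..." line: the release is assumed to be the second-to-last field.
        /\] CPU: [0-9]+ PID: /            { print "kernel: " $(NF-1) }
        # "pc :" line: the last field is the function where the fault occurred.
        /\] pc : /                        { print "faulting function: " $NF }
        # First panic line only; the "---[ end Kernel panic" line does not match.
        /\] Kernel panic - not syncing: / { sub(/.*Kernel panic - not syncing: /, ""); print "panic: " $0 }
    '
}
```

Usage, assuming the capture was saved as serial.log: `summarize_oops < serial.log`.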

 


Hi @snakekick, @clostro, @silver0480, @ShadowDance

Could you add the following lines to the beginning of /boot/boot.cmd:

regulator dev vdd_log
regulator value 930000
regulator dev vdd_center
regulator value 950000

Then run

mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr

and reboot, then check whether stability improves.
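For anyone who prefers to script the edit, here is a rough sketch that prepends the four regulator lines to a boot.cmd, keeps a backup, and does nothing if the lines are already there. The helper name prepend_regulators and the .bak suffix are my own choices, not part of the instructions above.

```shell
#!/bin/sh
# Sketch: prepend the regulator lines to a boot.cmd, keeping a backup copy.
# prepend_regulators is a hypothetical helper name.
prepend_regulators() {
    cmd_file=$1
    # Skip the edit if the lines were already added on an earlier run.
    if grep -q '^regulator dev vdd_log$' "$cmd_file"; then
        return 0
    fi
    cp "$cmd_file" "$cmd_file.bak" || return 1
    {
        printf 'regulator dev vdd_log\nregulator value 930000\n'
        printf 'regulator dev vdd_center\nregulator value 950000\n\n'
        cat "$cmd_file.bak"
    } > "$cmd_file"
}

# On the Helios64 itself (as root), followed by the mkimage step from the post:
#   prepend_regulators /boot/boot.cmd
#   mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr
```

The function only touches the file it is given, so it can be tried on a copy of boot.cmd first.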

 

 

Modified boot.cmd:

# DO NOT EDIT THIS FILE
#
# Please edit /boot/armbianEnv.txt to set supported parameters
#
regulator dev vdd_log
regulator value 930000
regulator dev vdd_center
regulator value 950000

setenv load_addr "0x9000000"
setenv overlay_error "false"
# default values
setenv rootdev "/dev/mmcblk0p1"
setenv verbosity "1"
setenv console "both"
setenv bootlogo "false"
setenv rootfstype "ext4"
setenv docker_optimizations "on"
setenv earlycon "off"

echo "Boot script loaded from ${devtype} ${devnum}"

if test -e ${devtype} ${devnum} ${prefix}armbianEnv.txt; then
        load ${devtype} ${devnum} ${load_addr} ${prefix}armbianEnv.txt
        env import -t ${load_addr} ${filesize}
fi

if test "${logo}" = "disabled"; then setenv logo "logo.nologo"; fi

if test "${console}" = "display" || test "${console}" = "both"; then setenv consoleargs "console=tty1"; fi
if test "${console}" = "serial" || test "${console}" = "both"; then setenv consoleargs "console=ttyS2,1500000 ${consoleargs}"; fi
if test "${earlycon}" = "on"; then setenv consoleargs "earlycon ${consoleargs}"; fi
if test "${bootlogo}" = "true"; then setenv consoleargs "bootsplash.bootfile=bootsplash.armbian ${consoleargs}"; fi

# get PARTUUID of first partition on SD/eMMC the boot script was loaded from
if test "${devtype}" = "mmc"; then part uuid mmc ${devnum}:1 partuuid; fi

setenv bootargs "root=${rootdev} rootwait rootfstype=${rootfstype} ${consoleargs} consoleblank=0 loglevel=${verbosity} ubootpart=${partuuid} usb-storage.quirks=${usbstoragequirks} ${extraargs} ${extraboardargs}"

if test "${docker_optimizations}" = "on"; then setenv bootargs "${bootargs} cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory swapaccount=1"; fi

load ${devtype} ${devnum} ${ramdisk_addr_r} ${prefix}uInitrd
load ${devtype} ${devnum} ${kernel_addr_r} ${prefix}Image

load ${devtype} ${devnum} ${fdt_addr_r} ${prefix}dtb/${fdtfile}
fdt addr ${fdt_addr_r}
fdt resize 65536
for overlay_file in ${overlays}; do
        if load ${devtype} ${devnum} ${load_addr} ${prefix}dtb/rockchip/overlay/${overlay_prefix}-${overlay_file}.dtbo; then
                echo "Applying kernel provided DT overlay ${overlay_prefix}-${overlay_file}.dtbo"
                fdt apply ${load_addr} || setenv overlay_error "true"
        fi
done
for overlay_file in ${user_overlays}; do
        if load ${devtype} ${devnum} ${load_addr} ${prefix}overlay-user/${overlay_file}.dtbo; then
                echo "Applying user provided DT overlay ${overlay_file}.dtbo"
                fdt apply ${load_addr} || setenv overlay_error "true"
        fi
done
if test "${overlay_error}" = "true"; then
        echo "Error applying DT overlays, restoring original DT"
        load ${devtype} ${devnum} ${fdt_addr_r} ${prefix}dtb/${fdtfile}
else
        if load ${devtype} ${devnum} ${load_addr} ${prefix}dtb/rockchip/overlay/${overlay_prefix}-fixup.scr; then
                echo "Applying kernel provided DT fixup script (${overlay_prefix}-fixup.scr)"
                source ${load_addr}
        fi
        if test -e ${devtype} ${devnum} ${prefix}fixup.scr; then
                load ${devtype} ${devnum} ${load_addr} ${prefix}fixup.scr
                echo "Applying user provided fixup script (fixup.scr)"
                source ${load_addr}
        fi
fi
booti ${kernel_addr_r} ${ramdisk_addr_r} ${fdt_addr_r}

# Recompile with:
# mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr

 

 

You could also use the attached boot.scr

 

 

boot.scr

Edited by aprayoga
Added boot.scr

Hi @aprayoga 

I have a question before I try modifying boot.scr -

 

I tried @gprovost's suggestion about the CPU speeds and the device has been running stably for nearly 9 days now. I was planning to wait 14 days to test its stability and then update Armbian.

 

The current Armbian running on the device is Linux helios64 5.9.14-rockchip64 #20.11.4 SMP PREEMPT Tue Dec 15 08:52:20 CET 2020 aarch64 GNU/Linux. But I have been meaning to update to Armbian_20.11.10_Helios64_buster_current_5.9.14.img.xz.

 

- Shall I try this mod on top of the current setup with the old Armbian and CPU speed mod? Or can I update to the newest image?

 

- If I update to the latest image, can I just update in place or do you suggest a fresh install?

 

- Should I also redo the CPU speed mod after a fresh install?

 

Thanks


VDD_LOG powers internal logic circuitry such as the SRAM, clock & reset unit, PLL, and internal MCU.

VDD_CENTER mostly powers the VPU and encoder/decoder, but it also powers the DDR controller.

 

The recommended operating condition for those supplies is 0.9 V

vdd_log_center_spec.png.e37e17abaa5d54a09f87d4fdfc30c1f9.png

but we have found that another RK3399 board, the RockPro64, increased this to 0.95 V to improve stability:

https://github.com/u-boot/u-boot/commit/f210cbc1f3d4f84baa595d83933369b4d6a529ea

https://github.com/u-boot/u-boot/commit/5a6d960b222203e60ab57e19b3eb7b41c24b564b
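To relate these figures to the boot.cmd values suggested earlier in the thread (930000 and 950000), note that U-Boot's `regulator value` command takes microvolts. A small sketch of the arithmetic follows; the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: convert regulator values (microvolts) to volts and compare them
# with the 0.9 V recommended operating condition.
# uv_to_volts and percent_over are hypothetical helper names.

# Convert microvolts to volts with three decimals.
uv_to_volts() {
    awk -v uv="$1" 'BEGIN { printf "%.3f\n", uv / 1000000 }'
}

# Percentage by which a value exceeds a reference, both in microvolts.
percent_over() {
    awk -v uv="$1" -v ref="$2" 'BEGIN { printf "%.1f\n", (uv - ref) * 100 / ref }'
}

uv_to_volts 930000            # vdd_log:    0.930
uv_to_volts 950000            # vdd_center: 0.950
percent_over 930000 900000    # 3.3 percent above 0.9 V
percent_over 950000 900000    # 5.6 percent above 0.9 V
```

So the suggested settings raise vdd_log to 0.93 V and vdd_center to 0.95 V.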

 


@gprovost It still crashes, but only after a lot more time (days). It may be a separate issue because those crashes do not end in a blinking red LED. I even suspect that the system is still partially up when it happens, but I haven't verified that. I found an old RPi 1 in a drawer yesterday, so I will be able to tell more the next time it happens.


Just wanted to report that the CPU frequency mod has been running stably under normal use for 15 days now (on a 1 GbE connection). I haven't tried the voltage mod.

 

I'll switch to the February 4th Buster 5.10 build soon.

 

 

Edit: 23 days, and I shut it down for an SD card backup and system update. The CPU frequency mod is rock solid.
