Does anyone actually have a stable system?


tommitytom

Just a heads-up: I reinstalled my Helios64 with the latest Armbian Buster (currently running from SD) and it has been rock solid for 7 days. I'm no longer using OMV and I don't really miss it.

On 3/25/2021 at 4:57 AM, gprovost said:

@SIGSEGV During boot-up, the first messages output on the serial console will show whether it's U-Boot TPL/SPL or the Rockchip blob.

 

This is the output with U-boot TPL/SPL

 



U-Boot TPL 2020.10-armbian (Mar 14 2021 - 07:07:37)
Channel 0: LPDDR4, 50MHz
BW=32 Col=10 Bk=8 CS0 Row=16/15 CS=1 Die BW=16 Size=2048MB
Channel 1: LPDDR4, 50MHz
BW=32 Col=10 Bk=8 CS0 Row=16/15 CS=1 Die BW=16 Size=2048MB
256B stride
lpddr4_set_rate: change freq to 400000000 mhz 0, 1
lpddr4_set_rate: change freq to 800000000 mhz 1, 0
Trying to boot from BOOTROM
Returning to boot ROM...

U-Boot SPL 2020.10-armbian (Mar 14 2021 - 07:07:37 +0700)
Trying to boot from MMC2
NOTICE:  BL31: v2.2(release):a04808c-dirty
NOTICE:  BL31: Built : 07:07:20, Mar 14 2021

 

 

This is the output with Rockchip blob

 

 

 

@gprovost I took a quick look at your previous reply in the thread. It looks like I have the Rockchip blob (line "DDR Version 1.24 20191016"), since I haven't updated since 2020.07 / LK 5.9.X and my initial LK 5.10.X upgrades had issues. Is there a way to update U-Boot without losing the rest of the system? Perhaps that was my issue, and because I hadn't rebooted in a while I avoided that scenario...?
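
For reference, the check described above (U-Boot TPL/SPL versus Rockchip blob) can be run against a saved serial capture. This is only a sketch: the function name and the output labels are made up for illustration, and it assumes the two marker strings shown in this thread ("U-Boot TPL" and "DDR Version") are enough to tell the two loaders apart.

```shell
#!/bin/sh
# Classify the first-stage loader from a serial boot log read on stdin.
# Prints "u-boot-tpl", "rockchip-blob" or "unknown" (labels are illustrative).
detect_first_stage() {
    while IFS= read -r line; do
        case "$line" in
            *"U-Boot TPL"*)   echo "u-boot-tpl"; return 0 ;;
            *"DDR Version "*) echo "rockchip-blob"; return 0 ;;
        esac
    done
    # Neither marker was seen before the end of the log.
    echo "unknown"
    return 1
}

# Example (boot.log is a hypothetical capture file):
#   detect_first_stage < boot.log
```

The first matching line wins, so a capture that starts mid-boot and never shows either banner is reported as "unknown" rather than guessed.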

 

Modified the armbianEnv.txt on a spare Linux machine. Output below:


picocom v2.2

port is        : /dev/ttyUSB0
flowcontrol    : none
baudrate is    : 1500000
parity is      : none
databits are   : 8
stopbits are   : 1
escape is      : C-a
local echo is  : no
noinit is      : no
noreset is     : no
nolock is      : no
send_cmd is    : sz -vv
receive_cmd is : rz -vv -E
imap is        : 
omap is        : 
emap is        : crcrlf,delbs,

Type [C-a] [C-h] to see available commands

Terminal ready
DDR Version 1.24 20191016
In
channel 0
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
change freq to 416MHz 0,1
Channel 0: LPDDR4,416MHz
Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB
Channel 1: LPDDR4,416MHz
Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB
256B stride
channel 0
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
channel 0, cs 0, advanced training done
channel 1, cs 0, advanced training done
change freq to 856MHz 1,0
ch 0 ddrconfig = 0x101, ddrsize = 0x40
ch 1 ddrconfig = 0x101, ddrsize = 0x40
pmugrf_os_reg[2] = 0x32C1F2C1, stride = 0xD
ddr_set_rate to 328MHZ
ddr_set_rate to 666MHZ
ddr_set_rate to 928MHZ
channel 0, cs 0, advanced training done
channel 1, cs 0, advanced training done
ddr_set_rate to 416MHZ, ctl_index 0
ddr_set_rate to 856MHZ, ctl_index 1
support 416 856 328 666 928 MHz, current 856MHz
OUT
Boot1: 2019-03-14, version: 1.19
CPUId = 0x0
ChipType = 0x10, 254
SdmmcInit=2 0
BootCapSize=100000
UserCapSize=14910MB
FwPartOffset=2000 , 100000
mmc0:cmd5,20
SdmmcInit=0 0
BootCapSize=0
UserCapSize=121942MB
FwPartOffset=2000 , 0
StorageInit ok = 77912
SecureMode = 0
SecureInit read PBA: 0x4
SecureInit read PBA: 0x404
SecureInit read PBA: 0x804
SecureInit read PBA: 0xc04
SecureInit read PBA: 0x1004
SecureInit read PBA: 0x1404
SecureInit read PBA: 0x1804
SecureInit read PBA: 0x1c04
SecureInit ret = 0, SecureMode = 0
atags_set_bootdev: ret:(0)
GPT 0x3380ec0 signature is wrong
recovery gpt...
GPT 0x3380ec0 signature is wrong
recovery gpt fail!
LoadTrust Addr:0x4000
No find bl30.bin
No find bl32.bin
Load uboot, ReadLba = 2000
Load OK, addr=0x200000, size=0xdd6b0
RunBL31 0x40000
NOTICE:  BL31: v1.3(debug):42583b6
NOTICE:  BL31: Built : 07:55:13, Oct 15 2019
NOTICE:  BL31: Rockchip release version: v1.1
INFO:    GICv3 with legacy support detected. ARM GICV3 driver initialized in EL3
INFO:    Using opteed sec cpu_context!
INFO:    boot cpu mask: 0
INFO:    plat_rockchip_pmu_init(1190): pd status 3e
INFO:    BL31: Initializing runtime services
WARNING: No OPTEE provided by BL2 boot loader, Booting device without OPTEE initialization. SMC`s destined for OPTEE will return SMC_UNK
ERROR:   Error initializing runtime service opteed_fast
INFO:    BL31: Preparing for EL3 exit to normal world
INFO:    Entry point address = 0x200000
INFO:    SPSR = 0x3c9


U-Boot 2020.07-armbian (Dec 11 2020 - 22:44:41 +0100)

SoC: Rockchip rk3399
Reset cause: POR
DRAM:  3.9 GiB
PMIC:  RK808 
SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB
MMC:   mmc@fe320000: 1, sdhci@fe330000: 0
Loading Environment from MMC... *** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Model: Helios64
Revision: 1.2 - 4GB non ECC
Net:   eth0: ethernet@fe300000
scanning bus for devices...
Hit any key to stop autoboot:  0 
switch to partitions #0, OK
mmc1 is current device
Scanning mmc 1:1...
Found U-Boot script /boot/boot.scr
3185 bytes read in 6 ms (517.6 KiB/s)
## Executing script at 00500000
Boot script loaded from mmc 1
235 bytes read in 5 ms (45.9 KiB/s)
9809293 bytes read in 434 ms (21.6 MiB/s)
22460424 bytes read in 954 ms (22.5 MiB/s)
81696 bytes read in 14 ms (5.6 MiB/s)
2698 bytes read in 8 ms (329.1 KiB/s)
Applying kernel provided DT fixup script (rockchip-fixup.scr)
## Executing script at 09000000
## Loading init Ramdisk from Legacy Image at 06000000 ...
   Image Name:   uInitrd
   Image Type:   AArch64 Linux RAMDisk Image (gzip compressed)
   Data Size:    9809229 Bytes = 9.4 MiB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 01f00000
   Booting using the fdt blob at 0x1f00000
   Loading Ramdisk to f558b000, end f5ee5d4d ... OK
   Loading Device Tree to 00000000f550e000, end 00000000f558afff ... OK

Starting kernel ...

 

Edited by hartraft
Added more detail

@hartraft Yeah, you could try updating U-Boot on the microSD card using your spare Linux computer. (Note: this can mess up your SD card if you get it wrong.)

You will need to mount the microSD again.

 

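# Go to the directory on the mounted card that holds the U-Boot files.
cd <sdcard-mount>/usr/lib/linux-u-boot-current-helios64*

# seek= counts 512-byte blocks (dd's default); conv=notrunc leaves the rest of the card untouched.
dd if=idbloader.bin of=<sd-card device> seek=64 conv=notrunc
dd if=uboot.img of=<sd-card device> seek=16384 conv=notrunc
dd if=trust.bin of=<sd-card device> seek=24576 conv=notrunc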

 

where <sd-card device> is something like /dev/mmcblk1
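
To make the offsets above easier to sanity-check before writing anything: dd uses 512-byte blocks by default, so each seek= value corresponds to a byte offset of seek x 512. The sketch below only prints the three commands with their byte offsets and never touches the card. The function names are illustrative, not part of any Armbian tooling.

```shell
#!/bin/sh
# Convert a dd seek= value (in default 512-byte blocks) to a byte offset.
sector_to_bytes() {
    echo $(( $1 * 512 ))
}

# Print, without running them, the three writes for the given device.
print_uboot_writes() {
    dev=$1
    echo "dd if=idbloader.bin of=$dev seek=64 conv=notrunc     # byte offset $(sector_to_bytes 64)"
    echo "dd if=uboot.img of=$dev seek=16384 conv=notrunc      # byte offset $(sector_to_bytes 16384)"
    echo "dd if=trust.bin of=$dev seek=24576 conv=notrunc      # byte offset $(sector_to_bytes 24576)"
}

# Example: review the commands for /dev/mmcblk1 before typing them by hand.
#   print_uboot_writes /dev/mmcblk1
```

That puts idbloader.bin at 32 KiB, uboot.img at 8 MiB and trust.bin at 12 MiB into the card. Double-check the device name with lsblk first, since writing to the wrong device is the easy way to lose data here.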


On 5/9/2021 at 5:54 AM, FloBaoti said:

For people running fine, are you using 2.5G NIC ?

I do, and this interface crashes every few days. I setup a cron to check every minute and down&up it if necessary.

 

I'm using both the 1G and 2.5G NICs. Before my recent reinstall, under OMV, the 2.5G NIC would constantly drop, and after a while it stopped working completely. Since the reinstall without OMV it has been solid. I'm not sure OMV was causing the issue, but it's the largest difference between my two installs.
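
For anyone who wants to try the cron workaround quoted above, here is a rough sketch of what such a check could look like. It is not FloBaoti's actual script. It assumes the 2.5G interface is eth1 and that a hung interface shows up as something other than "up" in its sysfs operstate file, which may not hold for every failure mode. The function names and the script path in the comment are made up.

```shell
#!/bin/sh
# Interface to watch and sysfs location; both can be overridden from the environment.
IFACE="${IFACE:-eth1}"
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Succeeds only if the kernel reports the interface's operational state as "up".
link_is_up() {
    state_file="$SYSFS_NET/$1/operstate"
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "up" ]
}

# Take the interface down and bring it back up (needs root).
bounce_link() {
    ip link set "$1" down
    sleep 2
    ip link set "$1" up
}

# Do nothing when the link is fine; otherwise log to syslog and cycle it.
check_and_bounce() {
    if link_is_up "$1"; then
        return 0
    fi
    logger -t nic-watchdog "$1 is not up, cycling the link"
    bounce_link "$1"
}

# Example root crontab entry (hypothetical path), running every minute,
# for a script that contains the above and ends with: check_and_bounce "$IFACE"
#   * * * * * /usr/local/sbin/nic-watchdog.sh
```

A ping to the gateway would be a stricter health check than operstate, at the cost of false alarms whenever the gateway itself is down.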


On 5/8/2021 at 9:54 PM, FloBaoti said:

For people running fine, are you using 2.5G NIC ?

I do, and this interface crashes every few days. I setup a cron to check every minute and down&up it if necessary.

I'm using the 2.5Gbps NIC exclusively with a 2.5Gbps switch for a month with no issues so far.


This is what my uptime looks like. OMV5, Plex and r/rutorrent with five 12TB WDC disks (LVM).
None of these reboots were triggered by me...
I wish I could find where the problem lies, but I can't read the logs for debugging because the reboot is so abrupt that nothing gets written to disk.

[Image: armbian.PNG]


3 hours ago, barnumbirr said:

Wish I could find where the problem lies, can't read the logs for debugging as the reboot is so abrupt that nothing gets written to disk.

Can't you connect a logger to serial out?
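
If you go the serial logger route, something like the sketch below on a second machine would keep a timestamped copy of everything the board prints. It reuses the port and baud rate from the picocom output earlier in the thread (/dev/ttyUSB0 at 1500000). The function name and log file name are made up, and the timestamps are when the logging machine received the line, not kernel time.

```shell
#!/bin/sh
# Prefix every line read from stdin with a UTC timestamp.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$line"
    done
}

# Example on the logging machine (log file name is hypothetical):
#   stty -F /dev/ttyUSB0 1500000 raw -echo
#   stamp_lines < /dev/ttyUSB0 >> helios64-serial.log
```

Because the log lives on the other machine, whatever the kernel manages to print before the reset survives even when nothing reaches the Helios64's own disks.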


I've had a secondary device connected to my Helios64 over serial for the last couple of months. Unfortunately, even at verbosity 7 it dies so suddenly that the output doesn't actually provide any valuable information:

 

Starting kernel ...

[    2.721938] cacheinfo: Unable to detect cache hierarchy for CPU 0
[    2.881028] vcc3v3_sys_s0: failed to get the current voltage: -EPROBE_DEFER
[    2.900490] dw_wdt ff848000.watchdog: No valid TOPs array specified
[    3.012992] dwmmc_rockchip fe320000.mmc: All phases bad!
[    3.013479] mmc1: tuning execution failed: -5
[    3.013881] mmc1: error -5 whilst initialising SD card
[    3.135010] dwmmc_rockchip fe320000.mmc: All phases bad!
[    3.135502] mmc1: tuning execution failed: -5
[   12.521811] rk_gmac-dwmac fe300000.ethernet: cannot get clock clk_mac_speed
[   15.212309] dw-apb-uart ff1a0000.serial: forbid DMA for kernel console
[   17.035708] lm75 2-004c: supply vs not found, using dummy regulator
[   18.572536] rk_gmac-dwmac fe300000.ethernet eth0: PTP not supported by HW
[   18.817621] OF: graph: no port node found in /i2c@ff3d0000/typec-portc@22
[   18.833386] OF: graph: no port node found in /syscon@ff770000/usb2-phy@e450/otg-port
[   19.262970] [drm] unsupported AFBC format[3231564e]
[   19.366283] rockchip_vdec: module is from the staging directory, the quality is unknown, you have been warned.
[   19.410603] r8152 4-1.4:1.0 (unnamed net_device) (uninitialized): netif_napi_add() called with weight 256
[   19.421732] hantro_vpu: module is from the staging directory, the quality is unknown, you have been warned.
[   24.903650] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2

helios64 login: DDR Version 1.24 20191016
In
soft reset
SRX
channel 0
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72

This is what my uptime looks like over the last 90 days. Again, none of the reboots were triggered by me.

[Image: Grafana "Hosts" screenshot, 2021-08-23]

 

Reboots wouldn't be such a pain in the backside on their own if 90% of them didn't retrigger an mdadm RAID resync...
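
For anyone wanting to reproduce the "verbosity 7" setup mentioned in this post: on Armbian that is normally the verbosity= key in /boot/armbianEnv.txt. A small sketch for setting it, assuming that key is what is meant here. The function name and the mount point in the example are hypothetical.

```shell
#!/bin/sh
# Set the verbosity= key in an armbianEnv.txt file, adding it if it is missing.
set_verbosity() {
    env_file=$1
    level=$2
    if grep -q '^verbosity=' "$env_file"; then
        # Key already present: rewrite its value in place (GNU sed).
        sed -i "s/^verbosity=.*/verbosity=$level/" "$env_file"
    else
        # Key missing: append it on its own line.
        printf 'verbosity=%s\n' "$level" >> "$env_file"
    fi
}

# Example with the SD card mounted on another machine at a hypothetical /mnt/sd:
#   set_verbosity /mnt/sd/boot/armbianEnv.txt 7
```

The change takes effect on the next boot. Keep a copy of the original file in case the board refuses to start.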


Finally managed to catch something. Had to reset the device after this:
 

[150372.308197] Unable to handle kernel paging request at virtual address ffff0001f77bd7bf
[150372.308900] Mem abort info:
[150372.309153]   ESR = 0x96000005
[150372.309431]   EC = 0x25: DABT (current EL), IL = 32 bits
[150372.309903]   SET = 0, FnV = 0
[150372.310178]   EA = 0, S1PTW = 0
[150372.310461] Data abort info:
[150372.310720]   ISV = 0, ISS = 0x00000005
[150372.311063]   CM = 0, WnR = 0
[150372.311333] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000000366b000
[150372.311925] [ffff0001f77bd7bf] pgd=00000000f7ff9003, p4d=00000000f7ff9003, pud=0000000000000000
[150372.312697] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[150372.313192] Modules linked in: softdog governor_performance cfg80211 rfkill r8152 snd_soc_hdmi_codec snd_soc_rockchip_i2s snd_soc_core snd_pcm_dmaengine snd_pcm rockchip_rga hantro_vpu(C) leds_pwm rockchip_vdec(C) snd_timer fusb302 videobuf2_dma_sg videobuf2_vmalloc tcpm snd gpio_charger v4l2_h264 panfrost videobuf2_dma_contig typec v4l2_mem2mem rockchipdrm soundcore videobuf2_memops videobuf2_v4l2 videobuf2_common gpu_sched dw_mipi_dsi videodev dw_hdmi mc analogix_dp drm_kms_helper cec sg rc_core drm drm_panel_orientation_quirks cpufreq_dt gpio_beeper ledtrig_netdev lm75 dm_mod sunrpc ip_tables x_tables autofs4 raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx realtek md_mod dwmac_rk stmmac_platform stmmac pcs_xpcs adc_keys pwm_fan
[150372.319348] CPU: 5 PID: 2769 Comm: rtorrent main Tainted: G         C        5.10.60-rockchip64 #21.08.1
[150372.320178] Hardware name: Helios64 (DT)
[150372.320531] pstate: 20000085 (nzCv daIf -PAN -UAO -TCO BTYPE=--)
[150372.321074] pc : __mod_zone_page_state+0x50/0x108
[150372.321494] lr : __mod_zone_page_state+0x3c/0x108
[150372.321914] sp : ffff8000167db4f0
[150372.322212] x29: ffff8000167db4f0 x28: 0000000000000001
[150372.322687] x27: 0000000000000020 x26: ffff0000f77d9100
[150372.323160] x25: 0000000000000001 x24: ffff0000f77d9e00
[150372.323634] x23: 00000000ffff8000 x22: ffff8001115757bf

 

Now that the Kobol team has pulled the plug, I doubt these issues will ever get fixed.


4 hours ago, barnumbirr said:

Finally managed to catch something.

 

You are using kernel 5.10.60 (Armbian 21.08.1). Several Armbian patches did not compile with this version of the kernel, so it is unstable (see the parallel thread on upgrading to Bullseye). The kernel panic occurred after 150372 seconds = 41.77 hours of operation!
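
The uptime figure comes straight from the bracketed timestamp on the oops line, which counts seconds since boot: 150372 / 3600 = 41.77 hours. A small sketch of the same arithmetic for other log lines (the function names are illustrative):

```shell
#!/bin/sh
# Convert seconds since boot to hours, rounded to two decimals.
seconds_to_hours() {
    LC_ALL=C awk -v s="$1" 'BEGIN { printf "%.2f\n", s / 3600 }'
}

# Extract the whole seconds from a kernel log line such as
# "[150372.308197] Unable to handle kernel paging request ...".
oops_uptime_seconds() {
    printf '%s\n' "$1" | sed -n 's/^\[ *\([0-9]*\)\.[0-9]*].*/\1/p'
}

# Example:
#   seconds_to_hours "$(oops_uptime_seconds '[150372.308197] Unable to handle ...')"   # prints 41.77
```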

 


41 minutes ago, ebin-dev said:

 

You are using kernel 5.10.60 (Armbian 21.08.1). Several Armbian patches did not compile with this version of the kernel, so it is unstable (see the parallel thread on upgrading to Bullseye). The kernel panic occurred after 150372 seconds = 41.77 hours of operation!

 

Ahh, so it's essentially a ticking time bomb? I pushed the bootloader from an SD card and got my system back up yesterday.


13 hours ago, IcerJo said:

Ahh, so it's essentially a ticking time bomb? I pushed the bootloader from an SD card and got my system back up yesterday.

Can you explain how you got your system back?
Unfortunately, I also updated to 21.08.01 tonight and urgently need a downgrade to get the system back.
Would be very grateful to you!


TDCroPower said:

Can you explain how you got your system back?
Unfortunately, I also updated to 21.08.01 tonight and urgently need a downgrade to get the system back.
Would be very grateful to you!

If you can SSH in, or boot off an SD card with the latest image, you can reinstall the bootloader. That at least gets it up and running. For me that means I can get in, see my drives and write to them. I fear the eMMC is still somewhat locked down, but it did let me turn SSH back on.


5 hours ago, TDCroPower said:

Can you explain how you got your system back?

 

There is a possibility discussed in the parallel thread link.

 

You could also boot a fresh Armbian 21.05.4 from SD and use it to rsync the contents of the eMMC to another bootable SD. Then boot from that second SD, downgrade Linux on it, and rsync the result back to the eMMC.

 

Maybe somebody else could explain how to downgrade the kernel on emmc using a chrooted environment.


Same here, the user experience is excellent.

This NAS is a real little powerhouse.

Kobol and the Armbian community have done a great job.


Armbian_21.08.2_Helios64_buster_current_5.10.63.img

I don't use Softy, only the YunoHost curl installer to start with, and I don't update/upgrade compulsively.

I only use two 32 GB SD cards.

Step by step, after every major modification or installation that works, I create a backup of the image with Win32DiskImager, which lets me go back if I have a problem.

A 16 GB SD card is more than enough, and the images take up less space.

The eMMC/SATA should only be used once your image is mature and you have nothing important left to change; after that, just enjoy your NAS.

 

Enjoy


Hi,

 

It seems stable after a year of testing, with these settings:

- Latest kernel, Linux 5.10.63-rockchip64, although the logs show a regulator voltage error (Linux 5.10.43-rockchip64 did not have this issue).

- Max/min CPU frequency both set to 1.2 GHz and governor set to performance (advice from here, and I think it's the best advice I have ever seen: https://wiki.debian.org/InstallingDebianOn/Kobol/Helios64)

With any other settings, the result is simple:

- Totally unstable, at random: kernel panics, lost network, freezes (with Docker containers, random crashes happen daily).

After a lot of reading, I think Rockchip and the Linux kernel have a big problem with the governor and CPU frequency management.

I am not a guru, so I may have written some big mistakes or nonsense.

 

Have a good day
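
For anyone wanting to try the fixed-frequency setup from this post, here is a sketch that writes out the matching /etc/default/cpufrequtils content. It assumes the usual cpufrequtils format (ENABLE, MIN_SPEED, MAX_SPEED, GOVERNOR, with speeds in kHz, so 1.2 GHz is 1200000). The function name is made up and the service restart in the example is an assumption about a standard Armbian/Debian install.

```shell
#!/bin/sh
# Print an /etc/default/cpufrequtils body for the given limits (in kHz) and governor.
cpufreq_config() {
    min_khz=$1
    max_khz=$2
    governor=$3
    if [ "$min_khz" -gt "$max_khz" ]; then
        echo "min frequency must not exceed max frequency" >&2
        return 1
    fi
    printf 'ENABLE=true\nMIN_SPEED=%s\nMAX_SPEED=%s\nGOVERNOR=%s\n' \
        "$min_khz" "$max_khz" "$governor"
}

# Example: pin both limits to 1.2 GHz with the performance governor.
#   cpufreq_config 1200000 1200000 performance | sudo tee /etc/default/cpufrequtils
#   sudo systemctl restart cpufrequtils
```

Printing the file instead of editing it in place makes it easy to compare against the current settings before committing to them.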


I recently purchased a 2.5G switch to improve network bandwidth (and latency). I am really thrilled with the performance and stability of the Helios64 on the 2.5G interface (eth1). I am currently on Armbian 21.08.3 Bullseye with Linux 5.10.43-rockchip64 (no errors at all).

With netatalk as a file server I can now access large files at 255 MByte/s. This is clearly enough to work on files stored on the Helios64 (i.e. accessing RAW images on the remote SSD, processing them on a laptop and writing back the resulting DNG and JPG images).

Does anyone know if there is a power-save state that could be configured for eth1, or is this something that depends entirely on the switch?

Helios64 rocks! I really hope that the Kobol team will resume operations once the chip shortage is overcome.

 

EDIT: I am using the default cpufreq settings:

# cat /etc/default/cpufrequtils
ENABLE=true
MIN_SPEED=408000
MAX_SPEED=1800000
GOVERNOR=ondemand
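
To confirm that the kernel is actually applying the values from that file, the effective governor and limits can be read back from sysfs. A sketch, assuming the standard cpufreq sysfs layout with one policy directory per CPU cluster (the function name is illustrative):

```shell
#!/bin/sh
# Print governor and min/max frequency (kHz) for every cpufreq policy.
# The sysfs root is a parameter so the function can be pointed elsewhere for testing.
show_cpufreq() {
    root=${1:-/sys/devices/system/cpu/cpufreq}
    for policy in "$root"/policy*; do
        # Skip the unexpanded glob when no policy directory exists.
        [ -d "$policy" ] || continue
        printf '%s: %s %s-%s kHz\n' \
            "$(basename "$policy")" \
            "$(cat "$policy/scaling_governor")" \
            "$(cat "$policy/scaling_min_freq")" \
            "$(cat "$policy/scaling_max_freq")"
    done
}

# Example:
#   show_cpufreq
```

On the RK3399 you would expect two lines, one for the little-core cluster and one for the big-core cluster. The maximum reported for the little cores may be lower than MAX_SPEED because those cores top out below 1.8 GHz.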


Hi,

It seems stable after a year of testing, with these settings:
- Latest kernel, Linux 5.10.63-rockchip64, although the logs show a regulator voltage error (Linux 5.10.43-rockchip64 did not have this issue).
- Max/min CPU frequency both set to 1.2 GHz and governor set to performance (advice from here, and I think it's the best advice I have ever seen: https://wiki.debian.org/InstallingDebianOn/Kobol/Helios64)

With any other settings, the result is simple:
- Totally unstable, at random: kernel panics, lost network, freezes (with Docker containers, random crashes happen daily).

After a lot of reading, I think Rockchip and the Linux kernel have a big problem with the governor and CPU frequency management.

I am not a guru, so I may have written some big mistakes or nonsense.

Have a good day

I have my CPU set to scale from the lowest frequency up to 1.4 GHz, with the governor set to ondemand, and I have no problems. Anything above 1.4 GHz and it crashes within days.


3 hours ago, BipBip1981 said:

Hi,

 

I tried "ondemand" at 400-1400 MHz this weekend; it seems stable.

I will wait one week...

Next I will try 400-1800 MHz.

I did all my previous tests with the "conservative" governor; maybe "ondemand" is the key to stability.

 

Have a good day

 

Are you using Debian or an Armbian Debian image?

Personally I'm a fan of the schedutil governor and run it on all my rk3399 devices.


 
Are you using Debian or an Armbian Debian image?

Personally I'm a fan of the schedutil governor and run it on all my rk3399 devices.

I'm using Armbian Debian. If you are using Armbian, are you able to use the max clock speeds of the 2 big cores?


4 minutes ago, IcerJo said:
8 minutes ago, lanefu said:
all my rk3399 devices

I'm using Armbian Debian, if you are using Armbian are you able to use the max clock speeds of the 2 big cores?

 

Yep. And our 2 GHz overlay. So to be clear, I'm talking about other rk3399 devices. I don't have a Helios64.

 

Just surprised to see the conversation given how well rk3399 works these days
