Helios64 Support


gprovost


11 hours ago, flower said:

this unit was sold as a "high quality nas". i was expecting a little tweaking and some flaws, but not those instabilities. they are just unacceptable for a nas. a nas is about data integrity!

 

You're right. No excuse on our side: we are behind schedule and not up to expectations on software maturity. Maybe we should have stuck with LK4.4 (from Rockchip) and forgotten about Linux mainline for now :-/

However, we are still working on improving stability, and we are optimistic that it will get better very soon.

 

Right now, as you know, LK5.8.16 is unstable compared to 5.8.14, for a reason we still can't figure out.

We also realized that the OMV install was removing a tx offloading tweak. So there are a lot of little things here and there that we only discover along the way.

BTW, regarding this crash, are you sure tx offload on eth1 was disabled?

 

2 hours ago, gprovost said:

 

What page is broken?

the original install instructions. yesterday morning it still was - now it is gone. 

i wish for a little bit more communication from your end. i still hope it will work out - and i still really like it. i am just frustrated.

 

regarding your question about tx offloading: i don't use omv, and i have a script which disables it on every start.
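
For anyone wanting to do the same, a boot-time script along these lines could work. This is only a sketch, not the actual script mentioned above: the function name and the IFACE and DRY_RUN variables are made up for illustration, and it assumes that `ethtool -K <iface> tx off` is the tweak in question. By default it only prints the command; set DRY_RUN=0 and run it as root to apply it.

```shell
#!/bin/sh
# Sketch: turn off TX offload on one network interface.
# IFACE defaults to eth1, the interface discussed in this thread.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to really run ethtool.

disable_tx_offload() {
    iface="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Show what would be executed without touching the interface.
        echo "ethtool -K $iface tx off"
    else
        ethtool -K "$iface" tx off
    fi
}

disable_tx_offload "${IFACE:-eth1}"
```

Running `ethtool -k eth1` afterwards lists the offload features, so you can check whether the setting stuck.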

 

but... something is wrong with networking on eth1 anyway. try syncing ~1 TB with nextcloud. it won't work: even if the unit doesn't crash, the nc client will disconnect multiple times and (even worse) you have to authenticate again (which is not typical for a disconnect).

my ssh connections are stable for hours though. not sure what the problem is there.

but... does that imply that it is stable as long as you don't use 2.5GbE? that would be a good example of where better communication would have helped.

13 minutes ago, flower said:

but... something is wrong with networking on eth1 anyway. try syncing ~1 TB with nextcloud. it won't work: even if the unit doesn't crash, the nc client will disconnect multiple times and (even worse) you have to authenticate again (which is not typical for a disconnect).

 

Have you tuned your Nextcloud install for a 4GB RAM system?

 

15 minutes ago, flower said:

but... does that imply that it is stable as long as you don't use 2.5GbE? that would be a good example of where better communication would have helped.

 

No, we are not saying that for now.

 

18 minutes ago, flower said:

the original install instructions. yesterday morning it still was - now it is gone. 

Two weeks ago we split the install page into 4 sub-pages in order to add the eMMC install instructions.

BTW, we just added a new section called Recovery, where we explain how to use Maskrom mode.

 

4 minutes ago, gprovost said:

 

Have you tuned your Nextcloud install for a 4GB RAM system?

 

Yes, all Docker containers and the system together use around 1 GB of RAM.

It uses zram after a while but that seems unrelated to that sync error. 

I never saw my ram completely filled. 

 

I don't use the Armbian Nextcloud. I tweaked the mysql, redis, alpine variant, which now runs stably on my old PC. Emby takes a lot of RAM there, but that container wasn't enabled on the Helios64.


1 minute ago, flower said:

Yes, all Docker containers and the system together use around 1 GB of RAM.

So you have tuned php-fpm and mariadb? I haven't done extensive testing of NC on Helios64, but I know that on Helios4 (with only 2GB RAM) I had to tune NC properly, otherwise the system would hang (and reset because of the watchdog). This was not only because of OOM, but also because too many threads / child processes were spawned, overloading the system. I never tested NC in a container though, so maybe that doesn't apply to your use case. (Just for reference: https://docs.nextcloud.com/server/19/admin_manual/installation/server_tuning.html)

gprovost said:

So you have tuned php-fpm and mariadb? I haven't done extensive testing of NC on Helios64, but I know that on Helios4 (with only 2GB RAM) I had to tune NC properly, otherwise the system would hang (and reset because of the watchdog). This was not only because of OOM, but also because too many threads / child processes were spawned, overloading the system. I never tested NC in a container though, so maybe that doesn't apply to your use case. (Just for reference: https://docs.nextcloud.com/server/19/admin_manual/installation/server_tuning.html)

Yes, it is tuned: the proxy buffers (reverse proxy and fastcgi) are very low, and mariadb and php-fpm are tuned too.

I never saw it above 1 GB of RAM, except for Linux file caches, which touch zram swap after a while. But it never filled up.

But what is your point about many processes? I do have many. Most of them are sleeping though. Do you think that's a problem?

 
gprovost said:

My point is just to collect as much info as possible on use cases that generate crashes, in order to prioritize our focus.

This afternoon (e.g. around 7h) I will put some smaller old disks in the Helios64 to keep an eye on the progress.

I can start those containers there again.

If you want me to provide any additional info just tell me.

As it will sit there with test data only i can give you root access too in case you are interested.

8 hours ago, lyuyhn said:

Did someone try to use the wakeonlan feature of the Helios64? I tried to set g mode on eth0 using ethtool, but nothing happens when I try to send a magic packet. Is there a trick to make this work?

 

WoL is not supported yet. It is still in progress because suspend mode doesn't work properly.

On 10/6/2020 at 10:25 AM, ebin-dev said:

I can confirm the issue with the USB-C cable. It does not fit into the USB-C port (the reason may just be the additional layer introduced by the label around the ports).

The issue can be resolved by cutting away about 0.5 mm of the plastic around the plug at the end of the USB cable. It will then easily fit into the port.

Same problem here. I fixed it by using a small file to widen the USB-C hole in the metal plate, so that the plastic part of the plug can be inserted through the plate and sit closer to the port :-).


Have been testing my helios64 with 5x12TB drives in different setups.

First OMV & SnapRAID, but continuing a sync command uses so much RAM that the system becomes so slow it might as well be described as unusable.

Next I tried OMV & ZFS.

I could get the ZFS module compiled, but then it wouldn't load the rest of the utilities because of missing dependencies (buster is really old).

So finally I switched to trying lizardfs. It crashed on me after a few hours, so I hooked up the serial console.

Caught this error during the night: https://termbin.com/lsow

Is it easy to downgrade to 5.8.14 or even a 4.x kernel?

1 hour ago, fromport said:

Have been testing my helios64 with 5x12TB drives in different setups.

First OMV & SnapRAID, but continuing a sync command uses so much RAM that the system becomes so slow it might as well be described as unusable.

Next I tried OMV & ZFS.

I could get the ZFS module compiled, but then it wouldn't load the rest of the utilities because of missing dependencies (buster is really old).

So finally I switched to trying lizardfs. It crashed on me after a few hours, so I hooked up the serial console.

Caught this error during the night: https://termbin.com/lsow

Is it easy to downgrade to 5.8.14 or even a 4.x kernel?

Downgrading and pinning is possible through armbian-config. it is described in the latest kobol blog (that page seems offline for me atm).

i used this to downgrade:

apt install \
  linux-dtb-current-rockchip64=20.08.10 \
  linux-headers-current-rockchip64=20.08.10 \
  linux-image-current-rockchip64=20.08.10 \
  armbian-firmware=20.08.10 \
  linux-buster-root-current-helios64=20.08.10 \
  linux-u-boot-helios64-current=20.08.10

(but be careful... the next update would upgrade them again)
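
One way to stop the next update from undoing the downgrade is to put the packages on hold with apt-mark. A minimal sketch, assuming the same package names as in the command above; the helper name and the DRY_RUN switch are made up for illustration. It prints the command by default; set DRY_RUN=0 and run it as root to apply it.

```shell
#!/bin/sh
# Sketch: hold the downgraded packages so a later upgrade leaves them alone.
# Package names are copied from the downgrade command above.
PKGS="linux-dtb-current-rockchip64 linux-headers-current-rockchip64 \
linux-image-current-rockchip64 armbian-firmware \
linux-buster-root-current-helios64 linux-u-boot-helios64-current"

hold_packages() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Print only, so the command can be reviewed first.
        echo "apt-mark hold $*"
    else
        apt-mark hold "$@"
    fi
}

# $PKGS is left unquoted on purpose so it splits into separate arguments.
hold_packages $PKGS
```

`apt-mark showhold` lists what is currently held, and `apt-mark unhold` with the same names lifts the hold again.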

 

afaik there is no way to go back to a 4.4 kernel. going from 4.4 to 5.8 is possible through armbian-config - but it didn't work for me last time i tried

3 hours ago, flower said:

Downgrading and pinning is possible through armbian-config. it is described in the latest kobol blog (that page seems offline for me atm).

i used this to downgrade:


apt install \
  linux-dtb-current-rockchip64=20.08.10 \
  linux-headers-current-rockchip64=20.08.10 \
  linux-image-current-rockchip64=20.08.10 \
  armbian-firmware=20.08.10 \
  linux-buster-root-current-helios64=20.08.10 \
  linux-u-boot-helios64-current=20.08.10

(but be careful... the next update would upgrade them again)

afaik there is no way to go back to a 4.4 kernel. going from 4.4 to 5.8 is possible through armbian-config - but it didn't work for me last time i tried

I was on IRC on #armbian and they suggested using armbian-config to downgrade. I downgraded successfully to 5.8.14.

It looked good in the beginning, but as soon as it got some load on it, it crashed.

Then I used armbian-config to downgrade to 4.4 and, like you predicted, it didn't go well.

The serial console now shows this during bootup:

Quote

U-Boot 2020.07-armbian (Oct 18 2020 - 23:38:26 +0200)

SoC: Rockchip rk3399
Reset cause: POR
DRAM:  3.9 GiB
PMIC:  RK808
SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB
MMC:   mmc@fe320000: 1, sdhci@fe330000: 0
Loading Environment from MMC... *** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Model: Helios64
Revision: 1.2 - 4GB non ECC
Net:   eth0: ethernet@fe300000
scanning bus for devices...
Hit any key to stop autoboot:  0
Card did not respond to voltage select!
switch to partitions #0, OK
mmc0(part 0) is current device
Scanning mmc 0:1...
Found U-Boot script /boot/boot.scr
3185 bytes read in 19 ms (163.1 KiB/s)
## Executing script at 00500000
Boot script loaded from mmc 0
117 bytes read in 15 ms (6.8 KiB/s)
7302624 bytes read in 716 ms (9.7 MiB/s)
22114312 bytes read in 2118 ms (10 MiB/s)
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
No FDT memory address configured. Please configure
the FDT address via "fdt addr <address>" command.
Aborting!
2698 bytes read in 36 ms (72.3 KiB/s)
Applying kernel provided DT fixup script (rockchip-fixup.scr)
## Executing script at 09000000
## Loading init Ramdisk from Legacy Image at 06000000 ...
   Image Name:   uInitrd
   Image Type:   AArch64 Linux RAMDisk Image (gzip compressed)
   Data Size:    7302560 Bytes = 7 MiB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
ERROR: Did not find a cmdline Flattened Device Tree
   Loading Ramdisk to f57ef000, end f5ee5da0 ... OK
FDT and ATAGS support not compiled in - hanging
### ERROR ### Please RESET the board ###

And that is from booting from the internal eMMC.

Will have to find which jumper to install and boot from sdcard I guess

8 minutes ago, fromport said:

(but be careful... the next update would upgrade them again)

armbian-config can also be used to freeze firmware updates (that is to say, the kernel version).

8 hours ago, fromport said:

I was on IRC on #armbian and they suggested using armbian-config to downgrade. I downgraded successfully to 5.8.14.

It looked good in the beginning, but as soon as it got some load on it, it crashed.

Then I used armbian-config to downgrade to 4.4 and, like you predicted, it didn't go well.

The serial console now shows this during bootup:

And that is from booting from the internal emmc

Will have to find which jumper to install and boot from sdcard I guess

 

My machine is really making me sweat.

I installed armbian-buster-legacy-4.4 on an SD card and managed to boot it (much slower performance than internal emmc)

I tried to install lizardfs-chunkserver again. In order to do that I am mounting the spinning rust partitions, which I previously formatted with XFS.

When trying to mount the partitions RW I get this error:

Quote

XFS Superblock has unknown read-only compatible features (0x4) enabled

It says you can mount in RO mode.

I hooked up an external 256GB ssd to the front usb port and started copying the contents of the first HD to the SSD

I was using

Quote

rsync -av --info=progress2 [source] [dest]

It started copying and suddenly, after a few minutes, the machine stopped responding.

This is what I saw on the serial console

Quote

Armbian 20.08.17 Buster ttyFIQ0

filer1 login: [ 1690.059453] BUG: spinlock lockup suspected on CPU#0, swapper/0/0
[ 1690.063991]  lock: 0xffffffc0f7f2b200, .magic: dead4ead, .owner: swapper/4/0, .owner_cpu: 4
[ 1690.070046] BUG: spinlock lockup suspected on CPU#5, rsync/4022
[ 1690.072860]  lock: 0xffffffc0f7f2b200, .magic: dead4ead, .owner: swapper/4/0, .owner_cpu: 4
[ 1690.104336] BUG: spinlock lockup suspected on CPU#3, kworker/3:1/3325
[ 1690.109230]  lock: 0xffffffc0f7f2b200, .magic: dead4ead, .owner: swapper/4/0, .owner_cpu: 4
[ 1691.144138] BUG: spinlock lockup suspected on CPU#2, kworker/2:2/656
[ 1691.144143] BUG: spinlock lockup suspected on CPU#1, kworker/1:2/164
[ 1691.144152]  lock: 0xffffffc0f7f2b200, .magic: dead4ead, .owner: swapper/4/0, .owner_cpu: 4
[ 1691.159662]  lock: 0xffffffc0f7f2b200, .magic: dead4ead, .owner: swapper/4/0, .owner_cpu: 4

I have now tried both the legacy and the most up-to-date images.

And the only thing that is utterly consistent: it crashes on me all the time, no matter what I do.

Could this be a bad hardware version?


@fromport This is the first time we have seen this kind of crash message, but looking online it seems it isn't unknown on other rk3399 boards.

Could you run the same test that triggers this crash, but first set the governor to performance?

 

cpufreq-set -c 0 -g performance
cpufreq-set -c 4 -g performance

 

Use cpufreq-info to check that the governor has been changed.
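
If cpufreq-info is not at hand, the governor can also be read straight from sysfs. A minimal sketch, assuming the usual cpufreq layout under /sys/devices/system/cpu/cpufreq; the function name and the SYSFS_CPUFREQ override are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the active governor of every cpufreq policy.
# SYSFS_CPUFREQ can be overridden (e.g. for testing); the default is the
# standard sysfs location.
SYSFS_CPUFREQ="${SYSFS_CPUFREQ:-/sys/devices/system/cpu/cpufreq}"

show_governors() {
    for f in "$SYSFS_CPUFREQ"/policy*/scaling_governor; do
        # Skip the unexpanded glob when no policy directory exists.
        [ -r "$f" ] || continue
        policy=$(basename "$(dirname "$f")")
        echo "$policy: $(cat "$f")"
    done
}

show_governors
```

After the two cpufreq-set commands above, both policy0 and policy4 should report performance.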

 

Trying to figure out if it's still a DVFS config issue.

 

 

 

28 minutes ago, gprovost said:

@fromport This is the first time we have seen this kind of crash message, but looking online it seems it isn't unknown on other rk3399 boards.

Could you run the same test that triggers this crash, but first set the governor to performance?

 

cpufreq-set -c 0 -g performance
cpufreq-set -c 4 -g performance

 

Use cpufreq-info to check that the governor has been changed.

 

Trying to figure out if it's still a DVFS config issue.

 

 

 

 21:58:29 up 24 min,  7 users,  load average: 9.06, 6.67, 3.94

[knock wood, but so far so good]

5 rsync in parallel at the moment


Out of curiosity: with the mainline kernel and the ondemand governor in place, apply this:

echo 40000 > /sys/devices/system/cpu/cpufreq/policy0/ondemand/sampling_rate
echo 465000 > /sys/devices/system/cpu/cpufreq/policy4/ondemand/sampling_rate

Does it still crash in your use case?
If it does not crash any longer I will explain what is going on.
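
The two writes above only make sense while the ondemand governor is active, since (as far as I know) the per-policy ondemand directory only exists then. A small sketch that checks this before writing; the function name and the SYSFS_CPUFREQ override are hypothetical, and the values are simply the ones from the post above. Writing to the real sysfs needs root.

```shell
#!/bin/sh
# Sketch: set the ondemand sampling_rate for a policy, but only if that
# policy is really using the ondemand governor.
SYSFS_CPUFREQ="${SYSFS_CPUFREQ:-/sys/devices/system/cpu/cpufreq}"

set_sampling_rate() {
    policy="$1"
    rate="$2"
    dir="$SYSFS_CPUFREQ/$policy"
    gov=$(cat "$dir/scaling_governor" 2>/dev/null)
    if [ "$gov" != "ondemand" ]; then
        echo "$policy: governor is '${gov:-unknown}', skipping"
        return 1
    fi
    echo "$rate" > "$dir/ondemand/sampling_rate" &&
        echo "$policy: sampling_rate=$rate"
}

# '|| true' keeps going even if one policy is skipped.
set_sampling_rate policy0 40000 || true
set_sampling_rate policy4 465000 || true
```

These values do not survive a reboot, so they have to be applied again after each boot while testing.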


Hello. My Helios64 arrived two weeks ago, after a loooong trip along the Silk Road all the way down to Southern Europe. First, I want to give a GREAT THANK YOU to the people at Kobol shop support, as they helped me with my transport/forwarder/carrier nightmare loop: the address is in Chinese - no address specified - contact the sender - won’t give you your parcel - please call again - no address specified...

 

So. After reading a good slice of the wiki I couldn’t find a comprehensive manual/article about the simplest usage of the front panel. Glad if you can point me to such information.

The principal source of confusion for me is the comments sections of the kernel 4.4/5 versions, where there are some mentions of power on/off problems.

The questions I have right now are:

  • What does each light mean? I’ve read that blue means ok and red means problem; on reddit they pointed out that the blinking blue System light is normal.
  • How does the power on/off button behave?
    • I’m confused about the PSU/stand by/ WOL/on states, can I damage the OS with a long press?
    • Does a short press wake up the system if WOL is not configured?
  • The reset button : is it equivalent to a software reboot?
  • What is the procedure to hot-swap drives? Does it need some panel button pressing? Is it done via software?

Thank you for your help.

6 hours ago, usual user said:

Out of curiosity: with the mainline kernel and the ondemand governor in place, apply this:


echo 40000 > /sys/devices/system/cpu/cpufreq/policy0/ondemand/sampling_rate
echo 465000 > /sys/devices/system/cpu/cpufreq/policy4/ondemand/sampling_rate

Does it still crash in your use case?
If it does not crash any longer I will explain what is going on.

I am so glad my machine survived the night without crashing.

It's now part of my lizardfs distributed storage pool and is still synchronizing

Quote

Memory:        Total        Used        Free     Buffers                       
RAM:         3901088     3843136       57952        1328                       
Swap:        1950540       57468     1893072                                   

Bootup: Thu Oct 29 22:20:18 2020   Load average: 1.26 1.61 1.68 1/275 7993     

user  :      01:00:25.07   1.8%  page in :         81241566                    
nice  :      00:00:01.07   0.0%  page out:        410572768                    
system:      02:27:57.04   4.3%  page act:          4335568                    
IOwait:      01:55:27.21   3.4%  page dea:           226869                    
hw irq:      00:00:00.00   0.0%  page flt:          2458237                    
sw irq:      00:37:46.36   1.1%  swap in :             2239                    
idle  :   2d 02:53:08.08  89.4%  swap out:            15848                    
uptime:      09:49:06.91         context :        660800969                    

mtdblock0              70r               sdb            14593r          354626w
mmcblk0            8184r             5   sdc            24794r          344540w
mmcblk0p1            8036r               sdd            24769r          342203w
mmcblk1             193r                 sde            23975r          342134w
mmcblk1rpmb               4r             zram0            1541r            2040
mmcblk1boot1             116r            zram1            2802r           15849
mmcblk1boot0             116r            sdf          1061502r            2503w
sda            46103r          350811w                                         

eth0        TX 194.79GiB     RX 302.53GiB     lo          TX 0.00B         RX 0.00B        
eth1        TX 0.00B         RX 0.00B                         

Once it has run for 24 hours, I will reboot, change those parameters and report back.


Has anyone got Mayan EDMS installed? I tried to install it through armbian-config, but it doesn't work. It seems that it tries to install the wrong architecture version :blink: Should I update armbian-config through OMV?

The bad thing: the system has now been working for a complete week, but with the oldest image. I'm afraid to update...

 

 

  • Werner locked this topic