clostro
Members
Posts: 30

Posts posted by clostro

  1. You could maybe disable the service and put the script it runs, /usr/bin/helios64-ups.sh, into root's cron to run every minute. You could make it effectively run every 20 seconds by adding a while loop that iterates three times and sleeps 20 seconds per iteration (see the sketch after the script below).

     

    However, I don't see where this script checks whether mains power is active. What I see is a script that will keep shutting the device down whenever the battery voltage is low, even after mains power returns. Is that function provided by the service timer, i.e. does the timer stop while mains power is present?

    If so, absolutely do NOT cron helios64-ups.sh as it is, because you won't be able to boot the device until the battery charges above 7 volts.

     

    #!/bin/bash
    
    #7.0V   916     Recommended threshold to force shutdown system
    TH=916
    
    # raw ADC reading of the battery voltage channel
    val=$(cat '/sys/bus/iio/devices/iio:device0/in_voltage2_raw')
    # ADC scale factor
    sca=$(cat '/sys/bus/iio/devices/iio:device0/in_voltage_scale')
    # scaled value, truncated to an integer by bc
    adc=$(echo "$val * $sca / 1" | bc)
    
    if [ "$adc" -le $TH ]; then
        /usr/sbin/poweroff
    fi
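
    For illustration, here is a minimal sketch of the cron wrapper described above. It assumes a hypothetical mains_is_present check (a placeholder I made up, not a real Helios64 interface); the actual UPS/mains readings are documented on the Kobol wiki linked below.

    #!/bin/bash
    # Sketch only: run from root cron once per minute. Checks three times,
    # 20 seconds apart, and powers off only if the battery is low AND mains
    # power appears to be absent.
    TH=916

    mains_is_present() {
        # placeholder: always reports mains present until replaced with a
        # real check (see the Kobol wiki UPS status page linked below)
        return 0
    }

    for i in 1 2 3; do
        val=$(cat '/sys/bus/iio/devices/iio:device0/in_voltage2_raw')
        sca=$(cat '/sys/bus/iio/devices/iio:device0/in_voltage_scale')
        adc=$(echo "$val * $sca / 1" | bc)
        if [ "$adc" -le "$TH" ] && ! mains_is_present; then
            /usr/sbin/poweroff
        fi
        if [ "$i" -lt 3 ]; then
            sleep 20
        fi
    done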

     

    There is a whole device with lots of data in there, is there any documentation on iio:device0?

     

    edit- apparently there is :D https://wiki.kobol.io/helios64/ups/#ups-status-under-linux

     

  2. May I suggest outputting dmesg live to a network location?

    I'm not sure if the serial console output is the same as 'dmesg', but if it is, you can 'nohup &' it live to any file. That way you wouldn't have to stay connected to the console or ssh all the time. Just don't output it to a local file system, as writing to a local file system during a crash might corrupt it and cause more problems.

     

    nohup dmesg --follow > /network/location/folder/helios64-log.txt 2>&1 &

    exit

     

    Needed a single '>', and to exit the session with 'exit', apparently.
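
    In case it helps, the network location has to be mounted first; for example, assuming an NFS export (the server name and paths here are placeholders, adjust to your setup):

    # mount the remote export where the log file will be written
    mount -t nfs nas.example.lan:/export/logs /network/location/folder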

  3. Mine has been pretty stable for the last 2 months. The last restart was to update to Armbian 21.02.2 Buster with the 5.10.16 kernel, 24 days ago. I applied the CPU freq mod on the previous kernel and upgraded with apt, no fresh install, so I assume the CPU freq mod is still in place. The device has been completely stable since that mod, and I am not undoing it for the time being. Reliability > everything else.

     

    I'm using the 2.5G port exclusively with a 2.5G switch.

    There are four 4TB Seagate IronWolf drives and an old (very old) SanDisk Extreme SSD in there.

    No OMV or ZFS. Just LVM Raid5 with SSD rw cache.

    No docker or VMs running.

    Cockpit for management.

    Samba for file sharing.

    Borg for backups.

     

     

     

  4. Guys, those drives are CMR (the ones I could pick out from the posted logs), not SMR. EFRX is CMR, EFAX is SMR.

    Also, SMR is the one to avoid at all costs.

     

    From https://www.servethehome.com/buyers-guides/top-hardware-components-freenas-nas-servers/top-picks-freenas-hard-drives/ (ServeTheHome runs a lot of ZFS pools in their infra):

    Quote

    As a quick note. We do not recommend the new WD Red SMR drives. See WD Red SMR vs CMR Tested Avoid Red SMR.

     

    Please refer to manufacturer spec sheets and tables before buying an HDD. Here are a few lists:

    https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd/

    https://blog.westerndigital.com/wd-red-nas-drives/

    https://www.seagate.com/internal-hard-drives/cmr-smr-list/

     

    As far as I know, all Hitachi drives are CMR.

  5. @kelso Any SSD will be fast enough for the network or USB speeds of this device. If you are buying new, you can pick WD Red, Samsung Evo, SanDisk Ultra/Extreme, Seagate Firepro (?)... just stay away from little-known or unknown brands. You can check the models you picked and compare them here - https://ssd.userbenchmark.com/

    They are getting a well-deserved grilling for their CPU comparisons, but I think their SSD data is good enough. I would be looking for the best 'Mixed' performance value for use in this device, as the network and USB speeds cap the max read or write speed anyhow.

     

    The Western Digital drives you picked use CMR, which is supposedly much better than SMR; you can take a look at this table if you have to pick other models: https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd/

     

    One suggestion before you put any critical data onto those brand new disks: run SMART tests on each drive, long tests. They should take about 10 hours each, I think.
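
    A minimal way to do that with smartmontools (the device names are just examples, adjust to your drives):

    # start a long (extended) SMART self-test on each drive
    for d in /dev/sd[a-d]; do
        smartctl -t long "$d"
    done
    # check progress and the result later, for example:
    smartctl -l selftest /dev/sda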

     

    One of my brand new Seagates couldn't complete a test on the first run and had to be replaced. Now I'm on edge and running nightly borg backups to an older NAS because the other disks are from the same series. Call me paranoid, but I usually stagger HDD purchases by a few days and/or order from different stores to avoid getting drives from the same production batch; I couldn't do that this time around.

     

     

    @Werner I use four 4TB Seagate NAS drives, whatever their branding was... and an old 240GB SanDisk Extreme SSD for RAID caching. LVM RAID 5 + dm-cache (read and write, writeback, smq). It ain't bad. The SSD really picks up the slack of the spinning rust, especially when you are pushing random writes to the device, and smq is pretty smart at read caching for hot files.

  6. Just did an in-place upgrade with armbian-config (I was running the previous build with the static CPU freq mod, stably, for 23 days), and it all works fine after a reboot. Wanted to report the results.

    A few things:

    - After the armbian-config upgrade and reboot there were still packages not up to date; I had to run apt again.

    - The firewalld zone didn't come up on its own, but the firewall was working as configured. I'm confused about that one. It came back up after I restarted the firewalld service, I think.

    - And the weirdest thing: the zram1 log was 100% full. The armbian-ramlog service was failing and wouldn't work properly until I manually deleted /var/log/pcp*, since that was what was hogging all the space.

     

    I tested the 2.5G adapter with CrystalDisk on a Windows client a few times, which would previously crash eth1 (sometimes almost instantly), and so far it works fine.

    I did some iperf3 tests and am getting 1.9+ Gbps in one direction and 2.2+ Gbps in the other with 'ethtool -K eth1 rx off tx on'.
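
    A quick way to reproduce that kind of test (the hostname is just an example): run the server on the Helios64 and the client on the other machine, in both directions.

    # on the Helios64
    iperf3 -s
    # on the client: push to the NAS, then pull from it (-R reverses direction)
    iperf3 -c helios64.local
    iperf3 -c helios64.local -R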

     

    edit: forgot about eth0, it wouldn't show up in ifconfig after the update. Not sure what happened there, since the Cockpit interface shows it just fine, just not configured. It may have been me toying around with it before the update. I might have cloned its MAC address so it would get the same IP as eth1 from the Pi-hole DHCP, I really can't remember.

  7. Putting aside the discussion about disk health and spin-ups and spin-downs, a non-spin-down value might solve your issue here.

     

    You can take a look at both the -S and -B options. I couldn't entirely figure out the difference between their 'set' values. They are both supposedly setting the APM value, but aside from -S putting the drives to sleep immediately and then setting the sleep timer, they have different definitions for the level values.

     

    From https://man7.org/linux/man-pages/man8/hdparm.8.html

    For instance -B 

    Quote

    Possible settings range from values 1 through 127 (which permit spin-down), and values 128 through 254 (which do not permit spin-down). The highest degree of power management is attained with a setting of 1, and the highest I/O performance with a setting of 254. A value of 255 tells hdparm to disable Advanced Power Management altogether on the drive

     

    and -S 

    Quote

    Put the drive into idle (low-power) mode, and also set the standby (spindown) timeout for the drive.

    Quote

    A value of zero means "timeouts are disabled": the device will not automatically enter standby mode. Values from 1 to 240 specify multiples of 5 seconds, yielding timeouts from 5 seconds to 20 minutes. Values from 241 to 251 specify from 1 to 11 units of 30 minutes, yielding timeouts from 30 minutes to 5.5 hours. A value of 252 signifies a timeout of 21 minutes. A value of 253 sets a vendor-defined timeout period between 8 and 12 hours, and the value 254 is reserved. 255 is interpreted as 21 minutes plus 15 seconds. Note that some older drives may have very different interpretations of these values.

     

     

    As you can see, the value of 255 and the other special levels are different between -S and -B. But the definitions also sound like they are doing the same thing.

    I would like to learn if anyone can clarify the difference.
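
    In the meantime, the non-spin-down settings mentioned above would be set roughly like this (the device name is just an example):

    # disable the standby (spin-down) timer entirely
    hdparm -S 0 /dev/sda
    # set an APM level that does not permit spin-down (128-254)
    hdparm -B 254 /dev/sda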

  8. Just wanted to report that the CPU frequency mod has been running stable under normal use for 15 days now (on 1Gbe connection). Haven't tried the voltage mod.

     

    I'll switch to the February 4th Buster 5.10 build soon.

     

     

    edit: 23 days now, and I shut it down for an SD card backup and system update. The CPU freq mod is rock solid.

  9. Hi @aprayoga 

    I have a question before I try modifying boot.scr -

     

    I tried @gprovost's suggestion about the CPU speeds and the device has been running stable for nearly 9 days now. I was planning to wait 14 days to test its stability and then update Armbian.

     

    The current Armbian running on the device is Linux helios64 5.9.14-rockchip64 #20.11.4 SMP PREEMPT Tue Dec 15 08:52:20 CET 2020 aarch64 GNU/Linux. But I have been meaning to update to Armbian_20.11.10_Helios64_buster_current_5.9.14.img.xz.

     

    - Shall I try this mod on top of the current setup with the old Armbian and CPU speed mod? Or can I update to the newest image?

     

    - If I update to the latest image, can I just update in place or do you suggest a fresh install?

     

    - Also, shall I redo the CPU speed mod as well, after a fresh install?

     

    Thanks

  10. Thanks for the update, I really hope it's not the cables in my case. I mean, I was not getting these lines in the log before; they just showed up for the first time.

     

    The only difference from the last few boots is the CPU frequency and speed governor per the quote below.

     

    I don't think they are related; this was originally suggested for troubleshooting a 'sync and drop_caches' issue, which works fine on my device.

    Later it was also suggested for the 'mystery red light' issue, which was a problem on my device.

    But this could be something else.

     

    Hopefully it's not the cables; I would rather have the SSD connected to that slot fail than have to change the harness.

     

    On 1/4/2021 at 6:48 AM, gprovost said:

    Hi, could try to the following tweak and redo the same methods that triggers the crash :

     

    Run armbian-config, go to -> System -> CPU

     

    And set:

    Minimum CPU speed = 1200000

    Maximum CPU speed = 1200000

    CPU governor = performance

     

    This will help us understand if instability is still due to DVFS.
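
    (For reference, as far as I know these armbian-config settings just end up in /etc/default/cpufrequtils, roughly like this - a sketch, assuming the standard Armbian layout:)

    # /etc/default/cpufrequtils
    ENABLE=true
    MIN_SPEED=1200000
    MAX_SPEED=1200000
    GOVERNOR=performance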

     

  11. Just started seeing these 'ata1.00: failed command: READ FPDMA QUEUED' lines in my log as well. I assume ata1 is sda.
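
    A quick way to double-check which ataN port a disk sits on is to print its sysfs device path (the path includes the ata port number):

    # print the sysfs path for each SATA/SCSI block device
    for d in /sys/block/sd?; do
        echo "$(basename "$d"): $(readlink -f "$d/device")"
    done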

     

    I had a whole bunch of them a while after boot and nothing afterwards. But the device was not being accessed much during that time; it had just booted up after a crash and the LVM cache was cleaning the dirty blocks on the cache SSD connected to sda.

     

    sdb-sde are populated with 4x 4TB ST4000VN008 IronWolfs, and sda is hooked up to an old (and I mean old) SanDisk Extreme 240GB SSD, SDSSDX240GG25.

    I attached the smartctl report for the SSD below, and it passed a short smart test just now. I'll start a long test in a minute.

     

    Using Linux helios64 5.9.14-rockchip64 #20.11.4 SMP PREEMPT Tue Dec 15 08:52:20 CET 2020 aarch64 GNU/Linux.

     

    The serial number of the board is 000100000530, in case that helps.

     

    Spoiler
    
    [ 4101.251887] ata1.00: exception Emask 0x10 SAct 0xffffffff SErr 0xb80100 action 0x6
    [ 4101.251895] ata1.00: irq_stat 0x08000000
    [ 4101.251903] ata1: SError: { UnrecovData 10B8B Dispar BadCRC LinkSeq }
    [ 4101.251911] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.251924] ata1.00: cmd 60/00:00:00:02:b6/02:00:08:00:00/40 tag 0 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.251929] ata1.00: status: { DRDY }
    [ 4101.251934] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.251946] ata1.00: cmd 60/00:08:00:a0:b3/02:00:08:00:00/40 tag 1 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.251950] ata1.00: status: { DRDY }
    [ 4101.251956] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.251968] ata1.00: cmd 60/00:10:00:7c:b3/02:00:08:00:00/40 tag 2 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.251972] ata1.00: status: { DRDY }
    [ 4101.251977] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.251989] ata1.00: cmd 60/00:18:00:58:b3/02:00:08:00:00/40 tag 3 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.251993] ata1.00: status: { DRDY }
    [ 4101.251998] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252010] ata1.00: cmd 60/00:20:00:34:b3/02:00:08:00:00/40 tag 4 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252014] ata1.00: status: { DRDY }
    [ 4101.252020] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252032] ata1.00: cmd 60/00:28:00:3e:b1/02:00:08:00:00/40 tag 5 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252036] ata1.00: status: { DRDY }
    [ 4101.252041] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252053] ata1.00: cmd 60/00:30:00:1a:b1/02:00:08:00:00/40 tag 6 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252057] ata1.00: status: { DRDY }
    [ 4101.252062] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252074] ata1.00: cmd 60/00:38:00:f6:b0/02:00:08:00:00/40 tag 7 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252078] ata1.00: status: { DRDY }
    [ 4101.252083] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252095] ata1.00: cmd 60/00:40:00:d2:b0/02:00:08:00:00/40 tag 8 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252099] ata1.00: status: { DRDY }
    [ 4101.252105] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252116] ata1.00: cmd 60/00:48:00:ae:b0/02:00:08:00:00/40 tag 9 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252120] ata1.00: status: { DRDY }
    [ 4101.252126] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252138] ata1.00: cmd 60/00:50:00:b8:ae/02:00:08:00:00/40 tag 10 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252142] ata1.00: status: { DRDY }
    [ 4101.252147] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252159] ata1.00: cmd 60/00:58:00:94:ae/02:00:08:00:00/40 tag 11 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252163] ata1.00: status: { DRDY }
    [ 4101.252169] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252181] ata1.00: cmd 60/00:60:00:b8:bd/02:00:08:00:00/40 tag 12 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252185] ata1.00: status: { DRDY }
    [ 4101.252190] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252202] ata1.00: cmd 60/00:68:00:94:bd/02:00:08:00:00/40 tag 13 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252206] ata1.00: status: { DRDY }
    [ 4101.252211] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252223] ata1.00: cmd 60/00:70:00:70:bd/02:00:08:00:00/40 tag 14 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252228] ata1.00: status: { DRDY }
    [ 4101.252232] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252244] ata1.00: cmd 60/00:78:00:4c:bd/02:00:08:00:00/40 tag 15 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252249] ata1.00: status: { DRDY }
    [ 4101.252254] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252266] ata1.00: cmd 60/00:80:00:56:bb/02:00:08:00:00/40 tag 16 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252270] ata1.00: status: { DRDY }
    [ 4101.252276] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252288] ata1.00: cmd 60/00:88:00:32:bb/02:00:08:00:00/40 tag 17 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252292] ata1.00: status: { DRDY }
    [ 4101.252297] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252309] ata1.00: cmd 60/00:90:00:0e:bb/02:00:08:00:00/40 tag 18 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252314] ata1.00: status: { DRDY }
    [ 4101.252319] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252330] ata1.00: cmd 60/00:98:00:ea:ba/02:00:08:00:00/40 tag 19 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252335] ata1.00: status: { DRDY }
    [ 4101.252340] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252352] ata1.00: cmd 60/00:a0:00:c6:ba/02:00:08:00:00/40 tag 20 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252356] ata1.00: status: { DRDY }
    [ 4101.252361] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252373] ata1.00: cmd 60/00:a8:00:d0:b8/02:00:08:00:00/40 tag 21 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252377] ata1.00: status: { DRDY }
    [ 4101.252382] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252394] ata1.00: cmd 60/00:b0:00:ac:b8/02:00:08:00:00/40 tag 22 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252399] ata1.00: status: { DRDY }
    [ 4101.252404] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252416] ata1.00: cmd 60/00:b8:00:88:b8/02:00:08:00:00/40 tag 23 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252420] ata1.00: status: { DRDY }
    [ 4101.252426] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252437] ata1.00: cmd 60/00:c0:00:64:b8/02:00:08:00:00/40 tag 24 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252442] ata1.00: status: { DRDY }
    [ 4101.252447] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252459] ata1.00: cmd 60/00:c8:00:40:b8/02:00:08:00:00/40 tag 25 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252463] ata1.00: status: { DRDY }
    [ 4101.252468] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252480] ata1.00: cmd 60/00:d0:00:4a:b6/02:00:08:00:00/40 tag 26 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252485] ata1.00: status: { DRDY }
    [ 4101.252490] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252502] ata1.00: cmd 60/00:d8:00:26:b6/02:00:08:00:00/40 tag 27 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252506] ata1.00: status: { DRDY }
    [ 4101.252511] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252523] ata1.00: cmd 60/00:e0:00:de:b5/02:00:08:00:00/40 tag 28 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252527] ata1.00: status: { DRDY }
    [ 4101.252532] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252545] ata1.00: cmd 60/00:e8:00:ba:b5/02:00:08:00:00/40 tag 29 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252549] ata1.00: status: { DRDY }
    [ 4101.252554] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252566] ata1.00: cmd 60/00:f0:00:c4:b3/02:00:08:00:00/40 tag 30 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252570] ata1.00: status: { DRDY }
    [ 4101.252575] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252587] ata1.00: cmd 60/00:f8:00:70:ae/02:00:08:00:00/40 tag 31 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252592] ata1.00: status: { DRDY }
    [ 4101.252603] ata1: hard resetting link
    [ 4101.727761] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    [ 4101.749701] ata1.00: configured for UDMA/133
    [ 4101.752101] ata1: EH complete

     

     

     

    Spoiler
    
    smartctl  -x /dev/sda
    smartctl 6.6 2017-11-05 r4594 [aarch64-linux-5.9.14-rockchip64] (local build)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     SandForce Driven SSDs
    Device Model:     SanDisk SDSSDX240GG25
    Serial Number:    123273400127
    LU WWN Device Id: 5 001b44 7bf3fcb3f
    Firmware Version: R201
    User Capacity:    240,057,409,536 bytes [240 GB]
    Sector Size:      512 bytes logical/physical
    Rotation Rate:    Solid State Device
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
    SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Wed Jan 27 13:24:04 2021 +03
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    AAM feature is:   Unavailable
    APM level is:     254 (maximum performance)
    Rd look-ahead is: Disabled
    Write cache is:   Enabled
    DSN feature is:   Unavailable
    ATA Security is:  Disabled, NOT FROZEN [SEC1]
    Wt Cache Reorder: Unavailable
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x02) Offline data collection activity
                                            was completed without error.
                                            Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                (    0) seconds.
    Offline data collection
    capabilities:                    (0x7b) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   1) minutes.
    Extended self-test routine
    recommended polling time:        (  48) minutes.
    Conveyance self-test routine
    recommended polling time:        (   2) minutes.
    SCT capabilities:              (0x0021) SCT Status supported.
                                            SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
      1 Raw_Read_Error_Rate     POSR--   119   119   050    -    0/225624030
      5 Retired_Block_Count     PO--CK   100   100   003    -    7
      9 Power_On_Hours_and_Msec -O--CK   000   000   000    -    30893h+27m+05.090s
     12 Power_Cycle_Count       -O--CK   099   099   000    -    1951
    171 Program_Fail_Count      -O--CK   000   000   000    -    0
    172 Erase_Fail_Count        -O--CK   000   000   000    -    0
    174 Unexpect_Power_Loss_Ct  ----CK   000   000   000    -    999
    177 Wear_Range_Delta        ------   000   000   000    -    3
    181 Program_Fail_Count      -O--CK   000   000   000    -    0
    182 Erase_Fail_Count        -O--CK   000   000   000    -    0
    187 Reported_Uncorrect      -O--CK   100   100   000    -    0
    194 Temperature_Celsius     -O---K   028   055   000    -    28 (Min/Max 10/55)
    195 ECC_Uncorr_Error_Count  --SRC-   120   120   000    -    0/225624030
    196 Reallocated_Event_Count PO--CK   100   100   003    -    7
    201 Unc_Soft_Read_Err_Rate  --SRC-   120   120   000    -    0/225624030
    204 Soft_ECC_Correct_Rate   --SRC-   120   120   000    -    0/225624030
    230 Life_Curve_Status       PO--C-   100   100   000    -    100
    231 SSD_Life_Left           PO--C-   100   100   010    -    0
    233 SandForce_Internal      ------   000   000   000    -    10541
    234 SandForce_Internal      -O--CK   000   000   000    -    13196
    241 Lifetime_Writes_GiB     -O--CK   000   000   000    -    13196
    242 Lifetime_Reads_GiB      -O--CK   000   000   000    -    13491
                                ||||||_ K auto-keep
                                |||||__ C event count
                                ||||___ R error rate
                                |||____ S speed/performance
                                ||_____ O updated online
                                |______ P prefailure warning
    
    General Purpose Log Directory Version 1
    SMART           Log Directory Version 1 [multi-sector log support]
    Address    Access  R/W   Size  Description
    0x00       GPL,SL  R/O      1  Log Directory
    0x07       GPL     R/O      1  Extended self-test log
    0x09           SL  R/W      1  Selective self-test log
    0x10       GPL     R/O      1  NCQ Command Error log
    0x11       GPL,SL  R/O      1  SATA Phy Event Counters log
    0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
    0xb7       GPL,SL  VS      16  Device vendor specific log
    0xe0       GPL,SL  R/W      1  SCT Command/Status
    0xe1       GPL,SL  R/W      1  SCT Data Transfer
    
    SMART Extended Comprehensive Error Log (GP Log 0x03) not supported
    
    SMART Error Log not supported
    
    SMART Extended Self-test Log Version: 0 (1 sectors)
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]
    
    SMART Selective self-test log data structure revision number 0
    Note: revision number not 1 implies that no selective self-test has ever been run
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    
    SCT Status Version:                  3
    SCT Version (vendor specific):       0 (0x0000)
    SCT Support Level:                   1
    Device State:                        Active (0)
    Current Temperature:                    28 Celsius
    Power Cycle Min/Max Temperature:     27/31 Celsius
    Lifetime    Min/Max Temperature:     10/55 Celsius
    Under/Over Temperature Limit Count:   0/0
    
    SCT Temperature History Version:     10 (Unknown, should be 2)
    Temperature Sampling Period:         1 minute
    Temperature Logging Interval:        10 minutes
    Min/Max recommended Temperature:      0/120 Celsius
    Min/Max Temperature Limit:            0/ 0 Celsius
    Temperature History Size (Index):    56576 (7)
    Invalid Temperature History Size or Index
    
    SCT Error Recovery Control command not supported
    
    Device Statistics (GP/SMART Log 0x04) not supported
    
    Pending Defects log (GP Log 0x0c) not supported
    
    SATA Phy Event Counters (GP Log 0x11)
    ID      Size     Value  Description
    0x0001  2            0  Command failed due to ICRC error
    0x0003  2            0  R_ERR response for device-to-host data FIS
    0x0004  2            0  R_ERR response for host-to-device data FIS
    0x0006  2            0  R_ERR response for device-to-host non-data FIS
    0x0007  2            0  R_ERR response for host-to-device non-data FIS
    0x0008  2            0  Device-to-host non-data FIS retries
    0x0009  2            4  Transition from drive PhyRdy to drive PhyNRdy
    0x000a  2            4  Device-to-host register FISes sent due to a COMRESET
    0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
    0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
    0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
    0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC
    0x0002  2            0  R_ERR response for data FIS
    0x0005  2            0  R_ERR response for non-data FIS
    0x000b  2            0  CRC errors within host-to-device FIS
    0x000d  2            0  Non-CRC errors within host-to-device FIS

     

     

  12. Just had another one right now. The previous one was 3 days ago.

    So I plugged in the serial cable, but the device was completely gone. No login prompt, no output at all.

    Also, once rebooted, there are no logs or anything from the previous boot either.

    journalctl --list-boots lists only the current boot, and dmesg -E is empty.
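
    (Side note, an assumption on my part: if journald is using volatile storage, enabling persistent storage would keep logs across reboots, at the cost of writing them to local disk.)

    # make the systemd journal persistent across reboots
    mkdir -p /var/log/journal
    systemctl restart systemd-journald
    # alternatively, set Storage=persistent in /etc/systemd/journald.conf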

     

    Anyhow, trying the CPU speed thing now. I hope it works; cleaning the entire SSD cache, which gets dirty at every crash, is worrying me.

  13. - Honestly, I don't see the point of baking an M.2 or any future NVMe PCIe port onto the board. Instead I would love to see a PCIe slot or two exposed. That way the end user can pick whatever expansion is needed: an NVMe adapter, 10GbE networking, SATA, USB, or even a SAS controller for additional external enclosures. It will probably be more expensive for the end user but will open up a larger market for the device. We already have 2 lanes / 10Gbps worth of PCIe not utilized as is (correct me on this).

    - Absolutely: more RAM, user-upgradeable RAM, and ECC RAM support.

    - Better trays, you guys, they are a pain right now lol. (Loving the black and purple though.)

    - Some sort of active/quiet cooling on the SoC, separate from the disks, as any disk activity over the network or even a simple CPU-bound task ramps up all the fans immediately.

    - Most important of all, please do not stray too far from the current architecture, both hardware- and firmware-wise. Aside from everything else, we would really love to see a rock-solid launch next time around. Changing a lot of things might cause most of your hard work to date to mean nothing.

  14. @allen--smithee I can see how these options could be useful to others, but in my case all I needed was the original config: 4 HDD RAID 5 + 1 SSD writeback cache :) I would have been forced to get a QNAP if I needed more.

     

    But I'll play :) 

    I would pick none of them; here is my counter offer:

    A - Replace the M.2 slot with a PMP-enabled eSATA port on the back, so a low-end disk tower can be attached for bulk/slow storage. There are generic 8-slot towers available right now (no PMP above 5 slots; those are RAID towers, but still eSATA-connected). An added bonus is more space on the PCB for something else.

    OR

    B - If possible, expose a PCIe slot or two for adding an SFP+ card and a mini-SAS card, so a fast disk tower and a 10GbE network can be connected. This would probably require more RAM, a more powerful CPU, more watts, and all that jazz. I'm not sure this is possible with the ARM architectures available to us mere mortals yet. Obviously Amazon and others have their datacenter ARM CPUs that have all this and then some. (I just realized after writing this that I described a Xeon QNAP unit... but those sell, sooo maybe good?)

     

    They could design and sell these disk towers and the add-in cards as well. Currently available SAS and eSATA towers are quite drab and bland to look at. The attractive Helios64 design aesthetic would make them look very pretty.

     

    Also, this way the original Helios64 box, which is suited to home use, doesn't get bulkier, and its design stays mostly unchanged, so the production troubles that have already been solved are built upon rather than thrown away. I remember from the blog comments of the Kobol guys that the enclosure factory was problematic and caused some delays at the end. Multiple PCBs don't have to be produced for each use case, and the user is free to add options.

     

    In my opinion, option A, an eSATA port on the back and a Kobol-designed disk enclosure, would be perfect for most, and a good incremental way forward.

    3-disk ZFS or RAID 5 + 1 SSD for L2ARC/SLOG or dm-cache, as a fast daily workspace,

    and

    5/8 disks for daily backups / archives / any slow storage needs.

     

    @gprovost Thank you for renaming the thread, I was feeling bad for posting in it and keeping it active.

     

  15.  

    12 minutes ago, Werner said:

    Feel free to take a look into. Curious about that too.

    Couldn't find anything in the Helios64 patch, or in the other RK3399 patches, that relates to port multiplication.

    I'm not sure I'm looking at this correctly, but when I search the repo for 'pmp' I see a lot of results in the config/kernel folder with

    CONFIG_SATA_PMP=y

    Extrapolating from those search results, I think I can assume that port multiplication is off by default and needs to be enabled in the Armbian config for each specific SoC or device. No?
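
    A quick way to check the running kernel directly, rather than the repo (one of these config sources is usually present):

    # check whether the running kernel was built with SATA port-multiplier support
    grep CONFIG_SATA_PMP "/boot/config-$(uname -r)" 2>/dev/null \
        || zgrep CONFIG_SATA_PMP /proc/config.gz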

     

    But later I found this on https://wiki.kobol.io/helios64/sata/

    Quote

    SATA Controller Features¶

    5 SATA ports,

    Command-based and FIS-based for Port Multiplier,

    Compliance with SATA Specification Revision 3.2,

    AHCI mode and IDE programming interface,

    Native Command Queue (NCQ),

    SATA link power saving mode (partial and slumber),

    SATA plug-in detection capable,

    Drive power control and staggered spin-up,

    SATA Partial / Slumber power management state.

     

    So apparently port multipliers should be supported on the Helios64, or will be supported? Honestly, I don't think they would be very useful, since the case would have to be modified and so on. I first mentioned it as a joke to begin with. I don't think anyone would go for this, as the case itself is a cool piece of design and butchering it would be a shame.

  16. https://man7.org/linux/man-pages/man7/lvmcache.7.html

    https://linuxhint.com/configuring-zfs-cache/

    Here you go... use 4 HDDs and an SSD cache.

    Or sell your unit; quite a lot of people wanted to buy one and couldn't get one in time.

    OR, Frankenstein your unit and add port multipliers to the original SATA ports. You can add up to 20 HDDs to the 4 original SATA ports, and up to 5 SSDs to the remaining original SATA port. The hardware controller specs say it supports port multipliers; I'm not sure about the Armbian kernel, you might have to modify it.

     

    Btw, you can take a look at the new version of the Odroid H2+ with port multipliers (up to 10 SATA disks, plus PCIe NVMe) if you are into more tinkering. You also get two 2.5G network ports instead of one. The Hardkernel team has a post about the setup and benchmarks: https://wiki.odroid.com/odroid-h2/application_note/10_sata_drives

    I am planning to expand my network infra with the H2+ soon. You can even plug a $45 4-port 2.5G switch into the NVMe slot now. I'm going crazy over this unit. If only it didn't have a shintel (r/AyyMD) CPU in it.

     

    Anyhow-

    Just doing a bit of research shows that this was not exactly a 'decision' made by Kobol. There are only two reliable PCIe-to-SATA controllers I could find that support multiple (4+) SATA ports, given the limitation of the RK3399, which has 4 PCIe lanes. It would be a different story if the RK had 8 lanes, but that is another can of worms involving CPU architecture, form factor, etc. Not gonna open that can while I'm barely qualified to open this one.

     

    What we have here in the Helios64 is the JMB585, and the other option was the Marvell 88SE9215. The Marvell only supports 4 ports, while the JMB supports 5. I could not find any controller that works reliably with 4 PCIe 2.1 lanes and has more than 5 ports.

    There is the Asmedia ASM1166, which actually supports 6 ports, but it is quite new and was probably not available during the design of the Helios64. Not only that, there is a weird thread about its log output on the Unraid forums.

     

    In the end, this '5 ports only' limit was not exactly a decision by Kobol, but rather an uninformed decision made by you. You don't see people here complaining about this. You are the second one who has even mentioned it, and the only one complaining so far, with CAPS no less. Which means the specs made it clear to pretty much everyone that this was the case.

     

    My suggestion is to replace one of the drives with a Samsung 860 Pro and make it a SLOG/L2ARC, or in my case an LVM cache (writeback mode; make sure you have a proper UPS, or that the battery in the unit is connected properly), and call it a day. A SATA port is faster than the 2.5G Ethernet or the USB DAS mode anyhow, so your cache SSD will mostly perform OK.
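
    For anyone curious, setting an SSD up as an LVM writeback cache looks roughly like this (a sketch only; vg0, data, and /dev/sda are example names - see the lvmcache man page linked at the top):

    # add the SSD to the volume group and attach it as a writeback cache
    pvcreate /dev/sda
    vgextend vg0 /dev/sda
    lvcreate --type cache-pool -l 100%FREE -n cache0 vg0 /dev/sda
    lvconvert --type cache --cachemode writeback --cachepool vg0/cache0 vg0/data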

  17. @gprovost Hi, my Helios64 is connected to a Zyxel XGS1010-12 switch on its 2.5Gbps ports via a CAT5E 26AWG metal-jacket cable. On the PC side I have a Sabrent 2.5G USB network adapter, again connected to the other 2.5Gbps port. I think the Sabrent USB adapter has the same controller as the Helios64.

     

    Spoiler
    
    PS C:> Get-NetAdapter
    
    Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
    ----                      --------------------                    ------- ------       ----------             ---------
    Ethernet 2                Realtek USB 2.5GbE Family Controller         16 Up           00-24-27-88-0B-5A       2.5 Gbps
    ZeroTier One [93afae59... ZeroTier Virtual Port                        15 Up           2A-AE-7A-A6-B8-55       100 Mbps
    ZeroTier One [af78bf94... ZeroTier Virtual Port #2                     53 Up           62-D9-06-F3-75-44       100 Mbps
    VirtualBox Host-Only N... VirtualBox Host-Only Ethernet Adapter        12 Up           0A-00-27-00-00-0C         1 Gbps
    Wi-Fi                     Intel(R) Wi-Fi 6 AX200 160MHz                11 Disconnected 50-E0-85-F5-02-3A          0 bps
    Ethernet                  Intel(R) I211 Gigabit Network Connec...       4 Disconnected B4-2E-99-8E-1D-2E          0 bps
    
    
    PS C:> Get-NetAdapterAdvancedProperty "Ethernet 2"
    
    Name                      DisplayName                    DisplayValue                   RegistryKeyword RegistryValue
    ----                      -----------                    ------------                   --------------- -------------
    Ethernet 2                Energy-Efficient Ethernet      Disabled                       *EEE            {0}
    Ethernet 2                Flow Control                   Rx & Tx Enabled                *FlowControl    {3}
    Ethernet 2                IPv4 Checksum Offload          Rx & Tx Enabled                *IPChecksumO... {3}
    Ethernet 2                Jumbo Frame                    Disabled                       *JumboPacket    {1514}
    Ethernet 2                Large Send Offload v2 (IPv4)   Enabled                        *LsoV2IPv4      {1}
    Ethernet 2                Large Send Offload v2 (IPv6)   Enabled                        *LsoV2IPv6      {1}
    Ethernet 2                ARP Offload                    Enabled                        *PMARPOffload   {1}
    Ethernet 2                NS Offload                     Enabled                        *PMNSOffload    {1}
    Ethernet 2                Priority & VLAN                Priority & VLAN Enabled        *PriorityVLA... {3}
    Ethernet 2                Selective suspend              Enabled                        *SelectiveSu... {1}
    Ethernet 2                Speed & Duplex                 Auto Negotiation               *SpeedDuplex    {0}
    Ethernet 2                SS idle timeout                5                              *SSIdleTimeout  {5}
    Ethernet 2                SS idle timeout(Screen off)    3                              *SSIdleTimeo... {3}
    Ethernet 2                TCP Checksum Offload (IPv4)    Rx & Tx Enabled                *TCPChecksum... {3}
    Ethernet 2                TCP Checksum Offload (IPv6)    Rx & Tx Enabled                *TCPChecksum... {3}
    Ethernet 2                UDP Checksum Offload (IPv4)    Rx & Tx Enabled                *UDPChecksum... {3}
    Ethernet 2                UDP Checksum Offload (IPv6)    Rx & Tx Enabled                *UDPChecksum... {3}
    Ethernet 2                Wake on Magic Packet           Enabled                        *WakeOnMagic... {1}
    Ethernet 2                Wake on pattern match          Enabled                        *WakeOnPattern  {1}
    Ethernet 2                Advanced EEE                   Disabled                       AdvancedEEE     {0}
    Ethernet 2                Green Ethernet                 Enabled                        EnableGreenE... {1}
    Ethernet 2                Idle Power Saving              Enabled                        EnableU2P3      {1}
    Ethernet 2                Gigabit Lite                   Enabled                        GigaLite        {1}
    Ethernet 2                Network Address                                               NetworkAddress  {--}
    Ethernet 2                VLAN ID                        0                              VlanID          {0}
    Ethernet 2                Wake on link change            Enabled                        WakeOnLinkCh... {1}
    Ethernet 2                WOL & Shutdown Link Speed      10 Mbps First                  WolShutdownL... {0}
    

     

     

  18. Yeah, I'm back to the 1Gbps port too... I pushed over 3 TB of data onto the device from multiple sources simultaneously the other day and didn't have a single error.

    With the previous stable builds it used to throw errors quite often during transfers, so when I didn't see any errors with the new one I figured the problem must have been fixed.

    Today I'm losing eth1 completely even without much of a load. Maybe it's because I'm pulling data from the device today instead of pushing; they are TX errors after all, not RX.

     

     
