

Posts posted by clostro
-
I'm using a Zyxel XGS1010-12. It has 8x 1GbE, 2x 2.5GbE, and 2x 10Gb SFP+ ports. Works OK, does what I need.
-
You could maybe disable the service and put the script it runs, /usr/bin/helios64-ups.sh, into root's cron to run every minute. You could make it effectively run every 20 seconds by wrapping it in a loop that does 3 iterations and sleeps 20 seconds in each (see the sketch after the script below).
However, I don't see where this script checks whether mains power is active. What I see is a script that will keep shutting down the device when the battery voltage is low, even after mains power returns. Is that check provided by the service timer, as in the timer stops while mains is present?
If so, absolutely do NOT cron helios64-ups.sh as it is, because you won't be able to boot the device until the battery charges above 7 volts.
#!/bin/bash
# 7.0V 916 Recommended threshold to force shutdown system
TH=916
val=$(cat '/sys/bus/iio/devices/iio:device0/in_voltage2_raw')
sca=$(cat '/sys/bus/iio/devices/iio:device0/in_voltage_scale')
adc=$(echo "$val * $sca / 1" | bc)

if [ "$adc" -le $TH ]; then
    /usr/sbin/poweroff
fi
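For reference, the kind of wrapper I mean would look roughly like this (untested sketch, and it still has no mains-power check, so the warning above stands):

#!/bin/bash
# Untested sketch: run the stock voltage check 3 times per cron minute.
# WARNING: just like the original script, there is no mains-power check here,
# so do not actually cron this as-is (see the caveat above).
for i in 1 2 3; do
    /usr/bin/helios64-ups.sh
    sleep 20
done

Then a root crontab entry along the lines of '* * * * * /usr/local/bin/ups-check-wrapper.sh' (path made up) would cover the whole minute.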
There is a whole device with lots of data in there; is there any documentation on iio:device0?
edit- apparently there is
https://wiki.kobol.io/helios64/ups/#ups-status-under-linux
-
May I suggest outputting dmesg live to a network location?
I'm not sure if the serial console output is the same as 'dmesg', but if it is, you can live 'nohup &' it to any file. That way you wouldn't have to stay connected to the console or SSH all the time. Just don't output it to a local file system, as writing to a local file system during a crash might corrupt it and cause more problems.
nohup dmesg --follow > /network/location/folder/helios64-log.txt & 2>&1
exit
needed to have a single >, and to exit the session with 'exit', apparently..
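In other words, something like this should also capture stderr (the 2>&1 has to come before the &; adjust the path to your own share):

nohup dmesg --follow > /network/location/folder/helios64-log.txt 2>&1 &
exit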
-
Mine has been pretty stable for the last 2 months. The last restart was to update to Armbian 21.02.2 Buster with the 5.10.16 kernel, 24 days ago. I applied the CPU freq mod on the previous kernel and upgraded with apt, no fresh install. The CPU freq mod is still in place, I assume. The device has been completely stable since that mod, and I am not undoing it for the time being. Reliability > everything else.
I'm using the 2.5G port exclusively with a 2.5G switch.
There are 4 4TB Seagate Ironwolf drives and an old (very old) Sandisk Extreme SSD in there.
No OMV or ZFS. Just LVM RAID5 with an SSD read/write cache (rough setup sketched at the end of this post).
No docker or VMs running.
Cockpit for management.
Samba for file sharing.
Borg for backups.
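For anyone curious, that cache layer looks roughly like this (untested sketch; names, sizes and device paths are made up, not my exact commands):

# Sketch only: 4-disk LVM RAID5 data LV with an SSD dm-cache attached in writeback mode.
vgcreate nas /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sda
lvcreate --type raid5 -i 3 -L 10T -n data nas /dev/sdb /dev/sdc /dev/sdd /dev/sde
lvcreate --type cache-pool -L 200G -n datacache nas /dev/sda
lvconvert --type cache --cachepool nas/datacache --cachemode writeback nas/data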
-
Guys, those drives are CMR (ones I could pick out from the logs posted), not SMR. EFRX is CMR, EFAX is SMR.
Also, SMR is the one to avoid at all costs.
From https://www.servethehome.com/buyers-guides/top-hardware-components-freenas-nas-servers/top-picks-freenas-hard-drives/ (ServeTheHome runs a lot of ZFS pools in their infra):
Quote: As a quick note. We do not recommend the new WD Red SMR drives. See WD Red SMR vs CMR Tested Avoid Red SMR.
Please refer to manufacturer spec sheets and tables before buying an HDD. Here are a few lists:
https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd/
https://blog.westerndigital.com/wd-red-nas-drives/
https://www.seagate.com/internal-hard-drives/cmr-smr-list/
As far as I know, all Hitachi drives are CMR.
-
@kelso Any SSD will be fast enough for the network or USB speeds of this device. If you are buying new you can pick WD Red, Samsung Evo, SanDisk Ultra/Extreme, Seagate firepro (?) ... just stay away from little-known or unknown brands. You can check the models you picked and compare them here - https://ssd.userbenchmark.com/
They are getting a well-deserved grilling for their CPU comparisons, but I think their SSD data is good enough. I would look for the best 'Mixed' value for use in this device, as the network and USB speeds cap the max read or write speed anyhow.
The Western Digitals you picked use CMR, which is supposedly much better than SMR; you can take a look at this table if you have to pick other models: https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd/
One suggestion before you put any critical data on those brand-new disks: run SMART tests on each drive, the long tests. Should take about 10 hours each, I think.
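Something along these lines (device names assumed, adjust to your setup):

# Kick off a long self-test on every drive, then check the results once they finish:
for d in /dev/sd[a-e]; do smartctl -t long "$d"; done
smartctl -l selftest /dev/sda      # repeat per drive after ~10 hours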
One of my brand-new Seagates couldn't complete a test on the first run and had to be replaced. Now I'm on edge and running nightly borg backups to an older NAS because the other disks are from the same series. Call me paranoid, but I usually stagger HDD purchases by a few days and/or order from different stores to avoid having them from the same production batch; I couldn't do that this time around.
@Werner I use 4x 4TB Seagate NAS drives, whatever their branding was, and an old 240GB SanDisk Extreme SSD for RAID caching. LVM RAID5 + dm-cache (read and write, writeback, smq). It ain't bad. The SSD really picks up the slack of the spinning rust, especially when you are pushing random writes to the device, and smq is pretty smart at read caching for hot files.
-
@Gareth Halfacree If you are interested in FreeBSD, @SleepWalker was experimenting with it some time ago. Maybe still is.
-
Just did an in-place upgrade with armbian-config (was running the previous build with the static CPU freq stably for 23 days), and it all works fine after a reboot. Wanted to report the results.
A few things:
- After the armbian-config upgrade and reboot, there were still packages not up to date; I had to run apt again.
- The firewalld zone didn't come up on its own, but the firewall was working as configured. I'm confused about that one. It came back up after I restarted the firewalld service, I think.
- And the weirdest thing: the zram1 log was 100% full. The armbian-ramlog service was failing and wouldn't work properly until I manually deleted /var/log/pcp*, since that was hogging all the space.
I did test the 2.5G adapter with CrystalDisk on a Windows client a few times, which would previously crash eth1 (sometimes near-instant death), and so far it works fine.
Did some iperf3 tests, and I am getting 1.9+ Gbps in one direction and 2.2+ Gbps in the other with 'ethtool -K eth1 rx off tx on' (example run sketched below).
edit: forgot about eth0, it wouldn't show up in ifconfig after the update. Not sure what happened with that one, since the Cockpit interface shows it just fine, just not configured. It may have been me toying around with it before the update. I might have cloned its MAC address so it would get the same IP as eth1 from the Pi-hole DHCP, I really can't remember.
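The iperf3 runs were nothing fancy, roughly this (server IP assumed):

# On the Helios64:
iperf3 -s
# On the client, push then pull:
iperf3 -c 192.168.1.10
iperf3 -c 192.168.1.10 -R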
-
Putting aside the discussion about disk health and spin-ups/spin-downs, a non-spin-down value might solve your issue here.
You can take a look at both -S and -B options. I couldn't figure out the difference between their 'set' values entirely. They are both supposedly setting the APM value, but aside from -S putting the drives to sleep immediately and then setting the sleep timer, they have different definitions for the level values.
From https://man7.org/linux/man-pages/man8/hdparm.8.html
For instance -B
Quote: Possible settings range from values 1 through 127 (which permit spin-down), and values 128 through 254 (which do not permit spin-down). The highest degree of power management is attained with a setting of 1, and the highest I/O performance with a setting of 254. A value of 255 tells hdparm to disable Advanced Power Management altogether on the drive.
and -S
Quote: Put the drive into idle (low-power) mode, and also set the standby (spindown) timeout for the drive.
Quote: A value of zero means "timeouts are disabled": the device will not automatically enter standby mode. Values from 1 to 240 specify multiples of 5 seconds, yielding timeouts from 5 seconds to 20 minutes. Values from 241 to 251 specify from 1 to 11 units of 30 minutes, yielding timeouts from 30 minutes to 5.5 hours. A value of 252 signifies a timeout of 21 minutes. A value of 253 sets a vendor-defined timeout period between 8 and 12 hours, and the value 254 is reserved. 255 is interpreted as 21 minutes plus 15 seconds. Note that some older drives may have very different interpretations of these values.
As you can see, the value 255 and the other special levels mean different things for -S and -B. But the definitions also sound like they are doing the same thing.
I would like to learn if anyone can clarify the difference.
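For what it's worth, in practice the two would be used something like this (example values only, device name assumed):

hdparm -B 254 /dev/sda    # APM at maximum performance, spin-down not permitted
hdparm -B 255 /dev/sda    # disable APM entirely, if the drive supports it
hdparm -S 0 /dev/sda      # disable the standby (spin-down) timeout
hdparm -S 242 /dev/sda    # standby timeout of 1 hour (2 units of 30 minutes)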
-
Have you tried to see the sleep/spin down timers on the disks with hdparm?
hdparm -I /dev/sd[a-e] | grep level
And here is a chart for interpreting the output APM values- http://www.howtoeverything.net/linux/hardware/list-timeout-values-hdparm-s
-
Just wanted to report that the CPU frequency mod has been running stable under normal use for 15 days now (on a 1GbE connection). Haven't tried the voltage mod.
I'll switch to the February 4th Buster 5.10 build soon.
edit: 23 days, and I shut it down for an SD card backup and system update. The CPU freq mod is rock solid.
-
Hi @aprayoga
I have a question before I try modifying boot.scr -
I tried @gprovost's suggestion about the CPU speeds and the device has been running stable for nearly 9 days now. I was waiting for 14 days to test its stability and then update Armbian.
The current Armbian running on the device is Linux helios64 5.9.14-rockchip64 #20.11.4 SMP PREEMPT Tue Dec 15 08:52:20 CET 2020 aarch64 GNU/Linux. But I have been meaning to update to Armbian_20.11.10_Helios64_buster_current_5.9.14.img.xz.
- Shall I try this mod on top of the current setup with the old Armbian and CPU speed mod? Or can I update to the newest image?
- If I update to the latest image, can I just update in place or do you suggest a fresh install?
- Also, shall I redo the CPU speed mod as well after a fresh install?
Thanks
-
Thanks for the update, I really hope it's not the cables in my case. I mean, I was not getting these lines in the log before; I just got them for the first time.
The only difference from the last few boots is the CPU frequency and governor settings, per the quote below.
I don't think they are related; this was originally suggested for troubleshooting a 'sync and drop_caches' issue, which works fine on my device.
Later it was also suggested for the 'mystery red light' issue, which was a problem on my device.
But this could be something else.
Hopefully not the cables, I would rather have the SSD connected to that slot fail than to change the harness.
On 1/4/2021 at 6:48 AM, gprovost said: Hi, could you try the following tweak and redo the same methods that trigger the crash:
Run armbian-config, go to -> System -> CPU
And set:
Minimum CPU speed = 1200000
Maximum CPU speed = 1200000
CPU governor = performance
This will help us understand if instability is still due to DVFS.
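(For reference, the same pinning can presumably be done straight from a shell via the generic cpufreq sysfs interface; the paths below are the standard ones, nothing Helios64-specific, so treat this as an assumption:)

# Assumed shell equivalent of the armbian-config settings above:
for p in /sys/devices/system/cpu/cpufreq/policy*; do
    echo performance > "$p/scaling_governor"
    echo 1200000 > "$p/scaling_min_freq"
    echo 1200000 > "$p/scaling_max_freq"
done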
-
Just started seeing these 'ata1.00: failed command: READ FPDMA QUEUED' lines in my log as well. I assume ata1 is sda.
Had a whole bunch of them a while after boot and nothing afterwards. But the device was not accessed a whole lot during this time. It just booted up after a crash and the LVM cache was cleaning the dirty bits on the cache SSD connected to sda.
sdb-e are populated with 4x4TB ST4000VN008 Ironwolfs, and sda is hooked up to an old (and I mean old) Sandisk Extreme 240GB SSD SDSSDX240GG25.
I attached the smartctl report for the SSD below, and it passed a short smart test just now. I'll start a long test in a minute.
Using Linux helios64 5.9.14-rockchip64 #20.11.4 SMP PREEMPT Tue Dec 15 08:52:20 CET 2020 aarch64 GNU/Linux.
Serial number of the board is 000100000530 if that could help.
Spoiler[ 4101.251887] ata1.00: exception Emask 0x10 SAct 0xffffffff SErr 0xb80100 action 0x6 [ 4101.251895] ata1.00: irq_stat 0x08000000 [ 4101.251903] ata1: SError: { UnrecovData 10B8B Dispar BadCRC LinkSeq } [ 4101.251911] ata1.00: failed command: READ FPDMA QUEUED [ 4101.251924] ata1.00: cmd 60/00:00:00:02:b6/02:00:08:00:00/40 tag 0 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.251929] ata1.00: status: { DRDY } [ 4101.251934] ata1.00: failed command: READ FPDMA QUEUED [ 4101.251946] ata1.00: cmd 60/00:08:00:a0:b3/02:00:08:00:00/40 tag 1 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.251950] ata1.00: status: { DRDY } [ 4101.251956] ata1.00: failed command: READ FPDMA QUEUED [ 4101.251968] ata1.00: cmd 60/00:10:00:7c:b3/02:00:08:00:00/40 tag 2 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.251972] ata1.00: status: { DRDY } [ 4101.251977] ata1.00: failed command: READ FPDMA QUEUED [ 4101.251989] ata1.00: cmd 60/00:18:00:58:b3/02:00:08:00:00/40 tag 3 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.251993] ata1.00: status: { DRDY } [ 4101.251998] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252010] ata1.00: cmd 60/00:20:00:34:b3/02:00:08:00:00/40 tag 4 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252014] ata1.00: status: { DRDY } [ 4101.252020] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252032] ata1.00: cmd 60/00:28:00:3e:b1/02:00:08:00:00/40 tag 5 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252036] ata1.00: status: { DRDY } [ 4101.252041] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252053] ata1.00: cmd 60/00:30:00:1a:b1/02:00:08:00:00/40 tag 6 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252057] ata1.00: status: { DRDY } [ 4101.252062] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252074] ata1.00: cmd 60/00:38:00:f6:b0/02:00:08:00:00/40 tag 7 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252078] ata1.00: status: { DRDY } [ 4101.252083] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252095] ata1.00: cmd 60/00:40:00:d2:b0/02:00:08:00:00/40 tag 8 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252099] ata1.00: status: { DRDY } [ 4101.252105] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252116] ata1.00: cmd 60/00:48:00:ae:b0/02:00:08:00:00/40 tag 9 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252120] ata1.00: status: { DRDY } [ 4101.252126] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252138] ata1.00: cmd 60/00:50:00:b8:ae/02:00:08:00:00/40 tag 10 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252142] ata1.00: status: { DRDY } [ 4101.252147] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252159] ata1.00: cmd 60/00:58:00:94:ae/02:00:08:00:00/40 tag 11 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252163] ata1.00: status: { DRDY } [ 4101.252169] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252181] ata1.00: cmd 60/00:60:00:b8:bd/02:00:08:00:00/40 tag 12 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252185] ata1.00: status: { DRDY } [ 4101.252190] ata1.00: failed command: 
READ FPDMA QUEUED [ 4101.252202] ata1.00: cmd 60/00:68:00:94:bd/02:00:08:00:00/40 tag 13 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252206] ata1.00: status: { DRDY } [ 4101.252211] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252223] ata1.00: cmd 60/00:70:00:70:bd/02:00:08:00:00/40 tag 14 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252228] ata1.00: status: { DRDY } [ 4101.252232] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252244] ata1.00: cmd 60/00:78:00:4c:bd/02:00:08:00:00/40 tag 15 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252249] ata1.00: status: { DRDY } [ 4101.252254] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252266] ata1.00: cmd 60/00:80:00:56:bb/02:00:08:00:00/40 tag 16 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252270] ata1.00: status: { DRDY } [ 4101.252276] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252288] ata1.00: cmd 60/00:88:00:32:bb/02:00:08:00:00/40 tag 17 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252292] ata1.00: status: { DRDY } [ 4101.252297] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252309] ata1.00: cmd 60/00:90:00:0e:bb/02:00:08:00:00/40 tag 18 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252314] ata1.00: status: { DRDY } [ 4101.252319] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252330] ata1.00: cmd 60/00:98:00:ea:ba/02:00:08:00:00/40 tag 19 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252335] ata1.00: status: { DRDY } [ 4101.252340] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252352] ata1.00: cmd 60/00:a0:00:c6:ba/02:00:08:00:00/40 tag 20 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252356] ata1.00: status: { DRDY } [ 4101.252361] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252373] ata1.00: cmd 60/00:a8:00:d0:b8/02:00:08:00:00/40 tag 21 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252377] ata1.00: status: { DRDY } [ 4101.252382] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252394] ata1.00: cmd 60/00:b0:00:ac:b8/02:00:08:00:00/40 tag 22 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252399] ata1.00: status: { DRDY } [ 4101.252404] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252416] ata1.00: cmd 60/00:b8:00:88:b8/02:00:08:00:00/40 tag 23 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252420] ata1.00: status: { DRDY } [ 4101.252426] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252437] ata1.00: cmd 60/00:c0:00:64:b8/02:00:08:00:00/40 tag 24 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252442] ata1.00: status: { DRDY } [ 4101.252447] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252459] ata1.00: cmd 60/00:c8:00:40:b8/02:00:08:00:00/40 tag 25 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252463] ata1.00: status: { DRDY } [ 4101.252468] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252480] ata1.00: cmd 60/00:d0:00:4a:b6/02:00:08:00:00/40 tag 26 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252485] ata1.00: status: { DRDY } [ 4101.252490] ata1.00: 
failed command: READ FPDMA QUEUED [ 4101.252502] ata1.00: cmd 60/00:d8:00:26:b6/02:00:08:00:00/40 tag 27 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252506] ata1.00: status: { DRDY } [ 4101.252511] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252523] ata1.00: cmd 60/00:e0:00:de:b5/02:00:08:00:00/40 tag 28 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252527] ata1.00: status: { DRDY } [ 4101.252532] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252545] ata1.00: cmd 60/00:e8:00:ba:b5/02:00:08:00:00/40 tag 29 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252549] ata1.00: status: { DRDY } [ 4101.252554] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252566] ata1.00: cmd 60/00:f0:00:c4:b3/02:00:08:00:00/40 tag 30 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252570] ata1.00: status: { DRDY } [ 4101.252575] ata1.00: failed command: READ FPDMA QUEUED [ 4101.252587] ata1.00: cmd 60/00:f8:00:70:ae/02:00:08:00:00/40 tag 31 ncq dma 262144 in res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error) [ 4101.252592] ata1.00: status: { DRDY } [ 4101.252603] ata1: hard resetting link [ 4101.727761] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 4101.749701] ata1.00: configured for UDMA/133 [ 4101.752101] ata1: EH complete
Spoilersmartctl -x /dev/sda smartctl 6.6 2017-11-05 r4594 [aarch64-linux-5.9.14-rockchip64] (local build) Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: SandForce Driven SSDs Device Model: SanDisk SDSSDX240GG25 Serial Number: 123273400127 LU WWN Device Id: 5 001b44 7bf3fcb3f Firmware Version: R201 User Capacity: 240,057,409,536 bytes [240 GB] Sector Size: 512 bytes logical/physical Rotation Rate: Solid State Device Device is: In smartctl database [for details use: -P show] ATA Version is: ATA8-ACS, ACS-2 T13/2015-D revision 3 SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) Local Time is: Wed Jan 27 13:24:04 2021 +03 SMART support is: Available - device has SMART capability. SMART support is: Enabled AAM feature is: Unavailable APM level is: 254 (maximum performance) Rd look-ahead is: Disabled Write cache is: Enabled DSN feature is: Unavailable ATA Security is: Disabled, NOT FROZEN [SEC1] Wt Cache Reorder: Unavailable === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x02) Offline data collection activity was completed without error. Auto Offline Data Collection: Disabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 0) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 48) minutes. Conveyance self-test routine recommended polling time: ( 2) minutes. SCT capabilities: (0x0021) SCT Status supported. SCT Data Table supported. 
SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE 1 Raw_Read_Error_Rate POSR-- 119 119 050 - 0/225624030 5 Retired_Block_Count PO--CK 100 100 003 - 7 9 Power_On_Hours_and_Msec -O--CK 000 000 000 - 30893h+27m+05.090s 12 Power_Cycle_Count -O--CK 099 099 000 - 1951 171 Program_Fail_Count -O--CK 000 000 000 - 0 172 Erase_Fail_Count -O--CK 000 000 000 - 0 174 Unexpect_Power_Loss_Ct ----CK 000 000 000 - 999 177 Wear_Range_Delta ------ 000 000 000 - 3 181 Program_Fail_Count -O--CK 000 000 000 - 0 182 Erase_Fail_Count -O--CK 000 000 000 - 0 187 Reported_Uncorrect -O--CK 100 100 000 - 0 194 Temperature_Celsius -O---K 028 055 000 - 28 (Min/Max 10/55) 195 ECC_Uncorr_Error_Count --SRC- 120 120 000 - 0/225624030 196 Reallocated_Event_Count PO--CK 100 100 003 - 7 201 Unc_Soft_Read_Err_Rate --SRC- 120 120 000 - 0/225624030 204 Soft_ECC_Correct_Rate --SRC- 120 120 000 - 0/225624030 230 Life_Curve_Status PO--C- 100 100 000 - 100 231 SSD_Life_Left PO--C- 100 100 010 - 0 233 SandForce_Internal ------ 000 000 000 - 10541 234 SandForce_Internal -O--CK 000 000 000 - 13196 241 Lifetime_Writes_GiB -O--CK 000 000 000 - 13196 242 Lifetime_Reads_GiB -O--CK 000 000 000 - 13491 ||||||_ K auto-keep |||||__ C event count ||||___ R error rate |||____ S speed/performance ||_____ O updated online |______ P prefailure warning General Purpose Log Directory Version 1 SMART Log Directory Version 1 [multi-sector log support] Address Access R/W Size Description 0x00 GPL,SL R/O 1 Log Directory 0x07 GPL R/O 1 Extended self-test log 0x09 SL R/W 1 Selective self-test log 0x10 GPL R/O 1 NCQ Command Error log 0x11 GPL,SL R/O 1 SATA Phy Event Counters log 0x80-0x9f GPL,SL R/W 16 Host vendor specific log 0xb7 GPL,SL VS 16 Device vendor specific log 0xe0 GPL,SL R/W 1 SCT Command/Status 0xe1 GPL,SL R/W 1 SCT Data Transfer SMART Extended Comprehensive Error Log (GP Log 0x03) not supported SMART Error Log not supported SMART Extended Self-test Log Version: 0 (1 sectors) No self-tests have been logged. [To run self-tests, use: smartctl -t] SMART Selective self-test log data structure revision number 0 Note: revision number not 1 implies that no selective self-test has ever been run SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. 
SCT Status Version: 3 SCT Version (vendor specific): 0 (0x0000) SCT Support Level: 1 Device State: Active (0) Current Temperature: 28 Celsius Power Cycle Min/Max Temperature: 27/31 Celsius Lifetime Min/Max Temperature: 10/55 Celsius Under/Over Temperature Limit Count: 0/0 SCT Temperature History Version: 10 (Unknown, should be 2) Temperature Sampling Period: 1 minute Temperature Logging Interval: 10 minutes Min/Max recommended Temperature: 0/120 Celsius Min/Max Temperature Limit: 0/ 0 Celsius Temperature History Size (Index): 56576 (7) Invalid Temperature History Size or Index SCT Error Recovery Control command not supported Device Statistics (GP/SMART Log 0x04) not supported Pending Defects log (GP Log 0x0c) not supported SATA Phy Event Counters (GP Log 0x11) ID Size Value Description 0x0001 2 0 Command failed due to ICRC error 0x0003 2 0 R_ERR response for device-to-host data FIS 0x0004 2 0 R_ERR response for host-to-device data FIS 0x0006 2 0 R_ERR response for device-to-host non-data FIS 0x0007 2 0 R_ERR response for host-to-device non-data FIS 0x0008 2 0 Device-to-host non-data FIS retries 0x0009 2 4 Transition from drive PhyRdy to drive PhyNRdy 0x000a 2 4 Device-to-host register FISes sent due to a COMRESET 0x000f 2 0 R_ERR response for host-to-device data FIS, CRC 0x0010 2 0 R_ERR response for host-to-device data FIS, non-CRC 0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC 0x0013 2 0 R_ERR response for host-to-device non-data FIS, non-CRC 0x0002 2 0 R_ERR response for data FIS 0x0005 2 0 R_ERR response for non-data FIS 0x000b 2 0 CRC errors within host-to-device FIS 0x000d 2 0 Non-CRC errors within host-to-device FIS
-
Just had another one right now. Previous one was 3 days ago.
So I plugged in the serial cable, but the device was completely gone. No login prompt, no output of any kind.
Also, once rebooted, there are no logs or anything from the previous boot either.
journalctl --list-boots lists only the current boot, and dmesg -E is empty.
Anyhow, trying the CPU speed thing now. I hope it works; cleaning the entire SSD cache that gets dirty at every crash is worrying me.
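(Side note in case it helps anyone chasing the same thing: since nothing from the previous boot survives, I'm assuming journald here keeps its logs in RAM only; switching it to persistent storage might preserve something across a crash:)

# Assumption: journald is on volatile storage, so previous-boot logs are lost.
mkdir -p /var/log/journal
# then set Storage=persistent in /etc/systemd/journald.conf and restart it:
systemctl restart systemd-journald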
-
- I don't see the point of baking an M.2 or any future NVMe PCIe port onto the board, honestly. Instead I would love to see a PCIe slot or two exposed. That way the end user can pick whatever expansion is needed. It could be an NVMe adapter, a 10GbE NIC, SATA, USB, or even a SAS controller for additional external enclosures. It will probably be more expensive for the end user but will open up a larger market for the device. We already have 2 lanes / 10 Gbps worth of PCIe not utilized as is (correct me on this).
- Absolutely more/user-upgradeable RAM and ECC support.
- Better trays, you guys, they are a pain right now lol. (Loving the black and purple though.)
- Some sort of active/quiet cooling on the SoC separate from the disks, as any disk activity over the network or even a simple CPU-bound task ramps up all the fans immediately.
- Most important of all, please do not stray too far from the current architecture, both hardware- and firmware-wise. Aside from everything else, we would really love to see a rock-solid launch next time around. Changing a lot of things might cause most of your hard work to date to mean nothing.
-
Just had the same thing happen to my unit. It was the first time I hadn't manually rebooted the system for some reason in about 10 days, I think.
-
@allen--smithee I can see how these options could be useful to others, but in my case, all I needed was the original config: 4-HDD RAID5 + 1 SSD writeback cache.
I would have been forced to get a QNAP if I needed more.
But I'll play
I would pick none; here is my counter-offer:
A - Replace the M.2 slot with a PMP-enabled eSATA port on the back so a low-end disk tower can be attached for bulk/slow storage. There are generic 8-slot towers available right now (no PMP over 5 slots; those are RAID towers, but still eSATA-connected). Added bonus: more space on the PCB for something else.
OR
B - If possible, expose a PCIe slot or two for adding an SFP+ card and a mini-SAS card, so a fast disk tower and 10GbE networking can be connected. This would probably require more RAM, a more powerful CPU, more watts, and all that jazz. Not sure if this is possible with the ARM arch available to us mere mortals yet. Obviously Amazon and others have their datacenter ARM CPUs that have all this and then some. (I just realized after writing this that I described a Xeon QNAP unit... but they sell, sooo maybe good?)
They could design and sell these disk towers and the add-in cards as well. Currently available SAS and eSATA towers are quite drab and bland to look at. The attractive Helios64 design aesthetic would make them look very pretty.
Also, this way the original Helios64 box, which is suited to home use, doesn't get bulkier; its design stays mostly unchanged, so the production troubles that have already been solved are built upon rather than scrapped. I remember from the Kobol guys' blog comments that the enclosure factory was problematic and caused some delays at the end. Multiple PCBs don't have to be produced for each use case, and the user is free to add options.
In my opinion, A (an eSATA port on the back and a Kobol-designed disk enclosure) would be perfect for most, and a good incremental way forward.
3-disk ZFS or RAID5 + 1 SSD for L2ARC/SLOG or dm-cache as a fast daily workspace,
and
5 or 8 disks for daily backups / archives / any slow storage needs.
@gprovost Thank you for renaming the thread, I was feeling bad for posting in it and keeping it active.
-
12 minutes ago, Werner said:
Feel free to take a look into. Curious about that too.
Couldn't find anything in the Helios64 patches or the other RK3399 patches that relates to port multiplication.
Not sure I'm looking at this correctly, but when I search the repo for 'pmp' I'm seeing a lot of results in the config/kernel folder with
CONFIG_SATA_PMP=y
Extrapolating from those search results, I think I can assume that port multiplication is off by default in Armbian and needs to be enabled for the specific SoC or device. No?
But later I found this on https://wiki.kobol.io/helios64/sata/
Quote: SATA Controller Features
5 SATA ports,
Command-based and FIS-based for Port Multiplier,
Compliance with SATA Specification Revision 3.2,
AHCI mode and IDE programming interface,
Native Command Queue (NCQ),
SATA link power saving mode (partial and slumber),
SATA plug-in detection capable,
Drive power control and staggered spin-up,
SATA Partial / Slumber power management state.
So apparently port multipliers should be supported on Helios64, or will be supported? Honestly I don't think they would be very useful since the case has to be modified and stuff. I first mentioned it as a joke to begin with. I don't think anyone would go for this as the case itself is a cool piece of design and butchering it would be a shame.
-
https://man7.org/linux/man-pages/man7/lvmcache.7.html
https://linuxhint.com/configuring-zfs-cache/
Here you go... use 4 HDDs and an SSD cache.
Or sell your unit, quite a lot of people wanted to buy one and couldn't in time.
OR, Frankenstein your unit and add port multipliers to the original SATA ports. You could add up to 20 HDDs to 4 of the original SATA ports, and up to 5 SSDs to the remaining original SATA port. The hardware controller specs say it supports port multipliers; not sure about the Armbian kernel, you might have to modify it.
Btw, you can take a look at the new version of the Odroid H2+ with port multipliers (up to 10 SATA disks, plus PCIe NVMe) if you are into more tinkering. You also get two 2.5G network ports instead of one. The Hardkernel team has a blog post about its setup and benchmarks. https://wiki.odroid.com/odroid-h2/application_note/10_sata_drives
I am planning to expand my network infra with the H2+ soon. You can even plug a $45 4-port 2.5G switch into the NVMe slot now. I'm going crazy about this unit. If only it didn't have a shintel (r/AyyMD) CPU in it.
Anyhow-
Just doing a bit of research shows that this was not exactly a 'decision' made by Kobol. There are only two reliable PCIe-to-SATA controllers I could find that support multiple (4+) SATA ports within the limitation of the RK3399, which has 4 PCIe lanes. It would be a different story if the RK had 8 lanes, but that is another can of worms involving CPU arch, form factor, etc. Not gonna open that can while I'm barely qualified to open this one.
What we have here in the Helios64 is the JMB585, and the other option was the Marvell 88SE9215. The Marvell only supports 4 ports, while the JMB supports 5. I could not find any controllers that work reliably with 4 PCIe 2.1 lanes and have more than 5 ports.
There is the Asmedia ASM1166, which actually supports 6 ports, but it was probably not available during the design of the Helios64 as it is quite new. On top of that, there is a weird thread about its log output on the Unraid forums.
In the end, this '5 ports only' was not exactly a decision by Kobol, but rather an uninformed decision made by you. You don't see people here complaining about it. You are the second person who has even mentioned it, and the only one complaining so far, with CAPS no less. Which means the specs were clear to pretty much everyone.
My suggestion is to replace one of the drives with a Samsung 860 Pro, make it a SLOG/L2ARC, or in my case an LVM cache (writeback mode; make sure you have a proper UPS or that the battery in the unit is connected properly), and call it a day. A SATA port is faster than the 2.5G Ethernet or the USB DAS mode anyhow, so your cache SSD will mostly perform OK.
-
@gprovost Hi, my Helios64 is connected to a Zyxel XGS1010-12 switch on one of its 2.5Gbps ports via a Cat5e 26AWG metal-jacket cable. On the PC side I have a Sabrent 2.5G USB network adapter, connected to the other 2.5Gbps port. I think the Sabrent USB adapter has the same controller as the Helios64.
SpoilerPS C:> Get-NetAdapter Name InterfaceDescription ifIndex Status MacAddress LinkSpeed ---- -------------------- ------- ------ ---------- --------- Ethernet 2 Realtek USB 2.5GbE Family Controller 16 Up 00-24-27-88-0B-5A 2.5 Gbps ZeroTier One [93afae59... ZeroTier Virtual Port 15 Up 2A-AE-7A-A6-B8-55 100 Mbps ZeroTier One [af78bf94... ZeroTier Virtual Port #2 53 Up 62-D9-06-F3-75-44 100 Mbps VirtualBox Host-Only N... VirtualBox Host-Only Ethernet Adapter 12 Up 0A-00-27-00-00-0C 1 Gbps Wi-Fi Intel(R) Wi-Fi 6 AX200 160MHz 11 Disconnected 50-E0-85-F5-02-3A 0 bps Ethernet Intel(R) I211 Gigabit Network Connec... 4 Disconnected B4-2E-99-8E-1D-2E 0 bps PS C:> Get-NetAdapterAdvancedProperty "Ethernet 2" Name DisplayName DisplayValue RegistryKeyword RegistryValue ---- ----------- ------------ --------------- ------------- Ethernet 2 Energy-Efficient Ethernet Disabled *EEE {0} Ethernet 2 Flow Control Rx & Tx Enabled *FlowControl {3} Ethernet 2 IPv4 Checksum Offload Rx & Tx Enabled *IPChecksumO... {3} Ethernet 2 Jumbo Frame Disabled *JumboPacket {1514} Ethernet 2 Large Send Offload v2 (IPv4) Enabled *LsoV2IPv4 {1} Ethernet 2 Large Send Offload v2 (IPv6) Enabled *LsoV2IPv6 {1} Ethernet 2 ARP Offload Enabled *PMARPOffload {1} Ethernet 2 NS Offload Enabled *PMNSOffload {1} Ethernet 2 Priority & VLAN Priority & VLAN Enabled *PriorityVLA... {3} Ethernet 2 Selective suspend Enabled *SelectiveSu... {1} Ethernet 2 Speed & Duplex Auto Negotiation *SpeedDuplex {0} Ethernet 2 SS idle timeout 5 *SSIdleTimeout {5} Ethernet 2 SS idle timeout(Screen off) 3 *SSIdleTimeo... {3} Ethernet 2 TCP Checksum Offload (IPv4) Rx & Tx Enabled *TCPChecksum... {3} Ethernet 2 TCP Checksum Offload (IPv6) Rx & Tx Enabled *TCPChecksum... {3} Ethernet 2 UDP Checksum Offload (IPv4) Rx & Tx Enabled *UDPChecksum... {3} Ethernet 2 UDP Checksum Offload (IPv6) Rx & Tx Enabled *UDPChecksum... {3} Ethernet 2 Wake on Magic Packet Enabled *WakeOnMagic... {1} Ethernet 2 Wake on pattern match Enabled *WakeOnPattern {1} Ethernet 2 Advanced EEE Disabled AdvancedEEE {0} Ethernet 2 Green Ethernet Enabled EnableGreenE... {1} Ethernet 2 Idle Power Saving Enabled EnableU2P3 {1} Ethernet 2 Gigabit Lite Enabled GigaLite {1} Ethernet 2 Network Address NetworkAddress {--} Ethernet 2 VLAN ID 0 VlanID {0} Ethernet 2 Wake on link change Enabled WakeOnLinkCh... {1} Ethernet 2 WOL & Shutdown Link Speed 10 Mbps First WolShutdownL... {0}
-
Yeah, I'm back to the 1Gbps port too... Pushed over 3 TB of data onto the device from multiple sources simultaneously the other day and haven't had a single error.
With the previous stable builds it used to throw errors quite often during transfers, so when I didn't see any errors with the new one I figured the problem must have been fixed.
Today I'm losing eth1 completely even without much of a load. Maybe it's because I'm pulling data from the device today instead of pushing. They are TX errors after all, not RX.
-
I guess your TX offload is off as well, since it is set that way by default, but could you try again with 'ethtool -K eth1 tx off'? Asking just in case. I got the TX resets and hang-ups again when I switched it back on and initiated a large transfer to see what happens with the new build.
-
Well, I did a fresh install of Armbian_20.11.6_Helios64_buster_current_5.9.14.img (didn't try to update) and have hammered the disks and the 2.5G network for about a day now, and no TX resets or timeouts have happened at all. No disconnects or hang-ups.
How to start over with configuring the SBC from an SD card image of Armbian?
https://wiki.kobol.io/helios64/button/#recovery-button
USB Type-C to eMMC flashing, according to the wiki. I'm not sure how it works though, never tried it.