clostro
Members
Posts: 30

Posts posted by clostro

  1. You could maybe disable the service and put the script it runs, /usr/bin/helios64-ups.sh, into root's cron to run every minute. You could make it effectively run every 20 seconds by adding a while loop that iterates three times and sleeps 20 seconds per iteration (see the sketch after the script below).

     

    However, I don't see where this script checks whether mains power is active. What I see is a script that will keep shutting the device down whenever the battery voltage is low, even after mains power returns. Is that function provided by the service timer, i.e. does the timer stop while mains power is present?

    If so, absolutely do NOT cron helios64-ups.sh as it is, because you won't be able to boot the device until the battery charges above 7 volts.

     

    #!/bin/bash
    
    #7.0V   916     Recommended threshold to force shutdown system
    TH=916
    
    # raw ADC reading of the battery voltage channel
    val=$(cat '/sys/bus/iio/devices/iio:device0/in_voltage2_raw')
    # ADC scale factor
    sca=$(cat '/sys/bus/iio/devices/iio:device0/in_voltage_scale')
    # scaled value, truncated to an integer by bc
    adc=$(echo "$val * $sca / 1" | bc)
    
    if [ "$adc" -le $TH ]; then
        /usr/sbin/poweroff
    fi
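
    For illustration, here is a minimal sketch of the cron wrapper described above. It assumes a hypothetical mains_is_present check (a placeholder I made up, not a real Helios64 interface); the actual UPS/mains readings are documented on the Kobol wiki linked below.

    #!/bin/bash
    # Sketch only: run from root cron once per minute. Checks three times,
    # 20 seconds apart, and powers off only if the battery is low AND mains
    # power appears to be absent.
    TH=916

    mains_is_present() {
        # placeholder: always reports mains present until replaced with a
        # real check (see the Kobol wiki UPS status page linked below)
        return 0
    }

    for i in 1 2 3; do
        val=$(cat '/sys/bus/iio/devices/iio:device0/in_voltage2_raw')
        sca=$(cat '/sys/bus/iio/devices/iio:device0/in_voltage_scale')
        adc=$(echo "$val * $sca / 1" | bc)
        if [ "$adc" -le "$TH" ] && ! mains_is_present; then
            /usr/sbin/poweroff
        fi
        if [ "$i" -lt 3 ]; then
            sleep 20
        fi
    done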

     

    There is a whole device with lots of data in there, is there any documentation on iio:device0?

     

    edit- apparently there is :D https://wiki.kobol.io/helios64/ups/#ups-status-under-linux

     

  2. May I suggest outputting dmesg live to a network location?

    I'm not sure if the serial console output is the same as 'dmesg', but if it is, you can 'nohup &' it live to any file. That way you wouldn't have to stay connected to the console or ssh all the time. Just don't output it to a local file system, as writing to a local file system during a crash might corrupt it and cause more problems.

     

    nohup dmesg --follow > /network/location/folder/helios64-log.txt 2>&1 &

    exit

     

    Needed a single '>', and to exit the session with 'exit', apparently.
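
    In case it helps, the network location has to be mounted first; for example, assuming an NFS export (the server name and paths here are placeholders, adjust to your setup):

    # mount the remote export where the log file will be written
    mount -t nfs nas.example.lan:/export/logs /network/location/folder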

  3. Mine has been pretty stable for the last 2 months. The last restart was to update to Armbian 21.02.2 Buster with the 5.10.16 kernel, 24 days ago. I applied the CPU freq mod on the previous kernel and upgraded with apt, no fresh install, so I assume the CPU freq mod is still in place. The device has been completely stable since that mod, and I am not undoing it for the time being. Reliability > everything else.

     

    I'm using the 2.5G port exclusively with a 2.5G switch.

    There are four 4TB Seagate IronWolf drives and an old (very old) SanDisk Extreme SSD in there.

    No OMV or ZFS. Just LVM Raid5 with SSD rw cache.

    No docker or VMs running.

    Cockpit for management.

    Samba for file sharing.

    Borg for backups.

     

     

     

  4. Guys, those drives are CMR (the ones I could pick out from the posted logs), not SMR. EFRX is CMR, EFAX is SMR.

    Also, SMR is the one to avoid at all costs.

     

    From https://www.servethehome.com/buyers-guides/top-hardware-components-freenas-nas-servers/top-picks-freenas-hard-drives/ (ServeTheHome runs a lot of ZFS pools in their infra):

    Quote

    As a quick note. We do not recommend the new WD Red SMR drives. See WD Red SMR vs CMR Tested Avoid Red SMR.

     

    Please refer to manufacturer spec sheets and tables before buying an HDD. Here are a few lists:

    https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd/

    https://blog.westerndigital.com/wd-red-nas-drives/

    https://www.seagate.com/internal-hard-drives/cmr-smr-list/

     

    As far as I know, all Hitachi drives are CMR.

  5. @kelso Any SSD will be fast enough for the network or USB speeds of this device. If you are buying new, you can pick WD Red, Samsung Evo, SanDisk Ultra/Extreme, Seagate Firepro (?)... just stay away from little-known or unknown brands. You can check the models you picked and compare them here - https://ssd.userbenchmark.com/

    They are getting a well-deserved grilling for their CPU comparisons, but I think their SSD data is good enough. I would be looking for the best 'Mixed' performance value for use in this device, as the network and USB speeds cap the max read or write speed anyhow.

     

    The Western Digital drives you picked use CMR, which is supposedly much better than SMR; you can take a look at this table if you have to pick other models: https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd/

     

    One suggestion before you put any critical data onto those brand new disks: run SMART tests on each drive, long tests. They should take about 10 hours each, I think.
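
    A minimal way to do that with smartmontools (the device names are just examples, adjust to your drives):

    # start a long (extended) SMART self-test on each drive
    for d in /dev/sd[a-d]; do
        smartctl -t long "$d"
    done
    # check progress and the result later, for example:
    smartctl -l selftest /dev/sda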

     

    One of my brand new Seagates couldn't complete a test on the first run and had to be replaced. Now I'm on edge and running nightly borg backups to an older NAS because the other disks are from the same series. Call me paranoid, but I usually stagger HDD purchases by a few days and/or order from different stores to avoid getting drives from the same production batch; I couldn't do that this time around.

     

     

    @Werner I use four 4TB Seagate NAS drives, whatever their branding was... and an old 240GB SanDisk Extreme SSD for RAID caching. LVM RAID 5 + dm-cache (read and write, writeback, smq). It ain't bad. The SSD really picks up the slack of the spinning rust, especially when you are pushing random writes to the device, and smq is pretty smart at read caching for hot files.

  6. Just did an in-place upgrade with armbian-config (I was running the previous build with the static CPU freq mod, stably, for 23 days), and it all works fine after a reboot. Wanted to report the results.

    A few things:

    - After the armbian-config upgrade and reboot there were still packages not up to date; I had to run apt again.

    - The firewalld zone didn't come up on its own, but the firewall was working as configured. I'm confused about that one. It came back up after I restarted the firewalld service, I think.

    - And the weirdest thing: the zram1 log was 100% full. The armbian-ramlog service was failing and wouldn't work properly until I manually deleted /var/log/pcp*, since that was what was hogging all the space.

     

    I tested the 2.5G adapter with CrystalDisk on a Windows client a few times, which would previously crash eth1 (sometimes almost instantly), and so far it works fine.

    I did some iperf3 tests and am getting 1.9+ Gbps in one direction and 2.2+ Gbps in the other with 'ethtool -K eth1 rx off tx on'.
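
    A quick way to reproduce that kind of test (the hostname is just an example): run the server on the Helios64 and the client on the other machine, in both directions.

    # on the Helios64
    iperf3 -s
    # on the client: push to the NAS, then pull from it (-R reverses direction)
    iperf3 -c helios64.local
    iperf3 -c helios64.local -R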

     

    edit: forgot about eth0, it wouldn't show up in ifconfig after the update. Not sure what happened there, since the Cockpit interface shows it just fine, just not configured. It may have been me toying around with it before the update. I might have cloned its MAC address so it would get the same IP as eth1 from the Pi-hole DHCP, I really can't remember.

  7. Putting aside the discussion about disk health and spin-ups and spin-downs, a non-spin-down value might solve your issue here.

     

    You can take a look at both the -S and -B options. I couldn't entirely figure out the difference between their 'set' values. They are both supposedly setting the APM value, but aside from -S putting the drives to sleep immediately and then setting the sleep timer, they have different definitions for the level values.

     

    From https://man7.org/linux/man-pages/man8/hdparm.8.html

    For instance -B 

    Quote

    Possible settings range from values 1 through 127 (which permit spin-down), and values 128 through 254 (which do not permit spin-down). The highest degree of power management is attained with a setting of 1, and the highest I/O performance with a setting of 254. A value of 255 tells hdparm to disable Advanced Power Management altogether on the drive

     

    and -S 

    Quote

    Put the drive into idle (low-power) mode, and also set the standby (spindown) timeout for the drive.

    Quote

    A value of zero means "timeouts are disabled": the device will not automatically enter standby mode. Values from 1 to 240 specify multiples of 5 seconds, yielding timeouts from 5 seconds to 20 minutes. Values from 241 to 251 specify from 1 to 11 units of 30 minutes, yielding timeouts from 30 minutes to 5.5 hours. A value of 252 signifies a timeout of 21 minutes. A value of 253 sets a vendor-defined timeout period between 8 and 12 hours, and the value 254 is reserved. 255 is interpreted as 21 minutes plus 15 seconds. Note that some older drives may have very different interpretations of these values.

     

     

    As you can see, the value of 255 and the other special levels are different between -S and -B. But the definitions also sound like they are doing the same thing.

    I would like to learn if anyone can clarify the difference.
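
    In the meantime, the non-spin-down settings mentioned above would be set roughly like this (the device name is just an example):

    # disable the standby (spin-down) timer entirely
    hdparm -S 0 /dev/sda
    # set an APM level that does not permit spin-down (128-254)
    hdparm -B 254 /dev/sda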

  8. Just wanted to report that the CPU frequency mod has been running stable under normal use for 15 days now (on 1Gbe connection). Haven't tried the voltage mod.

     

    I'll switch to the February 4th Buster 5.10 build soon.

     

     

    edit: 23 days now, and I shut it down for an SD card backup and system update. The CPU freq mod is rock solid.

  9. Hi @aprayoga 

    I have a question before I try modifying boot.scr -

     

    I tried @gprovost's suggestion about the CPU speeds and the device has been running stable for nearly 9 days now. I was planning to wait 14 days to test its stability and then update Armbian.

     

    The current Armbian running on the device is Linux helios64 5.9.14-rockchip64 #20.11.4 SMP PREEMPT Tue Dec 15 08:52:20 CET 2020 aarch64 GNU/Linux. But I have been meaning to update to Armbian_20.11.10_Helios64_buster_current_5.9.14.img.xz.

     

    - Shall I try this mod on top of the current setup with the old Armbian and CPU speed mod? Or can I update to the newest image?

     

    - If I update to the latest image, can I just update in place or do you suggest a fresh install?

     

    - Also, shall I redo the CPU speed mod as well, after a fresh install?

     

    Thanks

  10. Thanks for the update, I really hope it's not the cables in my case. I mean, I was not getting these lines in the log before; they just showed up for the first time.

     

    The only difference from the last few boots is the CPU frequency and speed governor per the quote below.

     

    I don't think they are related; this was originally suggested for troubleshooting a 'sync and drop_caches' issue, which works fine on my device.

    Later it was also suggested for the 'mystery red light' issue, which was a problem on my device.

    But this could be something else.

     

    Hopefully it's not the cables; I would rather have the SSD connected to that slot fail than have to change the harness.

     

    On 1/4/2021 at 6:48 AM, gprovost said:

    Hi, could try to the following tweak and redo the same methods that triggers the crash :

     

    Run armbian-config, go to -> System -> CPU

     

    And set:

    Minimum CPU speed = 1200000

    Maximum CPU speed = 1200000

    CPU governor = performance

     

    This will help us understand if instability is still due to DVFS.
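
    (For reference, as far as I know these armbian-config settings just end up in /etc/default/cpufrequtils, roughly like this - a sketch, assuming the standard Armbian layout:)

    # /etc/default/cpufrequtils
    ENABLE=true
    MIN_SPEED=1200000
    MAX_SPEED=1200000
    GOVERNOR=performance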

     

  11. Just started seeing these 'ata1.00: failed command: READ FPDMA QUEUED' lines in my log as well. I assume ata1 is sda.
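
    A quick way to double-check which ataN port a disk sits on is to print its sysfs device path (the path includes the ata port number):

    # print the sysfs path for each SATA/SCSI block device
    for d in /sys/block/sd?; do
        echo "$(basename "$d"): $(readlink -f "$d/device")"
    done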

     

    I had a whole bunch of them a while after boot and nothing afterwards. But the device was not being accessed much during that time; it had just booted up after a crash and the LVM cache was cleaning the dirty blocks on the cache SSD connected to sda.

     

    sdb-sde are populated with 4x 4TB ST4000VN008 IronWolfs, and sda is hooked up to an old (and I mean old) SanDisk Extreme 240GB SSD, SDSSDX240GG25.

    I attached the smartctl report for the SSD below, and it passed a short smart test just now. I'll start a long test in a minute.

     

    Using Linux helios64 5.9.14-rockchip64 #20.11.4 SMP PREEMPT Tue Dec 15 08:52:20 CET 2020 aarch64 GNU/Linux.

     

    The serial number of the board is 000100000530, in case that helps.

     

    Spoiler
    
    [ 4101.251887] ata1.00: exception Emask 0x10 SAct 0xffffffff SErr 0xb80100 action 0x6
    [ 4101.251895] ata1.00: irq_stat 0x08000000
    [ 4101.251903] ata1: SError: { UnrecovData 10B8B Dispar BadCRC LinkSeq }
    [ 4101.251911] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.251924] ata1.00: cmd 60/00:00:00:02:b6/02:00:08:00:00/40 tag 0 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.251929] ata1.00: status: { DRDY }
    [ 4101.251934] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.251946] ata1.00: cmd 60/00:08:00:a0:b3/02:00:08:00:00/40 tag 1 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.251950] ata1.00: status: { DRDY }
    [ 4101.251956] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.251968] ata1.00: cmd 60/00:10:00:7c:b3/02:00:08:00:00/40 tag 2 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.251972] ata1.00: status: { DRDY }
    [ 4101.251977] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.251989] ata1.00: cmd 60/00:18:00:58:b3/02:00:08:00:00/40 tag 3 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.251993] ata1.00: status: { DRDY }
    [ 4101.251998] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252010] ata1.00: cmd 60/00:20:00:34:b3/02:00:08:00:00/40 tag 4 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252014] ata1.00: status: { DRDY }
    [ 4101.252020] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252032] ata1.00: cmd 60/00:28:00:3e:b1/02:00:08:00:00/40 tag 5 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252036] ata1.00: status: { DRDY }
    [ 4101.252041] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252053] ata1.00: cmd 60/00:30:00:1a:b1/02:00:08:00:00/40 tag 6 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252057] ata1.00: status: { DRDY }
    [ 4101.252062] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252074] ata1.00: cmd 60/00:38:00:f6:b0/02:00:08:00:00/40 tag 7 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252078] ata1.00: status: { DRDY }
    [ 4101.252083] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252095] ata1.00: cmd 60/00:40:00:d2:b0/02:00:08:00:00/40 tag 8 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252099] ata1.00: status: { DRDY }
    [ 4101.252105] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252116] ata1.00: cmd 60/00:48:00:ae:b0/02:00:08:00:00/40 tag 9 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252120] ata1.00: status: { DRDY }
    [ 4101.252126] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252138] ata1.00: cmd 60/00:50:00:b8:ae/02:00:08:00:00/40 tag 10 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252142] ata1.00: status: { DRDY }
    [ 4101.252147] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252159] ata1.00: cmd 60/00:58:00:94:ae/02:00:08:00:00/40 tag 11 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252163] ata1.00: status: { DRDY }
    [ 4101.252169] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252181] ata1.00: cmd 60/00:60:00:b8:bd/02:00:08:00:00/40 tag 12 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252185] ata1.00: status: { DRDY }
    [ 4101.252190] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252202] ata1.00: cmd 60/00:68:00:94:bd/02:00:08:00:00/40 tag 13 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252206] ata1.00: status: { DRDY }
    [ 4101.252211] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252223] ata1.00: cmd 60/00:70:00:70:bd/02:00:08:00:00/40 tag 14 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252228] ata1.00: status: { DRDY }
    [ 4101.252232] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252244] ata1.00: cmd 60/00:78:00:4c:bd/02:00:08:00:00/40 tag 15 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252249] ata1.00: status: { DRDY }
    [ 4101.252254] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252266] ata1.00: cmd 60/00:80:00:56:bb/02:00:08:00:00/40 tag 16 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252270] ata1.00: status: { DRDY }
    [ 4101.252276] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252288] ata1.00: cmd 60/00:88:00:32:bb/02:00:08:00:00/40 tag 17 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252292] ata1.00: status: { DRDY }
    [ 4101.252297] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252309] ata1.00: cmd 60/00:90:00:0e:bb/02:00:08:00:00/40 tag 18 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252314] ata1.00: status: { DRDY }
    [ 4101.252319] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252330] ata1.00: cmd 60/00:98:00:ea:ba/02:00:08:00:00/40 tag 19 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252335] ata1.00: status: { DRDY }
    [ 4101.252340] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252352] ata1.00: cmd 60/00:a0:00:c6:ba/02:00:08:00:00/40 tag 20 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252356] ata1.00: status: { DRDY }
    [ 4101.252361] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252373] ata1.00: cmd 60/00:a8:00:d0:b8/02:00:08:00:00/40 tag 21 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252377] ata1.00: status: { DRDY }
    [ 4101.252382] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252394] ata1.00: cmd 60/00:b0:00:ac:b8/02:00:08:00:00/40 tag 22 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252399] ata1.00: status: { DRDY }
    [ 4101.252404] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252416] ata1.00: cmd 60/00:b8:00:88:b8/02:00:08:00:00/40 tag 23 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252420] ata1.00: status: { DRDY }
    [ 4101.252426] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252437] ata1.00: cmd 60/00:c0:00:64:b8/02:00:08:00:00/40 tag 24 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252442] ata1.00: status: { DRDY }
    [ 4101.252447] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252459] ata1.00: cmd 60/00:c8:00:40:b8/02:00:08:00:00/40 tag 25 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252463] ata1.00: status: { DRDY }
    [ 4101.252468] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252480] ata1.00: cmd 60/00:d0:00:4a:b6/02:00:08:00:00/40 tag 26 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252485] ata1.00: status: { DRDY }
    [ 4101.252490] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252502] ata1.00: cmd 60/00:d8:00:26:b6/02:00:08:00:00/40 tag 27 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252506] ata1.00: status: { DRDY }
    [ 4101.252511] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252523] ata1.00: cmd 60/00:e0:00:de:b5/02:00:08:00:00/40 tag 28 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252527] ata1.00: status: { DRDY }
    [ 4101.252532] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252545] ata1.00: cmd 60/00:e8:00:ba:b5/02:00:08:00:00/40 tag 29 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252549] ata1.00: status: { DRDY }
    [ 4101.252554] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252566] ata1.00: cmd 60/00:f0:00:c4:b3/02:00:08:00:00/40 tag 30 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252570] ata1.00: status: { DRDY }
    [ 4101.252575] ata1.00: failed command: READ FPDMA QUEUED
    [ 4101.252587] ata1.00: cmd 60/00:f8:00:70:ae/02:00:08:00:00/40 tag 31 ncq dma 262144 in
                            res 40/00:c8:00:40:b8/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
    [ 4101.252592] ata1.00: status: { DRDY }
    [ 4101.252603] ata1: hard resetting link
    [ 4101.727761] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    [ 4101.749701] ata1.00: configured for UDMA/133
    [ 4101.752101] ata1: EH complete

     

     

     

    Spoiler
    
    smartctl  -x /dev/sda
    smartctl 6.6 2017-11-05 r4594 [aarch64-linux-5.9.14-rockchip64] (local build)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     SandForce Driven SSDs
    Device Model:     SanDisk SDSSDX240GG25
    Serial Number:    123273400127
    LU WWN Device Id: 5 001b44 7bf3fcb3f
    Firmware Version: R201
    User Capacity:    240,057,409,536 bytes [240 GB]
    Sector Size:      512 bytes logical/physical
    Rotation Rate:    Solid State Device
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
    SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Wed Jan 27 13:24:04 2021 +03
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    AAM feature is:   Unavailable
    APM level is:     254 (maximum performance)
    Rd look-ahead is: Disabled
    Write cache is:   Enabled
    DSN feature is:   Unavailable
    ATA Security is:  Disabled, NOT FROZEN [SEC1]
    Wt Cache Reorder: Unavailable
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x02) Offline data collection activity
                                            was completed without error.
                                            Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                (    0) seconds.
    Offline data collection
    capabilities:                    (0x7b) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   1) minutes.
    Extended self-test routine
    recommended polling time:        (  48) minutes.
    Conveyance self-test routine
    recommended polling time:        (   2) minutes.
    SCT capabilities:              (0x0021) SCT Status supported.
                                            SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
      1 Raw_Read_Error_Rate     POSR--   119   119   050    -    0/225624030
      5 Retired_Block_Count     PO--CK   100   100   003    -    7
      9 Power_On_Hours_and_Msec -O--CK   000   000   000    -    30893h+27m+05.090s
     12 Power_Cycle_Count       -O--CK   099   099   000    -    1951
    171 Program_Fail_Count      -O--CK   000   000   000    -    0
    172 Erase_Fail_Count        -O--CK   000   000   000    -    0
    174 Unexpect_Power_Loss_Ct  ----CK   000   000   000    -    999
    177 Wear_Range_Delta        ------   000   000   000    -    3
    181 Program_Fail_Count      -O--CK   000   000   000    -    0
    182 Erase_Fail_Count        -O--CK   000   000   000    -    0
    187 Reported_Uncorrect      -O--CK   100   100   000    -    0
    194 Temperature_Celsius     -O---K   028   055   000    -    28 (Min/Max 10/55)
    195 ECC_Uncorr_Error_Count  --SRC-   120   120   000    -    0/225624030
    196 Reallocated_Event_Count PO--CK   100   100   003    -    7
    201 Unc_Soft_Read_Err_Rate  --SRC-   120   120   000    -    0/225624030
    204 Soft_ECC_Correct_Rate   --SRC-   120   120   000    -    0/225624030
    230 Life_Curve_Status       PO--C-   100   100   000    -    100
    231 SSD_Life_Left           PO--C-   100   100   010    -    0
    233 SandForce_Internal      ------   000   000   000    -    10541
    234 SandForce_Internal      -O--CK   000   000   000    -    13196
    241 Lifetime_Writes_GiB     -O--CK   000   000   000    -    13196
    242 Lifetime_Reads_GiB      -O--CK   000   000   000    -    13491
                                ||||||_ K auto-keep
                                |||||__ C event count
                                ||||___ R error rate
                                |||____ S speed/performance
                                ||_____ O updated online
                                |______ P prefailure warning
    
    General Purpose Log Directory Version 1
    SMART           Log Directory Version 1 [multi-sector log support]
    Address    Access  R/W   Size  Description
    0x00       GPL,SL  R/O      1  Log Directory
    0x07       GPL     R/O      1  Extended self-test log
    0x09           SL  R/W      1  Selective self-test log
    0x10       GPL     R/O      1  NCQ Command Error log
    0x11       GPL,SL  R/O      1  SATA Phy Event Counters log
    0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
    0xb7       GPL,SL  VS      16  Device vendor specific log
    0xe0       GPL,SL  R/W      1  SCT Command/Status
    0xe1       GPL,SL  R/W      1  SCT Data Transfer
    
    SMART Extended Comprehensive Error Log (GP Log 0x03) not supported
    
    SMART Error Log not supported
    
    SMART Extended Self-test Log Version: 0 (1 sectors)
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]
    
    SMART Selective self-test log data structure revision number 0
    Note: revision number not 1 implies that no selective self-test has ever been run
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    
    SCT Status Version:                  3
    SCT Version (vendor specific):       0 (0x0000)
    SCT Support Level:                   1
    Device State:                        Active (0)
    Current Temperature:                    28 Celsius
    Power Cycle Min/Max Temperature:     27/31 Celsius
    Lifetime    Min/Max Temperature:     10/55 Celsius
    Under/Over Temperature Limit Count:   0/0
    
    SCT Temperature History Version:     10 (Unknown, should be 2)
    Temperature Sampling Period:         1 minute
    Temperature Logging Interval:        10 minutes
    Min/Max recommended Temperature:      0/120 Celsius
    Min/Max Temperature Limit:            0/ 0 Celsius
    Temperature History Size (Index):    56576 (7)
    Invalid Temperature History Size or Index
    
    SCT Error Recovery Control command not supported
    
    Device Statistics (GP/SMART Log 0x04) not supported
    
    Pending Defects log (GP Log 0x0c) not supported
    
    SATA Phy Event Counters (GP Log 0x11)
    ID      Size     Value  Description
    0x0001  2            0  Command failed due to ICRC error
    0x0003  2            0  R_ERR response for device-to-host data FIS
    0x0004  2            0  R_ERR response for host-to-device data FIS
    0x0006  2            0  R_ERR response for device-to-host non-data FIS
    0x0007  2            0  R_ERR response for host-to-device non-data FIS
    0x0008  2            0  Device-to-host non-data FIS retries
    0x0009  2            4  Transition from drive PhyRdy to drive PhyNRdy
    0x000a  2            4  Device-to-host register FISes sent due to a COMRESET
    0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
    0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
    0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
    0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC
    0x0002  2            0  R_ERR response for data FIS
    0x0005  2            0  R_ERR response for non-data FIS
    0x000b  2            0  CRC errors within host-to-device FIS
    0x000d  2            0  Non-CRC errors within host-to-device FIS

     

     

  12. Just had another one right now. The previous one was 3 days ago.

    So I plugged in the serial cable, but the device was completely gone. No login prompt, no output at all.

    Also, once rebooted, there are no logs or anything from the previous boot either.

    journalctl --list-boots lists only the current boot, and dmesg -E is empty.
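
    (Side note, an assumption on my part: if journald is using volatile storage, enabling persistent storage would keep logs across reboots, at the cost of writing them to local disk.)

    # make the systemd journal persistent across reboots
    mkdir -p /var/log/journal
    systemctl restart systemd-journald
    # alternatively, set Storage=persistent in /etc/systemd/journald.conf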

     

    Anyhow, trying the CPU speed thing now. I hope it works; cleaning the entire SSD cache, which gets dirty at every crash, is worrying me.

  13. - Honestly, I don't see the point of baking an M.2 or any future NVMe PCIe port onto the board. Instead I would love to see a PCIe slot or two exposed. That way the end user can pick whatever expansion is needed: an NVMe adapter, 10GbE networking, SATA, USB, or even a SAS controller for additional external enclosures. It will probably be more expensive for the end user but will open up a larger market for the device. We already have 2 lanes / 10Gbps worth of PCIe not utilized as is (correct me on this).

    - Absolutely: more RAM, user-upgradeable RAM, and ECC RAM support.

    - Better trays, you guys, they are a pain right now lol. (Loving the black and purple though.)

    - Some sort of active/quiet cooling on the SoC, separate from the disks, as any disk activity over the network or even a simple CPU-bound task ramps up all the fans immediately.

    - Most important of all, please do not stray too far from the current architecture, both hardware- and firmware-wise. Aside from everything else, we would really love to see a rock-solid launch next time around. Changing a lot of things might cause most of your hard work to date to mean nothing.

  14. @allen--smithee I can see how these options could be useful to others, but in my case all I needed was the original config: 4 HDD RAID 5 + 1 SSD writeback cache :) I would have been forced to get a QNAP if I needed more.

     

    But I'll play :) 

    I would pick none of them; here is my counter offer:

    A - Replace the M.2 slot with a PMP-enabled eSATA port on the back, so a low-end disk tower can be attached for bulk/slow storage. There are generic 8-slot towers available right now (no PMP above 5 slots; those are RAID towers, but still eSATA-connected). An added bonus is more space on the PCB for something else.

    OR

    B - If possible, expose a PCIe slot or two for adding an SFP+ card and a mini-SAS card, so a fast disk tower and a 10GbE network can be connected. This would probably require more RAM, a more powerful CPU, more watts, and all that jazz. I'm not sure this is possible with the ARM architectures available to us mere mortals yet. Obviously Amazon and others have their datacenter ARM CPUs that have all this and then some. (I just realized after writing this that I described a Xeon QNAP unit... but those sell, sooo maybe good?)

     

    They could design and sell these disk towers and the add-in cards as well. Currently available SAS and eSATA towers are quite drab and bland to look at. The attractive Helios64 design aesthetic would make them look very pretty.

     

    Also, this way the original Helios64 box, which is suited to home use, doesn't get bulkier, and its design stays mostly unchanged, so the production troubles that have already been solved are built upon rather than thrown away. I remember from the blog comments of the Kobol guys that the enclosure factory was problematic and caused some delays at the end. Multiple PCBs don't have to be produced for each use case, and the user is free to add options.

     

    In my opinion, option A, an eSATA port on the back and a Kobol-designed disk enclosure, would be perfect for most, and a good incremental way forward.

    3-disk ZFS or RAID 5 + 1 SSD for L2ARC/SLOG or dm-cache, as a fast daily workspace,

    and

    5/8 disks for daily backups / archives / any slow storage needs.

     

    @gprovost Thank you for renaming the thread, I was feeling bad for posting in it and keeping it active.

     

  15.  

    12 minutes ago, Werner said:

    Feel free to take a look into. Curious about that too.

    Couldn't find anything in the Helios64 patch, or in the other RK3399 patches, that relates to port multiplication.

    I'm not sure I'm looking at this correctly, but when I search the repo for 'pmp' I see a lot of results in the config/kernel folder with

    CONFIG_SATA_PMP=y

    Extrapolating from those search results, I think I can assume that port multiplication is off by default and needs to be enabled in the Armbian config for each specific SoC or device. No?
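
    A quick way to check the running kernel directly, rather than the repo (one of these config sources is usually present):

    # check whether the running kernel was built with SATA port-multiplier support
    grep CONFIG_SATA_PMP "/boot/config-$(uname -r)" 2>/dev/null \
        || zgrep CONFIG_SATA_PMP /proc/config.gz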

     

    But later I found this on https://wiki.kobol.io/helios64/sata/

    Quote

    SATA Controller Features¶

    5 SATA ports,

    Command-based and FIS-based for Port Multiplier,

    Compliance with SATA Specification Revision 3.2,

    AHCI mode and IDE programming interface,

    Native Command Queue (NCQ),

    SATA link power saving mode (partial and slumber),

    SATA plug-in detection capable,

    Drive power control and staggered spin-up,

    SATA Partial / Slumber power management state.

     

    So apparently port multipliers should be supported on the Helios64, or will be supported? Honestly, I don't think they would be very useful, since the case would have to be modified and so on. I first mentioned it as a joke to begin with. I don't think anyone would go for this, as the case itself is a cool piece of design and butchering it would be a shame.

  16. https://man7.org/linux/man-pages/man7/lvmcache.7.html

    https://linuxhint.com/configuring-zfs-cache/

    Here you go... use 4 HDDs and an SSD cache.

    Or sell your unit; quite a lot of people wanted to buy one and couldn't get one in time.

    OR, Frankenstein your unit and add port multipliers to the original SATA ports. You can add up to 20 HDDs to the 4 original SATA ports, and up to 5 SSDs to the remaining original SATA port. The hardware controller specs say it supports port multipliers; I'm not sure about the Armbian kernel, you might have to modify it.

     

    Btw, you can take a look at the new version of the Odroid H2+ with port multipliers (up to 10 SATA disks, plus PCIe NVMe) if you are into more tinkering. You also get two 2.5G network ports instead of one. The Hardkernel team has a post about the setup and benchmarks: https://wiki.odroid.com/odroid-h2/application_note/10_sata_drives

    I am planning to expand my network infra with the H2+ soon. You can even plug a $45 4-port 2.5G switch into the NVMe slot now. I'm going crazy over this unit. If only it didn't have a shintel (r/AyyMD) CPU in it.

     

    Anyhow-

    Just doing a bit of research shows that this was not exactly a 'decision' made by Kobol. There are only two reliable PCIe-to-SATA controllers I could find that support multiple (4+) SATA ports, given the limitation of the RK3399, which has 4 PCIe lanes. It would be a different story if the RK had 8 lanes, but that is another can of worms involving CPU architecture, form factor, etc. Not gonna open that can while I'm barely qualified to open this one.

     

    What we have here in the Helios64 is the JMB585, and the other option was the Marvell 88SE9215. The Marvell only supports 4 ports, while the JMB supports 5. I could not find any controller that works reliably with 4 PCIe 2.1 lanes and has more than 5 ports.

    There is the Asmedia ASM1166, which actually supports 6 ports, but it is quite new and was probably not available during the design of the Helios64. Not only that, there is a weird thread about its log output on the Unraid forums.

     

    In the end, this '5 ports only' limit was not exactly a decision by Kobol, but rather an uninformed decision made by you. You don't see people here complaining about this. You are the second one who has even mentioned it, and the only one complaining so far, with CAPS no less. Which means the specs made it clear to pretty much everyone that this was the case.

     

    My suggestion is to replace one of the drives with a Samsung 860 Pro and make it a SLOG/L2ARC, or in my case an LVM cache (writeback mode; make sure you have a proper UPS, or that the battery in the unit is connected properly), and call it a day. A SATA port is faster than the 2.5G Ethernet or the USB DAS mode anyhow, so your cache SSD will mostly perform OK.
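
    For anyone curious, setting an SSD up as an LVM writeback cache looks roughly like this (a sketch only; vg0, data, and /dev/sda are example names - see the lvmcache man page linked at the top):

    # add the SSD to the volume group and attach it as a writeback cache
    pvcreate /dev/sda
    vgextend vg0 /dev/sda
    lvcreate --type cache-pool -l 100%FREE -n cache0 vg0 /dev/sda
    lvconvert --type cache --cachemode writeback --cachepool vg0/cache0 vg0/data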

  17. @gprovost Hi, my Helios64 is connected to a Zyxel XGS1010-12 switch on its 2.5Gbps ports via a CAT5E 26AWG metal-jacket cable. On the PC side I have a Sabrent 2.5G USB network adapter, again connected to the other 2.5Gbps port. I think the Sabrent USB adapter has the same controller as the Helios64.

     

    Spoiler
    
    PS C:> Get-NetAdapter
    
    Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
    ----                      --------------------                    ------- ------       ----------             ---------
    Ethernet 2                Realtek USB 2.5GbE Family Controller         16 Up           00-24-27-88-0B-5A       2.5 Gbps
    ZeroTier One [93afae59... ZeroTier Virtual Port                        15 Up           2A-AE-7A-A6-B8-55       100 Mbps
    ZeroTier One [af78bf94... ZeroTier Virtual Port #2                     53 Up           62-D9-06-F3-75-44       100 Mbps
    VirtualBox Host-Only N... VirtualBox Host-Only Ethernet Adapter        12 Up           0A-00-27-00-00-0C         1 Gbps
    Wi-Fi                     Intel(R) Wi-Fi 6 AX200 160MHz                11 Disconnected 50-E0-85-F5-02-3A          0 bps
    Ethernet                  Intel(R) I211 Gigabit Network Connec...       4 Disconnected B4-2E-99-8E-1D-2E          0 bps
    
    
    PS C:> Get-NetAdapterAdvancedProperty "Ethernet 2"
    
    Name                      DisplayName                    DisplayValue                   RegistryKeyword RegistryValue
    ----                      -----------                    ------------                   --------------- -------------
    Ethernet 2                Energy-Efficient Ethernet      Disabled                       *EEE            {0}
    Ethernet 2                Flow Control                   Rx & Tx Enabled                *FlowControl    {3}
    Ethernet 2                IPv4 Checksum Offload          Rx & Tx Enabled                *IPChecksumO... {3}
    Ethernet 2                Jumbo Frame                    Disabled                       *JumboPacket    {1514}
    Ethernet 2                Large Send Offload v2 (IPv4)   Enabled                        *LsoV2IPv4      {1}
    Ethernet 2                Large Send Offload v2 (IPv6)   Enabled                        *LsoV2IPv6      {1}
    Ethernet 2                ARP Offload                    Enabled                        *PMARPOffload   {1}
    Ethernet 2                NS Offload                     Enabled                        *PMNSOffload    {1}
    Ethernet 2                Priority & VLAN                Priority & VLAN Enabled        *PriorityVLA... {3}
    Ethernet 2                Selective suspend              Enabled                        *SelectiveSu... {1}
    Ethernet 2                Speed & Duplex                 Auto Negotiation               *SpeedDuplex    {0}
    Ethernet 2                SS idle timeout                5                              *SSIdleTimeout  {5}
    Ethernet 2                SS idle timeout(Screen off)    3                              *SSIdleTimeo... {3}
    Ethernet 2                TCP Checksum Offload (IPv4)    Rx & Tx Enabled                *TCPChecksum... {3}
    Ethernet 2                TCP Checksum Offload (IPv6)    Rx & Tx Enabled                *TCPChecksum... {3}
    Ethernet 2                UDP Checksum Offload (IPv4)    Rx & Tx Enabled                *UDPChecksum... {3}
    Ethernet 2                UDP Checksum Offload (IPv6)    Rx & Tx Enabled                *UDPChecksum... {3}
    Ethernet 2                Wake on Magic Packet           Enabled                        *WakeOnMagic... {1}
    Ethernet 2                Wake on pattern match          Enabled                        *WakeOnPattern  {1}
    Ethernet 2                Advanced EEE                   Disabled                       AdvancedEEE     {0}
    Ethernet 2                Green Ethernet                 Enabled                        EnableGreenE... {1}
    Ethernet 2                Idle Power Saving              Enabled                        EnableU2P3      {1}
    Ethernet 2                Gigabit Lite                   Enabled                        GigaLite        {1}
    Ethernet 2                Network Address                                               NetworkAddress  {--}
    Ethernet 2                VLAN ID                        0                              VlanID          {0}
    Ethernet 2                Wake on link change            Enabled                        WakeOnLinkCh... {1}
    Ethernet 2                WOL & Shutdown Link Speed      10 Mbps First                  WolShutdownL... {0}
    

     

     

  18. Yeah, I'm back to the 1Gbps port too... I pushed over 3 TB of data onto the device from multiple sources simultaneously the other day and didn't have a single error.

    With the previous stable builds it used to throw errors quite often during transfers, so when I didn't see any errors with the new one I figured the problem must have been fixed.

    Today I'm losing eth1 completely even without much of a load. Maybe it's because I'm pulling data from the device today instead of pushing; they are TX errors after all, not RX.

     

     
