clostro

  • Posts

    30
  • Joined

Reputation Activity

  1. Like
    clostro reacted to Gareth Halfacree in Upgrading to Bullseye (troubleshooting Armbian 21.08.1)   
    Oh, I've tried reporting problems. Igor told me (and many, many others) that if I wasn't paying €50 a month to Armbian he wasn't interested, so I stopped.
  2. Like
    clostro reacted to Gareth Halfacree in Upgrading to Bullseye (troubleshooting Armbian 21.08.1)   
    If anyone has installed the update but *not* rebooted, it's a quick (temporary) fix:
     
    sudo apt install linux-dtb-current-rockchip64=21.05.4 linux-headers-current-rockchip64=21.05.4 linux-image-current-rockchip64=21.05.4  
  3. Like
    clostro reacted to wurmfood in Summary of troubleshooting items   
    There are a number of modifications that have been suggested that people implement to address certain issues.
     
    The ones I can find are:
    - In /boot/armbianEnv.txt:
    extraargs=libata.force=3.0
    - If doing debugging, also add:
    verbosity=7
    console=serial
    extraargs=earlyprintk ignore_loglevel
    - In /boot/boot.cmd:
    regulator dev vdd_log
    regulator value 930000
    regulator dev vdd_center
    regulator value 950000
    and then run:
    mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr
    - In /etc/default/cpufrequtils:
    ENABLE=true
    MIN_SPEED=408000
    MAX_SPEED=1800000
    GOVERNOR=ondemand
    (or 1200000 instead of 1800000)
     
    - And if using ZFS:
    for disk in /sys/block/sd[a-e]/queue/scheduler; do echo none > $disk; done  
     
    I've gathered these from a variety of threads. Am I missing any here?
  4. Like
    clostro reacted to Gareth Halfacree in Feature / Changes requests for future Helios64 board or enclosure revisions   
    The key upgrade for me would be moving away from Armbian to a professional distribution like mainstream Ubuntu or Debian.
  5. Like
    clostro reacted to ShadowDance in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    @usefulnoise I'd start by trying all different suggestions in this thread, e.g. limiting speed, disabling ncq, if you're not using raw disks (i.e. partitions or dm-crypt), make sure you've disabled io schedulers on the disks, etc.
     
    Example: libata.force=3.0G,noncq,noncqtrim
     
    Disabling ncqtrim is probably unnecessary, but doesn't give any benefit with spinning disks anyway.
     
    If none of this helps, and you're sure the disks aren't actually faulty, I'd recommend trying the SATA controller firmware update (it didn't help me) or possibly experimenting with removing noise. Hook the PSU to a grounded wall socket, use 3rd party SATA cables, or try rerouting them.
     
    Possibly, if you're desperate, try removing the metal clips from the SATA cables (the clip that hooks into the motherboard socket), it shouldn't be a problem, but could perhaps function as an antenna for noise.
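After adding a libata.force setting and rebooting, it can be worth confirming that the option actually reached the kernel command line. A minimal sketch, assuming a POSIX shell; the function name libata_force_value is made up for illustration:

```shell
# Print the value of libata.force from a kernel command line string.
# Prints nothing if the option is absent.
libata_force_value() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^libata\.force=//p'
}

# Typical use on a running system:
#   libata_force_value "$(cat /proc/cmdline)"
```

If this prints nothing on the running system, the setting was probably not picked up from the boot configuration.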
  6. Like
    clostro reacted to ErikC in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    This seems to be the best technical discussion of the issue that I've found so far, so I'm jumping in even though I'm using a different platform. I'm having the same issue with these drives, but I have a slightly larger sample set to play with and I've found a pattern in my case. I'm using a Synology NAS with several Western Digital Red Pro drives. I'm seeing that - with flawless consistency - only a certain model number exhibits this problem. I'm also seeing that this model number is explicitly not listed in Synology's hardware compatibility list, even though other Western Digital Red Pro drives of the same capacity are. So I'm strongly suspecting that there is either a hardware or firmware issue with certain models. In my case, I have:
     
    WD6002FFWX-68TZ4N0 - NOT Reliable
    WD6003FFBX-68MU3N0 - Reliable
    WD6003FRYZ-01F0DB0 - Reliable
     
    I'm also noticing that the issue occurs on very hot days. The room temperature only varies by a few degrees (yay air conditioning), but again it's a very consistent pattern. We don't have any wall outlet under-voltage issues; our utility power is extremely clean here. We also monitor it with our UPS and ATS systems.
     
    Hope this helps.
  7. Like
    clostro reacted to meymarce in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    @usefulnoise
    @ShadowDance
    Maybe this is interesting for you: As I had the same issue but did not want to mess around too much, and some were successful with 3rd party SATA cables, I was looking for a way to get this done without losing the option of sliding the HDDs into the SATA slots.
    I found this (hope this does not count as advertisement):
    https://www.aliexpress.com/item/1005002595620272.html
    which seems to be the general harness of a Lenovo X3100 M5 Server. I was also looking to get some screw-able SATA mounts, but could not find any that would not require me doing the cable works and soldering too.
    However the Lenovo cables looked promising and, as it turns out, you can just use them as a drop-in replacement. The screws fit. However, you have to mount them from the front side instead of the back, but the case has enough space to do that.

     
    You do however need to push very hard, to get the screws to grip, but even the threads match.
    I did a bit of grinding on the metal frame though to reduce the tension on the plastic of the connector cause it is a bit thicker. Not sure that is required though (I think not).
    For the fifth slot some grinding will be inevitable though. Since that slot is not populated for me right now (and has to wait a bit on current pricing unless I find a bargain) and one cable set has 4 cables, this was no issue. Another two cable sets are on the way though (have another Helios64).
    I do not know how you cut the connectors of the original harness to get power only, but I used a separation disk on a Dremel like tool. Worked like a charm.
    The cables are all of similar length, so you need to try a bit until you find a position that works and does not twist the cable too much but gets it to the ports on the board.
    Here is what I ended up with (sorry for the sub-optimal quality, but I only saw this now on the PC and I do not want to disassemble the case again)

     
    Hope that is helpful for anyone. I am super happy with this solution
    I have not had any issues with my ZFS setup anymore since.
     
    Cheers
     
  8. Like
    clostro reacted to Heisath in eMMc drive filled to 90% overnight and no idea why.   
    Use a tool like ncdu to figure out which folder(s) are getting so big. Then look into them.
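For reference, `ncdu -x /` scans the root filesystem interactively without crossing into other mounts. Where ncdu is not installed, a rough equivalent can be sketched with du and sort; the function name largest_entries is made up for illustration, and `du -d` assumes GNU or BSD du:

```shell
# Show the largest entries directly under a directory, biggest last.
# -x keeps du on one filesystem so other mounts are not counted.
# $1 = directory to scan (default /), $2 = number of lines to show (default 10)
largest_entries() {
    du -x -k -d 1 "${1:-/}" 2>/dev/null | sort -n | tail -n "${2:-10}"
}

# Typical use (as root, so all directories are readable):
#   largest_entries / 15
#   largest_entries /var
```

Sizes are in kilobytes; the last line is the total for the scanned directory itself, so the culprit is usually the line just above it.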
  9. Like
    clostro reacted to halfa in Armbian 21.05.2 Focal with Linux 5.10.35-rockchip64: fancontrol die in error, fans not spinning   
    Posting here following what was recommended on twitter.
    After updating my helios64 earlier this week and rebooting to get the new kernel, I realized it was suspiciously silent.
    A quick check of the sensor temperature readings and a physical check made me realize the fans were not spinning.
     
    After a quick read on the wiki, I checked fancontrol which was indeed failing:
    root@helios64:~ # systemctl status fancontrol.service
    ● fancontrol.service - fan speed regulator
       Loaded: loaded (/lib/systemd/system/fancontrol.service; enabled; vendor preset: enabled)
      Drop-In: /etc/systemd/system/fancontrol.service.d
               └─pid.conf
       Active: failed (Result: exit-code) since Fri 2021-05-28 00:08:13 CEST; 1min 42s ago
         Docs: man:fancontrol(8)
               man:pwmconfig(8)
      Process: 2495 ExecStartPre=/usr/sbin/fancontrol --check (code=exited, status=0/SUCCESS)
      Process: 2876 ExecStart=/usr/sbin/fancontrol (code=exited, status=1/FAILURE)
     Main PID: 2876 (code=exited, status=1/FAILURE)
    May 28 00:08:13 helios64 fancontrol[2876]: MINPWM=0
    May 28 00:08:13 helios64 fancontrol[2876]: MAXPWM=255
    May 28 00:08:13 helios64 fancontrol[2876]: AVERAGE=1
    May 28 00:08:13 helios64 fancontrol[2876]: Error: file /dev/thermal-cpu/temp1_input doesn't exist
    May 28 00:08:13 helios64 fancontrol[2876]: Error: file /dev/thermal-cpu/temp1_input doesn't exist
    May 28 00:08:13 helios64 fancontrol[2876]: At least one referenced file is missing. Either some required kernel
    May 28 00:08:13 helios64 fancontrol[2876]: modules haven't been loaded, or your configuration file is outdated.
    May 28 00:08:13 helios64 fancontrol[2876]: In the latter case, you should run pwmconfig again.
    May 28 00:08:13 helios64 systemd[1]: fancontrol.service: Main process exited, code=exited, status=1/FAILURE
    May 28 00:08:13 helios64 systemd[1]: fancontrol.service: Failed with result 'exit-code'.
    Basically fancontrol expects a device in /dev to read the sensor values from, and that device seems to be missing. After a bit of poking around and learning about udev, I managed to solve the issue by recreating the device symlink manually:
    /usr/bin/mkdir /dev/thermal-cpu/
    ln -s /sys/devices/virtual/thermal/thermal_zone0/temp /dev/thermal-cpu/temp1_input
    systemctl restart fancontrol.service
    systemctl status fancontrol.service
    Digging further, this issue happens because udev is not creating the symlink like it should for some reason. After reading the rule in /etc/udev/rules.d/90-helios64-hwmon-legacy.rules and a bit of udev documentation, I managed to find how to test it:
    root@helios64:~ # udevadm test /sys/devices/virtual/thermal/thermal_zone0
    [...]
    Reading rules file: /etc/udev/rules.d/90-helios64-hwmon-legacy.rules
    Reading rules file: /etc/udev/rules.d/90-helios64-ups.rules
    [...]
    DEVPATH=/devices/virtual/thermal/thermal_zone0
    ACTION=add
    SUBSYSTEM=thermal
    IS_HELIOS64_HWMON=1
    HWMON_PATH=/sys/devices/virtual/thermal/thermal_zone0
    USEC_INITIALIZED=7544717
    run: '/bin/ln -sf /sys/devices/virtual/thermal/thermal_zone0 ' <-- something is wrong here, there is no target
    Unload module index
    Unloaded link configuration context.
    After spending a bit more time reading the udev rule, I realized that the second argument was empty because we don't match the ATTR{type}=="soc-thermal" condition. We can look up the types like this:
    root@helios64:~ # find /sys/ -name type | grep thermal
    /sys/devices/virtual/thermal/cooling_device1/type
    /sys/devices/virtual/thermal/thermal_zone0/type
    /sys/devices/virtual/thermal/cooling_device4/type
    /sys/devices/virtual/thermal/cooling_device2/type
    /sys/devices/virtual/thermal/thermal_zone1/type
    /sys/devices/virtual/thermal/cooling_device0/type
    /sys/devices/virtual/thermal/cooling_device3/type
    /sys/firmware/devicetree/base/thermal-zones/gpu/trips/gpu_alert0/type
    /sys/firmware/devicetree/base/thermal-zones/gpu/trips/gpu_crit/type
    /sys/firmware/devicetree/base/thermal-zones/cpu/trips/cpu_crit/type
    /sys/firmware/devicetree/base/thermal-zones/cpu/trips/cpu_alert0/type
    /sys/firmware/devicetree/base/thermal-zones/cpu/trips/cpu_alert1/type
    root@helios64:~ # cat /sys/devices/virtual/thermal/thermal_zone0/type
    cpu <-- we were expecting soc-thermal!
    and after rewriting the line with the new type, udev is happy again:
    # Edit in /etc/udev/rules.d/90-helios64-hwmon-legacy.rules and add the following line after the original one
    ATTR{type}=="cpu", ENV{HWMON_PATH}="/sys%p/temp", ENV{HELIOS64_SYMLINK}="/dev/thermal-cpu/temp1_input", RUN+="/usr/bin/mkdir /dev/thermal-cpu/"
    root@helios64:~ # udevadm control --reload
    root@helios64:~ # udevadm test /sys/devices/virtual/thermal/thermal_zone0
    [...]
    DEVPATH=/devices/virtual/thermal/thermal_zone0
    ACTION=add
    SUBSYSTEM=thermal
    IS_HELIOS64_HWMON=1
    HWMON_PATH=/sys/devices/virtual/thermal/thermal_zone0/temp
    HELIOS64_SYMLINK=/dev/thermal-cpu/temp1_input
    USEC_INITIALIZED=7544717
    run: '/usr/bin/mkdir /dev/thermal-cpu/'
    run: '/bin/ln -sf /sys/devices/virtual/thermal/thermal_zone0/temp /dev/thermal-cpu/temp1_input'
    Unload module index
    Unloaded link configuration context.
    Apparently for some reason the device-tree changed upstream and the thermal type changed from soc-thermal to cpu?
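Since the udev rule matches on the thermal zone's type string, a quick way to see what a given kernel reports is to print the type of every zone. A minimal sketch, assuming a POSIX shell; the function name list_thermal_types is made up for illustration, and the default path is the one used in the post above:

```shell
# List each thermal zone and its type under a sysfs thermal directory
# (defaults to /sys/devices/virtual/thermal).
list_thermal_types() {
    thermaldir=${1:-/sys/devices/virtual/thermal}
    for zone in "$thermaldir"/thermal_zone*; do
        [ -r "$zone/type" ] || continue
        echo "${zone##*/}: $(cat "$zone/type")"
    done
}

# Typical use:
#   list_thermal_types
```

Going by the post above, the affected kernel would show thermal_zone0 as cpu, whereas the original rule expected soc-thermal.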
  10. Like
    clostro reacted to akschu in Stability issues. I get crashes compiling ZFS when clock is > 408mhz   
    I'm working on a new distro which is an ARM port of slackware64-15.0 beta.  The cooked image is here: http://mirrors.aptalaska.net/slackware/slarm64/images/helios64/
     
    This image is built using the build scripts at https://gitlab.com/sndwvs/images_build_kit/-/tree/arm/ 
     
    The high level view is that it's downloading the kernel from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git (linux-5.12.y) then applying the patches found at https://gitlab.com/sndwvs/images_build_kit/-/tree/arm/patch/kernel/rk3399-next then builds the kernel with config https://gitlab.com/sndwvs/images_build_kit/-/blob/arm/config/kernel/linux-rk3399-next.config 
     
    This is very similar to armbian, and the system seems to work just fine at first, but I cannot build ZFS without segfaults or other crashes.  I figured it was overheating or something so I added
    echo 255 > /sys/class/hwmon/hwmon3/pwm1
    echo 255 > /sys/class/hwmon/hwmon4/pwm1
    which turns the fans on full.  The system gets very loud and nowhere near the 100C critical threshold, but when one of the fast cores (4-5) runs at 100% for more than a minute the compiler crashes (no reboots).
     
    I started lowering the cpu frequency lower and lower until I could get through a build and the fastest I can set the CPU and be stable is 408mhz:
    cpufreq-set --cpu 0 --freq 408000
    cpufreq-set --cpu 4 --freq 408000
     
    I also tried changing the voltage regulator per another post I found:
    regulator dev vdd_log
    regulator value 930000
    regulator dev vdd_center
    regulator value 950000
     
    But that didn't help. 
     
    Any other ideas on how to make this box stable at more than 408mhz?  It would be great to have slackware on it and be able to do things with it.
     
    Thanks,
    Matt
  11. Like
    clostro reacted to gprovost in Kobol Team is taking a short Break !   
    It’s been 3 months since we posted on our blog. While we have been pretty active on Armbian/Kobol forum for support and still working at improving the software support and stability, we have been developing in parallel the new iteration of Helios64 around the latest Rockchip SoC RK3568.
     
    However things haven’t been progressing as fast as we would have wished. Looking back, 2020 has been a very challenging year to deliver a new product and it took quite a toll on the small team we are. Our energy level is a bit low and we still haven’t really recovered. Now with electronic part prices surge and crazy lead time, it’s even harder to have business visibility in an already challenging market.
     
    In light of the above, we decided to go on a full break for the next 2 months, to recharge our battery away from Kobol and come back with a refocused strategy and pumped up energy.
     
    Until we are back, we hope you will understand that communication on the different channels (blog, wiki, forum, support email) will be kept to a minimum for the next 2 months.
     
    Thanks again all for your support.
  12. Like
    clostro reacted to nquinn in New cpu anytime soon?   
    Very interested in the Helios64, but can't help but to notice the rockchip cpu in it is 5+ years old at this point.
     
    Any plans to upgrade it to a more modern cpu?
     
    At this price range I'd probably lean towards an Odroid H2+ with a Celeron J4115 chip which has like 2-3x the multicore performance and a similar TDP
  13. Like
    clostro reacted to SIGSEGV in Feature / Changes requests for future Helios64 board or enclosure revisions   
    My comment might be late to the party - if there was a possibility to add an optional display and a few user-configurable buttons to the Front Panel, that would be great.
    I know it would mess a bit with the airflow, but it could be used for system monitoring and few other specific use cases.
  14. Like
    clostro reacted to aprayoga in UPS service and timer   
    @SIGSEGV @clostro the service is triggered by udev rules.
     
    @wurmfood, I didn't realize the timer fills the log every 20s. The initial idea was a one-time timer: poweroff after 10m of a power loss event. Then it was improved to add polling of the battery level and poweroff when the threshold is reached. Your script looks good; we are considering adapting it for the official release. Thank you
     
     
  15. Like
    clostro reacted to wurmfood in UPS service and timer   
    Sigh. Except that doesn't solve the problem. Now it's just cron filling up the log.
     
    New solution, using the sleep option. Modified helios64-ups.service:
    [Unit]
    Description=Helios64 UPS Action
    [Install]
    WantedBy=multi-user.target
    [Service]
    #Type=oneshot
    #ExecStart=/usr/bin/helios64-ups.sh
    Type=simple
    ExecStart=/usr/local/sbin/powermon.sh
    Modified powermon.sh:
    #!/bin/bash
    #7.0V 916 Recommended threshold to force shutdown system
    TH=916
    # Values can be info, warning, emerg
    warnlevel="emerg"
    while [ : ]
    do
        main_power=$(cat '/sys/class/power_supply/gpio-charger/online')
        # Only use for testing:
        # main_power=0
        if [ "$main_power" == 0 ]; then
            val=$(cat '/sys/bus/iio/devices/iio:device0/in_voltage2_raw')
            sca=$(cat '/sys/bus/iio/devices/iio:device0/in_voltage_scale')
            # The division is required to make the test later work.
            adc=$(echo "$val * $sca /1" | bc)
            echo "Main power lost. Current charge: $adc" | systemd-cat -p $warnlevel
            echo "Shutdown at $TH" | systemd-cat -p $warnlevel
            # Uncomment for testing
            # echo "Current values:"
            # echo -e "\tMain Power = $main_power"
            # echo -e "\tRaw Voltage = $val"
            # echo -e "\tVoltage Scale = $sca"
            # echo -e "\tVoltage = $adc"
            if [ "$adc" -le $TH ]; then
                echo "Critical power level reached. Powering off." | systemd-cat -p $warnlevel
                /usr/sbin/poweroff
            fi
        fi
        sleep 20
    done
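The "/1" in the bc expression matters because the scale factor is fractional: the product would otherwise keep its decimals, and the later integer comparison with -le would fail on a non-integer. A small sketch of that arithmetic follows. It uses awk instead of bc purely to avoid the extra dependency, the function names are made up for illustration, and the raw and scale values in the usage note are hypothetical rather than real Helios64 readings:

```shell
# Truncate raw * scale to an integer, mirroring what "/1" does in bc at its
# default scale of 0. awk is a stand-in here, not what the script above uses.
adc_value() {
    awk -v raw="$1" -v scale="$2" 'BEGIN { printf "%d\n", int(raw * scale) }'
}

# Succeed when the truncated reading is at or below the shutdown threshold.
# $1 = raw ADC value, $2 = scale, $3 = threshold
at_or_below_threshold() {
    [ "$(adc_value "$1" "$2")" -le "$3" ]
}
```

With hypothetical inputs, `adc_value 123 7.5` yields 922 (922.5 truncated), which is above the 916 threshold, while a raw value of 120 gives 900 and would trigger the shutdown branch.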
  16. Like
    clostro reacted to wurmfood in Migrate from ramlog to disk   
    Well, for anyone else interested in trying this, here's the basic order I did:
    - stop armbian-ramlog
    - disable armbian-ramlog
    - create a zfs dataset and mount it at /var/log
    - cp -ar everything from /var/log.hdd to the new /var/log
    - modify /etc/logrotate to disable compression (since the dataset is already using compression)
    - modify /etc/default/armbian-ramlog to disable it there as well
    - modify /etc/default/armbian-zram-config to adjust for new numbers (I have ZRAM_PERCENTAGE and MEM_LIMIT_PERCENTAGE at 15).
    - reboot
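The command-line part of the steps above can be sketched roughly as follows. This is an untested outline rather than a verified procedure: the dataset name (for example tank/log) is hypothetical, the run helper and DRY_RUN variable are made up for illustration, and the config file edits are left as manual steps. By default it only prints what it would do:

```shell
# Print (DRY_RUN=1, the default) or execute (DRY_RUN=0) one command.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = ZFS dataset to create for logs (hypothetical example: tank/log)
migrate_ramlog_to_zfs() {
    dataset=$1
    run systemctl stop armbian-ramlog
    run systemctl disable armbian-ramlog
    run zfs create -o mountpoint=/var/log "$dataset"
    run cp -ar /var/log.hdd/. /var/log/
    # Remaining steps are manual edits, in the same order as the list above:
    #   logrotate configuration: disable compression
    #   /etc/default/armbian-ramlog: disable it there as well
    #   /etc/default/armbian-zram-config: adjust the percentages
    #   then reboot
}

# Typical use: review the printed commands first, then run for real as root:
#   migrate_ramlog_to_zfs tank/log
#   DRY_RUN=0 migrate_ramlog_to_zfs tank/log
```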
  17. Like
    clostro got a reaction from gprovost in How to do a full hardware test?   
    May I suggest outputting dmesg live to a network location?
    I'm not sure if the serial console output is the same as 'dmesg' but if it is, you can live 'nohup &' it to any file. That way you wouldn't have to keep connected to console or ssh all the time. Just don't output it to any local file system as writing to a local file system at a crash might corrupt it and cause more problems.
     
    nohup dmesg --follow > /network/location/folder/helios64-log.txt 2>&1 &
    exit
     
    needed to have single >, and exit the session with 'exit' apparently..
  18. Like
    clostro reacted to wurmfood in Does anyone actually have a stable system?   
    Ah! Sorry, yes, you're right. I've been looking at these back and forth for the last few days. A lot of people here have been having trouble with WD drives and at least some of that seems to come down to the difference between the two types of drives.
  19. Like
    clostro reacted to dieKatze88 in Feature / Changes requests for future Helios64 board or enclosure revisions   
    How about a backplane instead of a nest of wires?
  20. Like
    clostro reacted to gprovost in Feature / Changes requests for future Helios64 board or enclosure revisions   
    @dieKatze88 Yes, it has already been announced here and there that we will replace the wire harness with a proper PCB backplane. There will still be wires, though, connecting the main board to the backplane, since we don't want a board that can only be used with a specific backplane. But these wires will be normal SATA cables, so it is easy to buy new ones anywhere if a replacement is needed.
  21. Like
    clostro reacted to scottf007 in ZFS or normal Raid   
    Hi All, 

    I am on my Helios64 (already have a Helios4). 

    I am running 2 x 8TB, 1x 256 M2, OMV5 debian etc.
     
    I have ZFS running (from the wiki instructions) in a basic mirror format, and if I add a few drives I think it would still be a mirror, however I have the following question. It seems far more complicated than MDADM, and potentially more unstable (not in terms of the underlying technology, but in terms of changing these kernels etc every week). 

    Do you think for an average home user that this is really worth it or will constantly need work? To change it would mean starting again, but I do not want to change the setup every month and would intend to keep it stable for media, and backing up some stuff for the next few years. Or if it is working it should be fine for the long term?
     
    Cheers
    Scott
  22. Like
    clostro reacted to ShadowDance in Helios64 - freeze whatever the kernel is.   
    @jbergler I recently noticed the armbian-hardware-optimization script for Helios64 changes the IO scheduler to `bfq` for spinning disks, however, for ZFS we should be using `none` because it has its own scheduler. Normally ZFS would change the scheduler itself, but that would only happen if you're using raw disks (not partitions) and if you import the zpool _after_ the hardware optimization script has run.
     
    You can try changing it (e.g. `echo none >/sys/block/sda/queue/scheduler`) for each ZFS disk and see if anything changes. I still haven't figured out if this is a cause for any problems, but it's worth a shot.
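Before and after changing it, the active scheduler can be read back from the same sysfs file, where the kernel marks the current choice in square brackets. A minimal sketch, assuming a POSIX shell; the function names are made up for illustration:

```shell
# Extract the active scheduler (the bracketed entry) from the text of a
# /sys/block/<disk>/queue/scheduler file, e.g. "mq-deadline kyber [bfq] none".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Report the active scheduler for every sd* disk under a sysfs block
# directory (defaults to /sys/block).
report_schedulers() {
    blockdir=${1:-/sys/block}
    for f in "$blockdir"/sd*/queue/scheduler; do
        [ -r "$f" ] || continue
        disk=${f#"$blockdir"/}
        disk=${disk%%/*}
        echo "$disk: $(active_scheduler "$(cat "$f")")"
    done
}

# Typical use:
#   report_schedulers
```

Each ZFS member disk should then show none if the change took effect.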
  23. Like
    clostro got a reaction from Werner in Hardrives newbie question   
    @kelso Any SSD will be fast enough for the network or USB speeds of this device. If you are buying new you can pick WD red, Samsung Evo, Sandisk Ultra/Extreme, Seagate firepro (?) ... just stay away from little-known or unknown brands. You can check the models you picked and compare them here - https://ssd.userbenchmark.com/
    They are getting a well deserved grilling for their CPU comparisons but I think their SSD data is good enough. I would be looking for the best 'Mixed' value for performance for use in this device, as the network and USB speeds are capping the max read or write speed anyhow.
     
    The Western Digitals you picked use CMR, which is supposedly much better than SMR; you can take a look at this table if you have to pick other models. https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd/
     
    One suggestion before you put any critical data into those brand new disks- Run 'smart' tests on each drive, long tests. Should take about 10 hours I think. 
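With smartmontools, a long self-test is started with `smartctl -t long` and its result is read back later with `smartctl -l selftest`. The sketch below only prints those commands for a list of devices so they can be reviewed first; the function names are made up for illustration and the device paths are examples:

```shell
# Print the smartctl command that starts a long self-test on each device.
smart_long_test_cmds() {
    for dev in "$@"; do
        echo "smartctl -t long $dev"
    done
}

# Print the smartctl command that shows the self-test log for each device.
smart_log_cmds() {
    for dev in "$@"; do
        echo "smartctl -l selftest $dev"
    done
}

# Typical use (as root), piping the reviewed commands to a shell:
#   smart_long_test_cmds /dev/sda /dev/sdb | sh
#   smart_log_cmds /dev/sda /dev/sdb | sh    # once the tests have finished
```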
     
    One of my brand new Seagates couldn't complete a test at first run and had to be replaced. Now I'm on edge and running nightly borg backups to an older NAS because the other disks are from the same series. Call me paranoid but I usually stagger HDD purchases by a few days in between, and/or order from different stores to avoid having them from the same production batch, couldn't do that this time around.
     
     
    @Werner I use 4 4TB Seagate NAS drives, whatever their branding was.. And an old 240GB Sandisk Extreme SSD for Raid caching. LVM Raid 5 + dm-cache(read and write, writeback, smq). It ain't bad. SSD really picks up the slack of the spinning rust especially when you are pushing random writes to the device, and the smq is pretty smart at read caching for hot files.
  24. Like
    clostro reacted to slymanjojo in Helios64 - freeze whatever the kernel is.   
    @gprovost
    Been Running stable since  21.02.3 Buster.
    cat /etc/default/cpufrequtils
    ENABLE=true
    MIN_SPEED=408000
    MAX_SPEED=1800000
    GOVERNOR=ondemand
     
    No VDD tweaks.
     

     
  25. Like
    clostro reacted to Seneca in Helios64 - freeze whatever the kernel is.   
    Just an update from yesterday, no freezes or crashes yet, even though quite heavy IO and CPU.