ShadowDance

  • Posts

    61

Reputation Activity

  1. Like
    ShadowDance reacted to Gareth Halfacree in Upgrading to Bullseye (troubleshooting Armbian 21.08.1)   
    As I can see, which is why I am making sure to move away from Armbian - and I recommend anyone reading this to do the same. It's simply not a project you can rely upon for daily use. For messing around with as a hobbyist, sure, why not. But to actually rely upon? Impossible.
     
    Especially when the founder doesn't understand that bug reports - especially bug reports from beta/nightly users, which have been specifically requested right here in this thread - have value to the project. With an attitude like that, Armbian will never become anything more than a hobbyist's curiosity.
     
    I'll repeat for clarity, so others reading this thread don't misunderstand my meaning: this is going to keep happening. Future updates will continue to break the Helios64, and other devices, and there is nothing you can do about it. Armbian, as Igor says, simply doesn't have the resources to properly test its updates before release, much less actually respond to bug reports. You will not be able to rely on it to keep your Helios64 running and your data safe. If you're happy with that, by all means continue to use it; otherwise, I'd advise looking into alternative software and/or moving to a new NAS altogether.
  2. Like
    ShadowDance reacted to Gareth Halfacree in Upgrading to Bullseye (troubleshooting Armbian 21.08.1)   
    Oh, I've tried reporting problems. Igor told me (and many, many others) that if I wasn't paying €50 a month to Armbian he wasn't interested, so I stopped.
  3. Like
    ShadowDance reacted to digwer in Kernel panic: Fatal exception in interrupt   
    Hi,

    My Helios64 just crashed: the red error LED is blinking.
    I had a Raspberry Pi left logging on the USB-C port and caught the crash:
     
    [567513.689265] rk_gmac-dwmac fe300000.ethernet eth0: Link is Down
    [567519.833508] rk_gmac-dwmac fe300000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
    [567828.048254] rk_gmac-dwmac fe300000.ethernet eth0: Link is Down
    [567834.198593] rk_gmac-dwmac fe300000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
    [568870.455100] rk_gmac-dwmac fe300000.ethernet eth0: Link is Down
    [568876.599602] rk_gmac-dwmac fe300000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
    [647513.282616] Unable to handle kernel paging request at virtual address 00078000118b99f0
    [647513.283329] Mem abort info:
    [647513.283584] ESR = 0x96000004
    [647513.283863] EC = 0x25: DABT (current EL), IL = 32 bits
    [647513.284336] SET = 0, FnV = 0
    [647513.284615] EA = 0, S1PTW = 0
    [647513.284899] Data abort info:
    [647513.285161] ISV = 0, ISS = 0x00000004
    [647513.285506] CM = 0, WnR = 0
    [647513.285776] [00078000118b99f0] address between user and kernel address ranges
    [647513.286411] Internal error: Oops: 96000004 [#1] PREEMPT SMP
    [647513.286909] Modules linked in: iptable_nat iptable_filter bpfilter wireguard libchacha20poly1305 poly1305_neon ip6_udp_tunnel udp_tunnel libblake2s libcurve25519_generic libblake2s_generic veth nf_conntrack_netlink xfrm_user xfrm_algo br_netfilter bridge aufs ipt_REJECT nf_reject_ipv4 rfkill governor_performance nft_chain_nat xt_nat xt_MASQUERADE nf_nat xt_addrtype nft_counter zram xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink r8152 snd_soc_hdmi_codec snd_soc_rockchip_i2s hantro_vpu(C) rockchipdrm rockchip_vdec(C) snd_soc_core dw_mipi_dsi v4l2_h264 dw_hdmi snd_pcm_dmaengine rockchip_rga videobuf2_dma_contig analogix_dp snd_pcm pwm_fan videobuf2_dma_sg v4l2_mem2mem snd_timer gpio_charger videobuf2_vmalloc leds_pwm snd panfrost videobuf2_memops fusb302 drm_kms_helper videobuf2_v4l2 tcpm soundcore gpu_sched videobuf2_common cec typec rc_core videodev sg drm mc drm_panel_orientation_quirks gpio_beeper cpufreq_dt ledtrig_netdev lm75 ip_tables
    [647513.287063] x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear md_mod realtek dwmac_rk stmmac_platform stmmac pcs_xpcs adc_keys
    [647513.296230] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G C 5.10.35-rockchip64 #21.05.1
    [647513.297012] Hardware name: Helios64 (DT)
    [647513.297367] pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
    [647513.297911] pc : scheduler_tick+0xc4/0x140
    [647513.298279] lr : scheduler_tick+0xc4/0x140
    [647513.298647] sp : ffff800011c13d90
    [647513.298947] x29: ffff800011c13d90 x28: 00024ced30605580
    [647513.299424] x27: ffff0000f77bb6c0 x26: 0000000000000000
    [647513.299900] x25: 0000000000000080 x24: ffff80001156a000
    [647513.300375] x23: ffff000000711d00 x22: ffff80001157fd00
    [647513.300851] x21: 0000000000000005 x20: ffff8000118b99c8
    [647513.301327] x19: ffff0000f77c7d00 x18: 0000000000000610
    [647513.301803] x17: 0000000000000010 x16: 0000000000000000
    [647513.302280] x15: 0000000000000006 x14: 0000000000000000
    [647513.302756] x13: 0000000000000095 x12: 0000000000000000
    [647513.303231] x11: 0000000000000000 x10: 0000000000000004
    [647513.303707] x9 : 0000000000000095 x8 : 0000000000000000
    [647513.304184] x7 : ffff0000f77c7d00 x6 : ffff0000f77c8800
    [647513.304659] x5 : 0000000000001095 x4 : ffff8000e6248000
    [647513.305135] x3 : 0000000000010001 x2 : ffff80001156a000
    [647513.305612] x1 : ffff8000112a1c88 x0 : 0000000000000005
    [647513.306088] Call trace:
    [647513.306314] scheduler_tick+0xc4/0x140
    [647513.306658] update_process_times+0x8c/0xa0
    [647513.307035] tick_sched_handle.isra.19+0x40/0x58
    [647513.307449] tick_sched_timer+0x58/0xb0
    [647513.307795] __hrtimer_run_queues+0x104/0x388
    [647513.308187] hrtimer_interrupt+0xf4/0x250
    [647513.308551] arch_timer_handler_phys+0x30/0x40
    [647513.308950] handle_percpu_devid_irq+0xa0/0x298
    [647513.309357] generic_handle_irq+0x30/0x48
    [647513.309718] __handle_domain_irq+0x94/0x108
    [647513.310097] gic_handle_irq+0xc0/0x140
    [647513.310436] el1_irq+0xc0/0x180
    [647513.310724] arch_cpu_idle+0x18/0x28
    [647513.311047] default_idle_call+0x44/0x1bc
    [647513.311409] do_idle+0x204/0x278
    [647513.311701] cpu_startup_entry+0x28/0x60
    [647513.312056] secondary_start_kernel+0x170/0x180
    [647513.312466] Code: 94000cfb aa1303e0 94369a27 940518e0 (f8757a82)
    [647513.313015] ---[ end trace 2613ef5b92c55060 ]---
    [647513.313430] Kernel panic - not syncing: Oops: Fatal exception in interrupt
    [647513.314040] SMP: stopping secondary CPUs
    [647513.314403] Kernel Offset: disabled
    [647513.314718] CPU features: 0x0240022,6100200c
    [647513.315101] Memory Limit: none
    [647513.315387] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
    Don't mind the Ethernet port up/down messages (the connected router likes to reboot itself).
    The system is running from SD card.
     
    $ uname -a
    Linux helios64 5.10.35-rockchip64 #21.05.1 SMP PREEMPT Fri May 7 13:53:11 UTC 2021 aarch64 GNU/Linux
    If you need any other information, feel free to ask.

    Thanks.
  4. Like
    ShadowDance got a reaction from meymarce in Very noisy fans   
    I replaced the stock fans with Noctua NF-A8-PWM fans. These have slightly lower CFM (airflow) at 32.5 vs 35 (stock) however they have a high static pressure rating at 2.37 mm H₂O. For me they produce better cooling than the stock fans. Their noise level is also rated at 17.1 dB(A) so you will barely hear them even at full speed.
     
    I also had a pair of NF-R8-PWM (discontinued model) from the old days which I tried. They are very close in CFM at 31.4 but have worse static pressure at 1.41 mm H₂O. These fans produced worse cooling than the stock fans.
     
    One additional change I made was to place the metal fan grills (finger protectors) on the outside, because then the fan can form a seal against the case. I thought it was a small design oversight to leave a gap between the case and the fan, because it might allow air to move back inside. Aesthetically it's nicer to have the grill on the inside (IMO), so as an alternative fix one could design a 3D printed piece to fill the gaps.
     
    It's also possible to adjust speed by modifying the `/etc/fancontrol` configuration and restarting the service (`systemctl restart fancontrol`), but I would not recommend this unless you're using better than stock fans. If the CPU is working full throttle you will want the fans running at full speed to sufficiently extract the heat from the CPU area.
  5. Like
    ShadowDance got a reaction from clostro in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    @usefulnoise I'd start by trying the different suggestions in this thread, e.g. limiting the link speed and disabling NCQ. If you're not using raw disks (i.e. you're using partitions or dm-crypt), also make sure you've disabled the IO schedulers on the disks.
     
    Example: libata.force=3.0G,noncq,noncqtrim
     
    Disabling NCQ TRIM (noncqtrim) is probably unnecessary, but NCQ TRIM gives no benefit with spinning disks anyway, so nothing is lost.
     
    If none of this helps, and you're sure the disks aren't actually faulty, I'd recommend trying the SATA controller firmware update (it didn't help me) or possibly experimenting with removing noise. Hook the PSU to a grounded wall socket, use 3rd party SATA cables, or try rerouting them.
     
    Possibly, if you're desperate, try removing the metal clips from the SATA cables (the clip that hooks into the motherboard socket). Removing them shouldn't cause a problem, and the clips could perhaps function as an antenna for noise.
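
    After rebooting with a `libata.force=...` setting, it can be worth checking that the options really reached the kernel. A minimal sketch, assuming the options are passed on the kernel command line; the function name `libata_force_opts` is made up for illustration, and it only parses a string without changing anything:

```shell
# Print each libata.force option found in a kernel command line, one per line.
# On a live system: libata_force_opts "$(cat /proc/cmdline)"
libata_force_opts() (
    set -f  # no filename expansion while splitting the command line into words
    for word in $1; do
        case $word in
            libata.force=*)
                # Strip the key, then split the comma-separated option list.
                printf '%s\n' "${word#libata.force=}" | tr ',' '\n'
                ;;
        esac
    done
)
```

    If nothing is printed, the setting was not picked up and the bootloader configuration is the first place to look.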
  6. Like
    ShadowDance got a reaction from usefulnoise in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
  7. Like
    ShadowDance got a reaction from lanefu in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    @Wofferl those are the exact same model as three of my disks (but mine aren't "Plus"). I've used these disks in another machine with ZFS and zero issues (ASM1062 SATA controller). So if we assume the problem is between SATA controller and disk, and while I agree with you that it's probably in part a disk issue, I'm convinced it's something that would be fixable on the SATA controller firmware. Perhaps these disks do something funny that the SATA controller doesn't expect? And based on all my testing so far, the SATA cable also plays a role, meaning perhaps there's a noise-factor in play (as well).
     
    Side note: Western Digital really screwed us over with this whole SMR fiasco, didn't they? I'd be pretty much ready to throw these disks in the trash if it wasn't for the fact that they worked perfectly on another SATA controller.
     
    @grek glad it helped! By the way, I would still recommend changing the IO scheduler to none because bfq is CPU intensive, and ZFS does its own scheduling. It probably won't fix the issues but might reduce some CPU overhead.
  8. Like
    ShadowDance got a reaction from gprovost in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
  9. Like
    ShadowDance got a reaction from Koen Vervloesem in ZFS or normal Raid   
    @scottf007 I think it would be hard for anyone here to really answer if it's worth it or not [for you]. In your situation, I'd try to evaluate whether or not you need the features that ZFS gives you. For instance, ZFS snapshots are something you never really need, until you do. When you find that you deleted some data a month ago and can still recover it from a snapshot, it's a great comfort. If that's something you value, btrfs could be an alternative and is already built into the kernel. If all you need is data integrity, you could consider dm-integrity+mdraid with a file system of your choice on top (EXT4, XFS, etc.). Skipping "raid" altogether would also be possible; LVM allows for great flexibility with disks.
     
    If you're worried about the amount of work you need to put in with ZFS, you can freeze the updates when you are satisfied with the stability of the system. Just run `sudo apt-mark hold linux-image-current-rockchip64 linux-dtb-current-rockchip64`, which prevents kernel/boot instruction updates, and ZFS should not break on you any time soon. Conversely, `unhold` once you're ready to deal with the future.
     
    For me personally, ZFS is totally worth it. I have it on two server/NAS at home. I use ZFS native encryption on one, and LUKS+ZFS on the Helios64 (due to CPU capabilities). I also use a tool named zrepl for automatically creating, pruning and replicating snapshots. So for instance, my most important datasets are backed up from my one machine to the Helios64 in raw mode, this means the data is safe, but not readable by the Helios64 without loading the encryption keys. I also run Armbian on the Helios64 straight off of ZFS (root on ZFS), this gives me the ability to easily roll-back the system if, say, an update broke it.
     
    @hartraft depends on your requirements/feature wishlist. RAID (mdraid), for instance, cannot guarantee data consistency (unless stacked with dm-integrity). What this means is that once data is written to the disk, it can still become corrupted and RAID can't catch it. ZFS guards against this via checksums on all data, i.e. once it's on disk, it's guarantee-ably either not corrupted or that corruption will be detected and likely repairable from one of the redundant disks. ZFS also has support for snapshots, meaning you can easily recover deleted files from snapshots, etc. RAID does not support anything like this. Looking at mergerfs, it seems to lack these features as well, and it runs in user-space (via FUSE), so not as integrated. SnapRaid is a backup program so not really comparable and MooseFS I know nothing about, but looks enterprise-y.
     
    The closest match-up for ZFS in terms of features is probably btrfs (in kernel) or bcachefs (have never used this).
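
    To make the snapshot workflow mentioned above a bit more concrete, here is a minimal sketch that only prints the commands involved instead of running them. The function name `zfs_snapshot_plan`, the dataset `tank/data` and the label `before-upgrade` are hypothetical; the `zfs snapshot`, `zfs list -t snapshot` and `zfs rollback` subcommands are the standard ones.

```shell
# Print (not run) the zfs commands for taking, listing and rolling back a snapshot.
# $1 = dataset (hypothetical, e.g. tank/data), $2 = snapshot label
zfs_snapshot_plan() {
    printf 'zfs snapshot %s@%s\n' "$1" "$2"
    printf 'zfs list -t snapshot -r %s\n' "$1"
    printf 'zfs rollback %s@%s\n' "$1" "$2"
}

# Example: zfs_snapshot_plan tank/data before-upgrade
```

    Keep in mind that a rollback discards everything written to the dataset after the snapshot, so for recovering a single deleted file it is usually gentler to copy it out of the snapshot instead.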
  10. Like
    ShadowDance got a reaction from gprovost in ZFS or normal Raid   
  11. Like
    ShadowDance reacted to antsu in Crazy instability :(   
    @ShadowDance It's a regular swap partition on sda1, not on a zvol. But thanks for the quick reply.
  12. Like
    ShadowDance got a reaction from SymbiosisSystems in Helios64 - freeze whatever the kernel is.   
    @jbergler I recently noticed the armbian-hardware-optimization script for Helios64 changes the IO scheduler to `bfq` for spinning disks. However, for ZFS we should be using `none` because it has its own scheduler. Normally ZFS would change the scheduler itself, but that would only happen if you're using raw disks (not partitions) and if you import the zpool _after_ the hardware optimization script has run.
     
    You can try changing it (e.g. `echo none >/sys/block/sda/queue/scheduler`) for each ZFS disk and see if anything changes. I still haven't figured out if this is a cause for any problems, but it's worth a shot.
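
    A minimal sketch of doing this for several disks at once and checking the result. The function names and the `SYSFS_BLOCK` variable are assumptions made for illustration (the variable only exists so the functions can be tried against a scratch directory); writing to the real `/sys/block` needs root, and the change does not survive a reboot.

```shell
# Base directory of the block device tree; overridable for dry runs.
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

# Print the active scheduler of a disk, e.g. "bfq" from "mq-deadline kyber [bfq] none".
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$SYSFS_BLOCK/$1/queue/scheduler"
}

# Switch every named disk to the "none" scheduler.
set_scheduler_none() {
    for disk in "$@"; do
        echo none > "$SYSFS_BLOCK/$disk/queue/scheduler"
    done
}

# Example (as root): set_scheduler_none sda sdb sdc; active_scheduler sda
```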
  13. Like
    ShadowDance got a reaction from antsu in Crazy instability :(   
    @antsu you could try changing the IO scheduler for those ZFS disks (to `none`) and see if it helps, wrote about it here: 
     
  14. Like
    ShadowDance got a reaction from gprovost in Crazy instability :(   
  15. Like
    ShadowDance got a reaction from gprovost in Helios64 - freeze whatever the kernel is.   
  16. Like
    ShadowDance got a reaction from clostro in Helios64 - freeze whatever the kernel is.   
  17. Like
    ShadowDance got a reaction from SymbiosisSystems in How-to start fans during early boot   
    If you want the fans of your Helios64 to start spinning earlier, look no further. This will allow the fans to spin at a constant speed from the earliest stage of initrd until fancontrol is started by the system. My use-case is for full disk encryption and recovery in initramfs, sometimes I can have the machine powered on for quite a while before fancontrol starts. Or, if the boot process encounters an error the fans may never start, this prevents that.
     
    Step 1: Tell initramfs to include the pwm-fan module
    echo pwm-fan | sudo tee -a /etc/initramfs-tools/modules  
    Step 2: Create a new file at /etc/initramfs-tools/scripts/init-top/fan with the following contents:
    #!/bin/sh
    # initramfs-tools boilerplate: report prerequisites when asked.
    PREREQ=""
    prereqs() {
        echo "$PREREQ"
    }
    case $1 in
    prereqs)
        prereqs
        exit 0
        ;;
    esac
    . /scripts/functions
    # Load the fan driver, then set a fixed PWM value on every fan.
    modprobe pwm-fan
    for pwm in /sys/devices/platform/p*-fan/hwmon/hwmon*/pwm1; do
        echo 150 >"$pwm"
    done
    exit 0

    Feel free to change the value 150 to anything between 1 and 255; it decides how fast the fans will spin.
     
    Step 3: Enable the fan script for inclusion in initramfs and update the initramfs image:
    sudo chmod +x /etc/initramfs-tools/scripts/init-top/fan
    sudo update-initramfs -u -k all
    Step 4: Reboot and enjoy the cool breeze.
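
    If you would rather think in percent than in raw PWM values, a small helper can do the conversion to the 1-255 range used in the script above. This is only a sketch; the function name `pwm_from_percent` is made up, and it assumes the value scales linearly with duty cycle.

```shell
# Convert a duty cycle in percent (1-100) to the value written to pwm1.
pwm_from_percent() {
    pct=$1
    # Reject empty or non-numeric input before doing arithmetic.
    case $pct in
        ''|*[!0-9]*) echo "percent must be an integer" >&2; return 1 ;;
    esac
    if [ "$pct" -lt 1 ] || [ "$pct" -gt 100 ]; then
        echo "percent must be between 1 and 100" >&2
        return 1
    fi
    # Integer arithmetic, rounded to the nearest value.
    echo $(( (pct * 255 + 50) / 100 ))
}

# Example: pwm_from_percent 59   (gives the 150 used in the script)
```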
  18. Like
    ShadowDance reacted to gprovost in Kernel panic in 5.10.12-rockchip64 21.02.1   
    We are coming up soon with a change that will finally improve DVFS stability.
  19. Like
    ShadowDance got a reaction from gprovost in Encrypted OpenZFS performance   
    Native ZFS encryption speed is not optimal on ARM and is limited by CPU speed on the Helios64. The optimizations that have gone in are limited to amd64-based architectures and require CPU features not available on ARM. Another consideration is that, because of the lack of those CPU features, the CPU will be heavily loaded during encrypted reads and writes, meaning fewer resources are available for other tasks. The problem isn't AES though, which is fully supported by the RK3399; it's GCM. This means that you can do full disk encryption via LUKS and run ZFS without encryption on top -- this is what I do. It's the best we can have at the moment and for the foreseeable future; to my knowledge, nobody is currently working on ARM encryption optimizations for ZFS.
     
    Edit: This may be of interest: https://github.com/openzfs/zfs/issues/10347
  20. Like
    ShadowDance reacted to griefman in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    I have been following the discussion for a while and would like to report where I am at. I would also like to request that this information be recorded in a systematic way by Kobol, if this isn't already happening.
     
    I have the following setup:
      pool: mypool
     state: ONLINE
      scan: resilvered 12K in 00:00:00 with 0 errors on Tue Feb 9 17:22:03 2021
    config:

        NAME                                          STATE     READ WRITE CKSUM
        mypool                                        ONLINE       0     0     0
          raidz1-0                                    ONLINE       0     0     0
            ata-WDC_WD40EFRX-68N32N0_WD-WCC7XXXXXXXX  ONLINE       0     0     0
            ata-WDC_WD40EFRX-68N32N0_WD-WCC7XXXXXXXX  ONLINE       0     0     0
            ata-WDC_WD40EFRX-68N32N0_WD-WCC7XXXXXXXX  ONLINE       0     0     0
            ata-WDC_WD40EFRX-68N32N0_WD-WCC7XXXXXXXX  ONLINE       0     0     0
            ata-WDC_WD40EFRX-68N32N0_WD-WCC7XXXXXXXX  ONLINE       0     0     0

    Some weeks ago I noticed that the health of the zpool was DEGRADED. I checked, and one device had READ errors and was marked as FAULTED. This also resulted in UDMA CRC errors being stored in the SMART stats for this drive. So I cleared the errors and ran a scrub to see what was going on.
     
    sudo zpool scrub mypool  
     
    I monitored the scrub with
    sudo watch zpool status  
    And I saw quite quickly that all drives had started to get READ errors. SMART also reported that all drives now had UDMA CRC errors.
     
    It was clear that something bad was going on, so I contacted Kobol and we started to debug the issue together.
     
    First I changed the SATA speed to 3Gbps by adding the following line to /boot/armbianEnv.txt
    extraargs=libata.force=3.0
     
    The results were similar, but I noticed that the errors started to show up a bit later in the scrubbing process. The UDMA CRC Errors count has increased.
     
    Then I replaced the power and the SATA Cables with new ones, which unfortunately did not bring any improvement.
     
    Then I disabled NCQ by adding the following to /boot/armbianEnv.txt
    extraargs=libata.force=noncq  
    and reverted back to SATA 6 Gbps by removing the 3 Gbps line, introduced earlier.
     
    This had a positive result and I was able to run the scrub without any errors.
     
    Then I went back and installed the old(original) cable harness again and retested - all good.
     
    While disabling NCQ has a positive impact on the errors, it also has a negative impact on speed and, to some extent, on the disk drives' health.
     
    I have also tried to reduce the NCQ depth to 31, which is the recommended value, but this did not have any impact.
     
    I hope that, using this information, Kobol will try to reproduce this issue themselves, to see if it's only certain boards that are affected or if it's every board.
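
    For anyone repeating these experiments, switching between the settings means editing the `extraargs=` line each time. A minimal sketch of a helper that keeps exactly one such line in the file; the function name `set_extraargs` is made up, and it assumes only a single `extraargs=` assignment should be present, so several options are combined into one value (for example `libata.force=3.0G,noncq`). Editing the real /boot/armbianEnv.txt needs root and a reboot to take effect.

```shell
# Replace (or add) the extraargs= line in an armbianEnv.txt-style file.
# $1 = path to the file, $2 = full kernel argument string
set_extraargs() {
    file=$1
    args=$2
    tmp=$(mktemp) || return 1
    # Keep every line except an existing extraargs= assignment.
    grep -v '^extraargs=' "$file" > "$tmp" || true
    # Append the new assignment as the only extraargs= line.
    printf 'extraargs=%s\n' "$args" >> "$tmp"
    # Write back in place so ownership and permissions are preserved.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (as root): set_extraargs /boot/armbianEnv.txt "libata.force=noncq"
```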
     
  21. Like
    ShadowDance reacted to gprovost in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    @ShadowDance @alban I guess it was a shot in the dark to check if NCQ has any sort of impact.
  22. Like
    ShadowDance got a reaction from echolon79 in Very noisy fans   
  23. Like
    ShadowDance got a reaction from gprovost in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    Hey, sorry I haven't updated this thread until now.
     
    The Kobol team sent me, as promised, a new harness and a power-only harness so that I could do some testing:
      • Cutting off the capacitors from my original harness did not make a difference.
      • The new (normal) harness had the exact same issue as the original one.
      • With the power-only harness and my own SATA cables, I was unable to reproduce the issue (even at 6 Gbps).
      • As a final test, I went to town on my original harness and cut the connector in two. This allowed me to use my own SATA cable with the original harness and there was, again, no issue (at 6 Gbps).
     
    Judging from my initial results, it would seem that there is an issue with the SATA cables in the stock harness. But I should try this for a longer period of time -- the problem was that I didn't have SATA cables for all disks. Once I do, I'll try to do a week-long stress test. I reported my results to the Kobol team but haven't heard back yet.
     
    Even with the 3.0 Gbps limit, I still occasionally run into this issue with the original harness; it has happened twice since I did the experiment.
     
    If someone else is willing to repeat this experiment with a good set of SATA cables, please do contact Kobol to see if they'd be willing to ship out another set of test harnesses, or perhaps they have other plans.
     
    Here's some pics of my test setup, including the mutilated connector:
     

  24. Like
    ShadowDance got a reaction from aprayoga in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   

  25. Like
    ShadowDance got a reaction from clostro in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   