SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED


ShadowDance
 Share

12 12

Recommended Posts

I have some new things to report.

 

I finally got around to replacing all SATA cables with 3rd party cables and using the power-only harness that the team at Kobol sent me. So I went ahead and re-enabled 6 Gbps but to my disappointment I ran into failed command: READ FPDMA QUEUED errors again. Because of this I tried once again to disable NCQ (extraargs=libata.force=noncq) and lo-and-behold, I didn't run into disk issues for the 1+ week I was testing it (kernel 5.9.14). This made me hopeful that maybe, just maybe, my previous test with NCQ disabled was bad so I re-installed the new standard harness I received from Kobol but unfortunately, I started immediately having issues again.

 

Note that this is with io scheduler set to none.

 

ata4.00: failed command: READ DMA EXT

Spoiler

[  173.700955] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x300100 action 0x6
[  173.701578] ata4.00: irq_stat 0x08000000
[  173.701928] ata4: SError: { UnrecovData Dispar BadCRC }
[  173.702391] ata4.00: failed command: READ DMA EXT
[  173.702814] ata4.00: cmd 25/00:38:d8:66:ba/00:00:f9:01:00/e0 tag 22 dma 28672 in
                        res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
[  173.704154] ata4.00: status: { DRDY }
[  173.704491] ata4: hard resetting link
[  174.184891] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  174.186685] ata4.00: configured for UDMA/133
[  174.187111] sd 3:0:0:0: [sdd] tag#22 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[  174.187920] sd 3:0:0:0: [sdd] tag#22 Sense Key : 0x5 [current]
[  174.188440] sd 3:0:0:0: [sdd] tag#22 ASC=0x21 ASCQ=0x4
[  174.188922] sd 3:0:0:0: [sdd] tag#22 CDB: opcode=0x88 88 00 00 00 00 01 f9 ba 66 d8 00 00 00 38 00 00
[  174.189732] blk_update_request: I/O error, dev sdd, sector 8484710104 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
[  174.190695] zio pool=rpool vdev=/dev/disk/by-id/dm-uuid-CRYPT-LUKS2-356ca7bf038a4fda94bf84e9abfabb4c-luks-ata-WDC_WD60EFRX-68L0BN1_WD-WX51D583KTSE-part4 error=5 type=1 offset=4343080005632 size=28672 flags=180880
[  174.192407] ata4: EH complete
[  177.185563] kmem.limit_in_bytes is deprecated and will be removed. Please report your usecase to linux-mm@kvack.org if you depend on this functionality.
[  177.187069] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation
[  178.311447] eth0: renamed from veth863bc6b
[  178.349209] eth0: renamed from vethb8208e6
[  178.370851] IPv6: ADDRCONF(NETDEV_CHANGE): veth6d9db95: link becomes ready
[  178.371722] br-429aece1ddc2: port 1(veth6d9db95) entered blocking state
[  178.372324] br-429aece1ddc2: port 1(veth6d9db95) entered forwarding state
[  178.373185] IPv6: ADDRCONF(NETDEV_CHANGE): veth8810431: link becomes ready
[  178.373987] br-429aece1ddc2: port 2(veth8810431) entered blocking state
[  178.374576] br-429aece1ddc2: port 2(veth8810431) entered forwarding state
[  178.375585] IPv6: ADDRCONF(NETDEV_CHANGE): br-429aece1ddc2: link becomes ready
[  301.509646] perf: interrupt took too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[  331.652485] perf: interrupt took too long (3168 > 3138), lowering kernel.perf_event_max_sample_rate to 63000
[  350.695350] perf: interrupt took too long (3986 > 3960), lowering kernel.perf_event_max_sample_rate to 50000
[  352.017716] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x300100 action 0x6
[  352.018339] ata4.00: irq_stat 0x08000000
[  352.018688] ata4: SError: { UnrecovData Dispar BadCRC }
[  352.019149] ata4.00: failed command: READ DMA EXT
[  352.019571] ata4.00: cmd 25/00:b0:e8:1a:6c/00:07:75:02:00/e0 tag 18 dma 1007616 in
                        res 50/00:00:88:16:6c/00:00:75:02:00/e0 Emask 0x10 (ATA bus error)
[  352.020923] ata4.00: status: { DRDY }
[  352.021255] ata4: hard resetting link
[  352.497484] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  352.499169] ata4.00: configured for UDMA/133
[  352.499721] sd 3:0:0:0: [sdd] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[  352.500539] sd 3:0:0:0: [sdd] tag#18 Sense Key : 0x5 [current]
[  352.501057] sd 3:0:0:0: [sdd] tag#18 ASC=0x21 ASCQ=0x4
[  352.501548] sd 3:0:0:0: [sdd] tag#18 CDB: opcode=0x88 88 00 00 00 00 02 75 6c 1a e8 00 00 07 b0 00 00
[  352.502357] blk_update_request: I/O error, dev sdd, sector 10559953640 op 0x0:(READ) flags 0x700 phys_seg 98 prio class 0
[  352.503327] zio pool=rpool vdev=/dev/disk/by-id/dm-uuid-CRYPT-LUKS2-356ca7bf038a4fda94bf84e9abfabb4c-luks-ata-WDC_WD60EFRX-68L0BN1_WD-WX51D583KTSE-part4 error=5 type=1 offset=5405604696064 size=1007616 flags=40080cb0
[  352.505047] ata4: EH complete
[  365.833358] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x300100 action 0x6
[  365.834109] ata4.00: irq_stat 0x08000000
[  365.834454] ata4: SError: { UnrecovData Dispar BadCRC }
[  365.834913] ata4.00: failed command: READ DMA EXT
[  365.835329] ata4.00: cmd 25/00:b0:b8:07:23/00:01:76:02:00/e0 tag 2 dma 221184 in
                        res 50/00:00:b8:07:23/00:00:76:02:00/e0 Emask 0x10 (ATA bus error)
[  365.836795] ata4.00: status: { DRDY }
[  365.837123] ata4: hard resetting link
[  366.317315] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  366.319003] ata4.00: configured for UDMA/133
[  366.319433] sd 3:0:0:0: [sdd] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[  366.320230] sd 3:0:0:0: [sdd] tag#2 Sense Key : 0x5 [current]
[  366.320741] sd 3:0:0:0: [sdd] tag#2 ASC=0x21 ASCQ=0x4
[  366.321193] sd 3:0:0:0: [sdd] tag#2 CDB: opcode=0x88 88 00 00 00 00 02 76 23 07 b8 00 00 01 b0 00 00
[  366.322009] blk_update_request: I/O error, dev sdd, sector 10571941816 op 0x0:(READ) flags 0x700 phys_seg 23 prio class 0
[  366.322978] zio pool=rpool vdev=/dev/disk/by-id/dm-uuid-CRYPT-LUKS2-356ca7bf038a4fda94bf84e9abfabb4c-luks-ata-WDC_WD60EFRX-68L0BN1_WD-WX51D583KTSE-part4 error=5 type=1 offset=5411742642176 size=221184 flags=40080cb0
[  366.324676] ata4: EH complete
[  369.905366] ata4: limiting SATA link speed to 3.0 Gbps
[  369.905887] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x300100 action 0x6
[  369.906499] ata4.00: irq_stat 0x08000000
[  369.906849] ata4: SError: { UnrecovData Dispar BadCRC }
[  369.907312] ata4.00: failed command: READ DMA EXT
[  369.907737] ata4.00: cmd 25/00:c0:e0:ae:54/00:07:89:02:00/e0 tag 20 dma 1015808 in
                        res 50/00:00:20:a7:54/00:00:89:02:00/e0 Emask 0x10 (ATA bus error)
[  369.909093] ata4.00: status: { DRDY }
[  369.909507] ata4: hard resetting link
[  370.385268] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  370.386955] ata4.00: configured for UDMA/133
[  370.387493] sd 3:0:0:0: [sdd] tag#20 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[  370.388298] sd 3:0:0:0: [sdd] tag#20 Sense Key : 0x5 [current]
[  370.388816] sd 3:0:0:0: [sdd] tag#20 ASC=0x21 ASCQ=0x4
[  370.389294] sd 3:0:0:0: [sdd] tag#20 CDB: opcode=0x88 88 00 00 00 00 02 89 54 ae e0 00 00 07 c0 00 00
[  370.390098] blk_update_request: I/O error, dev sdd, sector 10893962976 op 0x0:(READ) flags 0x700 phys_seg 88 prio class 0
[  370.391066] zio pool=rpool vdev=/dev/disk/by-id/dm-uuid-CRYPT-LUKS2-356ca7bf038a4fda94bf84e9abfabb4c-luks-ata-WDC_WD60EFRX-68L0BN1_WD-WX51D583KTSE-part4 error=5 type=1 offset=5576617476096 size=1015808 flags=40080cb0
[  370.392770] ata4: EH complete
[  379.682552] perf: interrupt took too long (4987 > 4982), lowering kernel.perf_event_max_sample_rate to 40000
[  411.315139] perf: interrupt took too long (6282 > 6233), lowering kernel.perf_event_max_sample_rate to 31750
[  450.954583] perf: interrupt took too long (7858 > 7852), lowering kernel.perf_event_max_sample_rate to 25250
[  505.281348] perf: interrupt took too long (9867 > 9822), lowering kernel.perf_event_max_sample_rate to 20250
[  681.003734] perf: interrupt took too long (12378 > 12333), lowering kernel.perf_event_max_sample_rate to 16000

 

 

@gprovost are you any closer to figuring out what could be wrong here or have a potential fix in the pipeline?

 

@grek I would recommend adding extraargs=libata.force=noncq to /boot/armbianEnv.txt and see if it helps. Might not completely fix the issue but could make more stable.

Link to post
Share on other sites

Donate and support the project!

I have the same issue with three 6TB WD Red Plus. There is no problem until zfs scrub runs. Maybe it is not a sata bus problem, but a disk problem. All reports here seems to use WD Red disks and you can find similar reports for this disks and zfs with other hardware.

I will test my three disks in an other hardware this weekend to check if it is a helios problem or a disk problem.

 

Model Family:     Western Digital Red
Device Model:     WDC WD60EFRX-68L0BN1
Firmware Version: 82.00A82

 

[363042.880705] ata2.00: exception Emask 0x2 SAct 0x100060 SErr 0x400 action 0x6
[363042.880715] ata2.00: irq_stat 0x08000000
[363042.880722] ata2: SError: { Proto }
[363042.880730] ata2.00: failed command: READ FPDMA QUEUED
[363042.880741] ata2.00: cmd 60/80:28:20:14:60/00:00:0b:00:00/40 tag 5 ncq dma 65536 in
                         res 40/00:28:20:14:60/00:00:0b:00:00/40 Emask 0x2 (HSM violation)
[363042.880746] ata2.00: status: { DRDY }
[363042.880751] ata2.00: failed command: READ FPDMA QUEUED
[363042.880761] ata2.00: cmd 60/00:30:a0:14:60/01:00:0b:00:00/40 tag 6 ncq dma 131072 in
                         res 40/00:28:20:14:60/00:00:0b:00:00/40 Emask 0x2 (HSM violation)
[363042.880765] ata2.00: status: { DRDY }
[363042.880770] ata2.00: failed command: READ FPDMA QUEUED
[363042.880779] ata2.00: cmd 60/80:a0:a0:15:60/00:00:0b:00:00/40 tag 20 ncq dma 65536 in
                         res 40/00:28:20:14:60/00:00:0b:00:00/40 Emask 0x2 (HSM violation)
[363042.880783] ata2.00: status: { DRDY }
[363042.880793] ata2: hard resetting link
[363043.356670] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[363043.357984] ata2.00: configured for UDMA/133
[363043.358054] sd 1:0:0:0: [sdb] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[363043.358062] sd 1:0:0:0: [sdb] tag#5 Sense Key : 0x5 [current] 
[363043.358068] sd 1:0:0:0: [sdb] tag#5 ASC=0x21 ASCQ=0x4 
[363043.358076] sd 1:0:0:0: [sdb] tag#5 CDB: opcode=0x88 88 00 00 00 00 00 0b 60 14 20 00 00 00 80 00 00
[363043.358084] blk_update_request: I/O error, dev sdb, sector 190845984 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
[363043.358098] zio pool=datastore vdev=/dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WX72D507JSP5-part1 error=5 type=1 offset=97712095232 size=65536 flags=1808b0
[363043.358145] sd 1:0:0:0: [sdb] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[363043.358152] sd 1:0:0:0: [sdb] tag#6 Sense Key : 0x5 [current] 
[363043.358157] sd 1:0:0:0: [sdb] tag#6 ASC=0x21 ASCQ=0x4 
[363043.358163] sd 1:0:0:0: [sdb] tag#6 CDB: opcode=0x88 88 00 00 00 00 00 0b 60 14 a0 00 00 01 00 00 00
[363043.358170] blk_update_request: I/O error, dev sdb, sector 190846112 op 0x0:(READ) flags 0x700 phys_seg 2 prio class 0
[363043.358178] zio pool=datastore vdev=/dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WX72D507JSP5-part1 error=5 type=1 offset=97712160768 size=131072 flags=40080cb0
[363043.358214] sd 1:0:0:0: [sdb] tag#20 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[363043.358220] sd 1:0:0:0: [sdb] tag#20 Sense Key : 0x5 [current] 
[363043.358226] sd 1:0:0:0: [sdb] tag#20 ASC=0x21 ASCQ=0x4 
[363043.358232] sd 1:0:0:0: [sdb] tag#20 CDB: opcode=0x88 88 00 00 00 00 00 0b 60 15 a0 00 00 00 80 00 00
[363043.358239] blk_update_request: I/O error, dev sdb, sector 190846368 op 0x0:(READ) flags 0x700 phys_seg 16 prio class 0
[363043.358248] zio pool=datastore vdev=/dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WX72D507JSP5-part1 error=5 type=1 offset=97712291840 size=65536 flags=1808b0
[363043.358275] ata2: EH complete

 

Link to post
Share on other sites

3 hours ago, Wofferl said:

All reports here seems to use WD Red disks and you can find similar reports for this disks and zfs with other hardware.

Taking a look around, this definitely seems to be the case:

https://www.truenas.com/community/threads/warning-wd-red-wd60efrx-68l0bn1.58676/

 

Additionally, people see a lot more errors with the *EFAX drives instead of the *EFRX, since the *EFAX drives are CMR and don't hold up.

Link to post
Share on other sites

Thanks @ShadowDance for the tip, my scub test done without errors

 

Quote

root@helios64:~# zpool status -v data
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 07:47:07 with 0 errors on Thu Mar 25 03:55:37 2021
config:

    NAME           STATE     READ WRITE CKSUM
    data           ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        sda-crypt  ONLINE       0     0     0
        sdb-crypt  ONLINE       0     0     0
    cache
      sdc2         ONLINE       0     0     0
 

 

Quote

root@helios64:~# cat /boot/armbianEnv.txt 
verbosity=7
console=serial
extraargs=earlyprintk ignore_loglevel libata.force=noncq
 

 

Quote

root@helios64:~# cat /etc/default/cpufrequtils 
ENABLE="true"
GOVERNOR=performance
MAX_SPEED=1416000
MIN_SPEED=1416000
 

 

 

Quote

root@helios64:~# cat /sys/block/sda/queue/scheduler 
mq-deadline kyber [bfq] none
root@helios64:~# cat /sys/block/sdb/queue/scheduler 
mq-deadline kyber [bfq] none
root@helios64:~# 
 

 

 

Quote

root@helios64:~# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
    Model Number:       WDC WD40EFRX-68N32N0                    
    Firmware Revision:  82.00A82
 

root@helios64:~# hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
    Model Number:       WDC WD40EFRX-68N32N0                    
    Firmware Revision:  82.00A82
 

 

Link to post
Share on other sites

@Wofferl those are the exact same model as three of my disks (but mine aren't "Plus"). I've used these disks in another machine with ZFS and zero issues (ASM1062 SATA controller). So if we assume the problem is between SATA controller and disk, and while I agree with you that it's probably in part a disk issue, I'm convinced it's something that would be fixable on the SATA controller firmware. Perhaps these disks do something funny that the SATA controller doesn't expect? And based on all my testing so far, the SATA cable also plays a role, meaning perhaps there's a noise-factor in play (as well).

 

Side-note; Western Digital really screwed us over with this whole SMR fiasco, didn't they. I'd be pretty much ready to throw these disks in the trash if it wasn't for the fact that they worked perfectly on another SATA controller.

 

@grek glad it helped! By the way, I would still recommend changing the io scheduler to none because bfq is CPU intensive, and ZFS does it's own scheduling. Probably wont fix issues but might reduce some CPU overhead.

Link to post
Share on other sites

We will provide soon instruction on how to update SATA controller (JMB585) firmware with the latest version. Let see if it has positive impact or not.

 

14 hours ago, ShadowDance said:

perhaps there's a noise-factor in play (as well).

For the new revision we will reinforce noise filtering on the SATA controller in case this is part of the root cause in some cases.

Link to post
Share on other sites

4 hours ago, gprovost said:

We will provide soon instruction on how to update SATA controller (JMB585) firmware with the latest version. Let see if it has positive impact or not.

 

That's great news, looking forward to it!

 

4 hours ago, gprovost said:

For the new revision we will reinforce noise filtering on the SATA controller in case this is part of the root cause in some cases.

 

Also nice! If you do figure out a trick we can implement ourselves (i.e. via soldering or, dare I say, foil wrapping) to reduce the noise there, let us know .

 

Speaking of noise, I have two new thoughts:

  • My Helios64 PSU is not connected to a grounded wall-outlet. Unfortunately here in my apartment there are only two such outlets in very inconvenient places, but I'll see if it's feasible to test it out on that one. Perhaps this introduces noise into the system? Perhaps other devices are leaking their noise via grounded extension cord (I recently learned that using a grounded extension cord on a non-grounded outlet is not ideal, hadn't even thought to consider it before..)
  • I read somewhere that the metal frame of the HDD cases should be connected to device ground, but since we're using plastic mount brackets I doubt this is the case? To be honest I don't know if this is a real issue or not, just something I read on a forum.
Link to post
Share on other sites

Guys, those drives are CMR (ones I could pick out from the logs posted), not SMR. EFRX is CMR, EFAX is SMR. 

Also, SMR is the one to avoid at all costs.

 

From https://www.servethehome.com/buyers-guides/top-hardware-components-freenas-nas-servers/top-picks-freenas-hard-drives/, ServeTheHome runs a lot of ZFS pools in their infra.

Quote

As a quick note. We do not recommend the new WD Red SMR drives. See WD Red SMR vs CMR Tested Avoid Red SMR.

 

Please refer to manufacturer spec sheets and tables before buying a HDD. Here are a few lists-

https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd/

https://blog.westerndigital.com/wd-red-nas-drives/

https://www.seagate.com/internal-hard-drives/cmr-smr-list/

 

As far as I know, all Hitachi drives are CMR.

Link to post
Share on other sites

6 minutes ago, clostro said:

Guys, those drives are CMR (ones I could pick out from the logs posted), not SMR. EFRX is CMR, EFAX is SMR. 

Also, SMR is the one to avoid at all costs.

 

From https://www.servethehome.com/buyers-guides/top-hardware-components-freenas-nas-servers/top-picks-freenas-hard-drives/, ServeTheHome runs a lot of ZFS pools in their infra.

 

Please refer to manufacturer spec sheets and tables before buying a HDD. Here are a few lists-

https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd/

https://blog.westerndigital.com/wd-red-nas-drives/

https://www.seagate.com/internal-hard-drives/cmr-smr-list/

 

As far as I know, all Hitachi drives are CMR.

 

Thanks for sharing, and it seems you are right. I've read some confusing information back and forth and to be honest was never really sure whether mine were SMR or CMR. Doesn't help that WD introduced the Plus-series and then claims all plain Red (1-6TB) are SMR. Good to know they're CMR and makes sense since I haven't noticed performance issues with them.

Link to post
Share on other sites

On 3/26/2021 at 5:05 PM, ShadowDance said:

My Helios64 PSU is not connected to a grounded wall-outlet. Unfortunately here in my apartment there are only two such outlets in very inconvenient places, but I'll see if it's feasible to test it out on that one. Perhaps this introduces noise into the system? Perhaps other devices are leaking their noise via grounded extension cord (I recently learned that using a grounded extension cord on a non-grounded outlet is not ideal, hadn't even thought to consider it before...

 

That is a valid point you are bringing and actually we did request to some customers to pay attention to this aspect. FYI we decided to make the DC ground connected to AC earth when ordered the AC/DC adapter, which should be the safe and standard approach, but we know many manufacturer don't bother doing it. However if you plug your device on a power extension along other devices you might get impact by the noise of other not well isolated devices. You could try to isolated Helios64 by using for example a travel adapter that would not used the earth pin.

 

On 3/26/2021 at 5:05 PM, ShadowDance said:

I read somewhere that the metal frame of the HDD cases should be connected to device ground, but since we're using plastic mount brackets I doubt this is the case? To be honest I don't know if this is a real issue or not, just something I read on a forum.

Hmmm my expectation would be that the HDD enclosure is connected to the ground of the HDD controller PCB, which in turn is connected via power cable to the ground of the motherboard, which in turn is connected to the ground / eather of the AC / DC PSU.

 

Link to post
Share on other sites

Finally got around to testing my Helios64 on the grounded wall outlet (alone), unfortunately there was no change. The disks behaved the same as before and I also had those kernel panics during boot that I seem to get ~3 out of 5 bootups.

 

On 3/29/2021 at 5:58 AM, gprovost said:

Hmmm my expectation would be that the HDD enclosure is connected to the ground of the HDD controller PCB, which in turn is connected via power cable to the ground of the motherboard, which in turn is connected to the ground / eather of the AC / DC PSU.

 

Yes sounds very reasonable, and this is my expectation too. I didn't put too much weight behind what I read, but over there one user had issues with his disks until he grounded the cases of the HDDs. Sounded strange but at this point I'm open to pretty much any strange solutions, haha. Could've just been poorly designed/manufactured HDDs too.

Link to post
Share on other sites

To add another data point, I have 5 of these drives in a raid6 configuration with an ext4 filesystem:

Model Family:     Western Digital Caviar Black
Device Model:     WDC WD2002FAEX-007BA0
Serial Number:    WD-WMAY02794530
LU WWN Device Id: 5 0014ee 002d98d8f
Firmware Version: 05.01D05
User Capacity:    2,000,398,934,016 bytes [2.00 TB]

I moved the drives into the helios64 from another server where they were all running without issues.

 

As far as I know, I have not yet experienced a failure that would cause data loss or even temporary loss of access to a disk that could be addressed with a reboot. I noticed errors similar to the below during the initial raid sync. As others have noted on this thread, this indicates data corruption during the transfer, but in my case the retries seem to succeed. However, after enough failures the device is placed in 3.0 Gbps speed, and afterwards I don't see any more errors on that device.

Apr  2 23:00:17 localhost kernel: [ 1070.137017] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x380100 action 0x6
Apr  2 23:00:17 localhost kernel: [ 1070.137032] ata3.00: irq_stat 0x08000000
Apr  2 23:00:17 localhost kernel: [ 1070.137040] ata3: SError: { UnrecovData 10B8B Dispar BadCRC }
Apr  2 23:00:17 localhost kernel: [ 1070.137049] ata3.00: failed command: READ DMA EXT
Apr  2 23:00:17 localhost kernel: [ 1070.137061] ata3.00: cmd 25/00:00:00:80:2f/00:04:00:00:00/e0 tag 20 dma 524288 in
Apr  2 23:00:17 localhost kernel: [ 1070.137061]          res 50/00:00:ff:87:2f/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Apr  2 23:00:17 localhost kernel: [ 1070.137066] ata3.00: status: { DRDY }
Apr  2 23:00:17 localhost kernel: [ 1070.137079] ata3: hard resetting link
Apr  2 23:00:18 localhost kernel: [ 1070.612946] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr  2 23:00:18 localhost kernel: [ 1070.622339] ata3.00: configured for UDMA/133
Apr  2 23:00:18 localhost kernel: [ 1070.622453] ata3: EH complete
Apr  2 23:00:55 localhost kernel: [ 1108.163913] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x380100 action 0x6
Apr  2 23:00:55 localhost kernel: [ 1108.163927] ata3.00: irq_stat 0x08000000
Apr  2 23:00:55 localhost kernel: [ 1108.163935] ata3: SError: { UnrecovData 10B8B Dispar BadCRC }
Apr  2 23:00:55 localhost kernel: [ 1108.163944] ata3.00: failed command: READ DMA EXT
Apr  2 23:00:55 localhost kernel: [ 1108.163957] ata3.00: cmd 25/00:00:00:cc:2e/00:04:00:00:00/e0 tag 31 dma 524288 in
Apr  2 23:00:55 localhost kernel: [ 1108.163957]          res 50/00:00:4f:46:2f/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Apr  2 23:00:55 localhost kernel: [ 1108.163962] ata3.00: status: { DRDY }
Apr  2 23:00:55 localhost kernel: [ 1108.163975] ata3: hard resetting link
Apr  2 23:00:56 localhost kernel: [ 1108.639872] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr  2 23:00:56 localhost kernel: [ 1108.649317] ata3.00: configured for UDMA/133
Apr  2 23:00:56 localhost kernel: [ 1108.649434] ata3: EH complete
Apr  2 23:01:27 localhost kernel: [ 1139.690980] ata3: limiting SATA link speed to 3.0 Gbps
Apr  2 23:01:27 localhost kernel: [ 1139.690997] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x980000 action 0x6 frozen
Apr  2 23:01:27 localhost kernel: [ 1139.691004] ata3: SError: { 10B8B Dispar LinkSeq }
Apr  2 23:01:27 localhost kernel: [ 1139.691014] ata3.00: failed command: WRITE DMA EXT
Apr  2 23:01:27 localhost kernel: [ 1139.691026] ata3.00: cmd 35/00:40:80:b2:2f/00:05:00:00:00/e0 tag 16 dma 688128 out
Apr  2 23:01:27 localhost kernel: [ 1139.691026]          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  2 23:01:27 localhost kernel: [ 1139.691032] ata3.00: status: { DRDY }
Apr  2 23:01:27 localhost kernel: [ 1139.691045] ata3: hard resetting link
Apr  2 23:01:27 localhost kernel: [ 1140.166952] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Apr  2 23:01:27 localhost kernel: [ 1140.177164] ata3.00: configured for UDMA/133
Apr  2 23:01:27 localhost kernel: [ 1140.177269] ata3: EH complete

 

I'm able to reproduce similar errors on one of the drives within a few minutes by running stress (with the current directory on the raid6 filesystem)

stress --hdd 4 &

 

Within a few hours, I will see many similar errors on all 5 drives, and all will be reset to 3.0Gbps.

 

I think it's reasonable to conclude that these errors occur in transit on the cable, because the drives' SMART stats do not seem to show a corresponding increase when the kernel log is full of BadCRC errors. I was checking UDMA_CRC_Error_Count specifically since it seemed most relevant to the errors above, but I'm not very familiar with SMART so there may be something else that I should check.

 

I tried some modifications to /boot/armbianEnv.txt to see how they would affect the problem:

1. extraargs=libata.force=noncq didn't have an impact, I saw similar errors on multiple disks within 30 minutes

2. extraargs=libata.force=noncq,3.0 saw no errors after a 2 hours of testing 

3. extraargs=libata.force=3.0 saw no errors after 2 hours of testing

 

I'd prefer to err on the side of reliability, so I'm going to stick with the extraargs=libata.force=3.0 for now.

 

I wonder how many people are seeing behavior similar to mine that simply haven't noticed these errors in their kernel logs?

 

I had similar errors with a banana pi board a few years ago when I used an internal SATA cable to connect to an external drive enclosure. In that case, switching to an eSATA cable which had the appropriate shielding resolved the issue.

 

I could try replacing the SATA cables to see if it reduces the error rate. Did anyone have recommendations on a specific set of replacement cables that has worked well?

Link to post
Share on other sites

@Fred Fettinger Your errors are most likely due too a not stable HDD harness. You can contact us to support@kobol.io to see if replacement of the HDD wire harness is needed.

 

@ShadowDance Thanks for the feedback and at least we can remove grounding issue from the list of possible root causes.

Looks like that in the use case of very heavy HDD I/O load arising mainly during scrubbing operation too much noise is generated resulting in those HSM violation.

Thanks to all your tests, the issue seem unfortunately to point to noise filtering issue. As previously mentioned, we will fix that in next revision.

We will see if the new SATA controller firmware has any impact but we doubt that. I think the only solution for current revision is to limit ATA speed to 3Gbps when using btrfs or zfs.

Link to post
Share on other sites

Hi all, you could download the SATA firmware update at https://cdn-kobol-io.s3-ap-southeast-1.amazonaws.com/Helios64/SATA_FW/Helios64_SATA_FW_update_00021_200624.img.xz

 

Instruction:

1. Download the sd card image

2. Flash the image into a microSD card

3. Insert microSD card into Helios64 and power on. Helios64 should automatically boot from microSD. If Helios64 still boot from eMMC, disable the eMMC 

4. Wait for a while, the system will do a reboot and then power off if firmware flashing succeed.

   If failed, both System Status and System Fault LEDs will blink

5. Remove the microSD card and boot Helios64 normally. See if there is any improvement.

 

Our officially supported stock firmware can be downloaded from https://cdn-kobol-io.s3-ap-southeast-1.amazonaws.com/Helios64/SATA_FW/Helios64_SATA_FW_factory_00020_190702.img.xz. If there is no improvement on newer firmware, please revert back to this stock firmware.

 

SHA256SUM:

e5dfbe84f4709a3e2138ffb620f0ee62ecbcc79a8f83692c1c1d7a4361f0d30f *Helios64_SATA_FW_factory_00020_190702.img.xz
0d78fec569dd699fd667acf59ba7b07c420a2865e1bcb8b85b26b61d404998c5 *Helios64_SATA_FW_update_00021_200624.img.xz

 

Link to post
Share on other sites

@aprayoga I have updated to the new firmware, but the problem is still there. Shortly after starting a zfs scrub I get the HSM violations.

 

[  544.719820] ata3.00: exception Emask 0x2 SAct 0x30000030 SErr 0x400 action 0x6
[  544.719830] ata3.00: irq_stat 0x08000000
[  544.719836] ata3: SError: { Proto }
[  544.719845] ata3.00: failed command: READ FPDMA QUEUED
[  544.719856] ata3.00: cmd 60/80:20:d0:ec:63/00:00:0d:00:00/40 tag 4 ncq dma 65536 in
                        res 40/00:e0:d0:eb:63/00:00:0d:00:00/40 Emask 0x2 (HSM violation)
[  544.719860] ata3.00: status: { DRDY }

 

Is there a way to check the installed firmware version to verify that the update was really successful?

Link to post
Share on other sites

Thanks for the updated firmware. Unfortunately, I'm seeing the same as Wofferl:

 

Spoiler


[  156.746403] ata4.00: exception Emask 0x10 SAct 0x4 SErr 0x300100 action 0x6
[  156.747037] ata4.00: irq_stat 0x08000000
[  156.747396] ata4: SError: { UnrecovData Dispar BadCRC }
[  156.747869] ata4.00: failed command: READ FPDMA QUEUED
[  156.748346] ata4.00: cmd 60/00:10:e0:d5:54/01:00:00:00:00/40 tag 2 ncq dma 131072 in
[  156.748346]          res 40/00:10:e0:d5:54/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  156.749723] ata4.00: status: { DRDY }
[  156.750075] ata4: hard resetting link
[  157.226341] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  157.228668] ata4.00: configured for UDMA/133
[  157.229175] sd 3:0:0:0: [sdd] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[  157.229993] sd 3:0:0:0: [sdd] tag#2 Sense Key : 0x5 [current]
[  157.230552] sd 3:0:0:0: [sdd] tag#2 ASC=0x21 ASCQ=0x4
[  157.231026] sd 3:0:0:0: [sdd] tag#2 CDB: opcode=0x88 88 00 00 00 00 00 00 54 d5 e0 00 00 01 00 00 00
[  157.231843] blk_update_request: I/O error, dev sdd, sector 5559776 op 0x0:(READ) flags 0x80700 phys_seg 12 prio class 0
[  157.232994] ata4: EH complete
[  193.546385] ata4.00: exception Emask 0x10 SAct 0xc00 SErr 0x300100 action 0x6 frozen
[  193.547085] ata4.00: irq_stat 0x08000000
[  193.547446] ata4: SError: { UnrecovData Dispar BadCRC }
[  193.547919] ata4.00: failed command: READ FPDMA QUEUED
[  193.548398] ata4.00: cmd 60/00:50:e0:c5:6d/01:00:00:00:00/40 tag 10 ncq dma 131072 in
[  193.548398]          res 40/00:50:e0:c5:6d/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  193.549784] ata4.00: status: { DRDY }
[  193.550121] ata4.00: failed command: READ FPDMA QUEUED
[  193.550703] ata4.00: cmd 60/00:58:e0:c6:6d/01:00:00:00:00/40 tag 11 ncq dma 131072 in
[  193.550703]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
[  193.552096] ata4.00: status: { DRDY }
[  193.552446] ata4: hard resetting link
[  194.030341] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  194.032750] ata4.00: configured for UDMA/133
[  194.033243] ata4.00: device reported invalid CHS sector 0
[  194.033787] sd 3:0:0:0: [sdd] tag#10 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=31s
[  194.034671] sd 3:0:0:0: [sdd] tag#10 Sense Key : 0x5 [current]
[  194.035209] sd 3:0:0:0: [sdd] tag#10 ASC=0x21 ASCQ=0x4
[  194.035688] sd 3:0:0:0: [sdd] tag#10 CDB: opcode=0x88 88 00 00 00 00 00 00 6d c5 e0 00 00 01 00 00 00
[  194.036516] blk_update_request: I/O error, dev sdd, sector 7194080 op 0x0:(READ) flags 0x80700 phys_seg 8 prio class 0
[  194.037704] sd 3:0:0:0: [sdd] tag#11 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=31s
[  194.038573] sd 3:0:0:0: [sdd] tag#11 Sense Key : 0x5 [current]
[  194.039111] sd 3:0:0:0: [sdd] tag#11 ASC=0x21 ASCQ=0x4
[  194.039590] sd 3:0:0:0: [sdd] tag#11 CDB: opcode=0x88 88 00 00 00 00 00 00 6d c6 e0 00 00 01 00 00 00
[  194.040414] blk_update_request: I/O error, dev sdd, sector 7194336 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
[  194.041545] ata4: EH complete
[  194.286419] ata4.00: exception Emask 0x2 SAct 0x7c000000 SErr 0x400 action 0x6
[  194.287073] ata4.00: irq_stat 0x08000000
[  194.287432] ata4: SError: { Proto }
[  194.287754] ata4.00: failed command: READ FPDMA QUEUED
[  194.288230] ata4.00: cmd 60/01:d0:3b:c6:6d/00:00:00:00:00/40 tag 26 ncq dma 512 in
[  194.288230]          res 40/00:e8:3e:c6:6d/00:00:00:00:00/40 Emask 0x2 (HSM violation)
[  194.289585] ata4.00: status: { DRDY }
[  194.289920] ata4.00: failed command: READ FPDMA QUEUED
[  194.290429] ata4.00: cmd 60/01:d8:3c:c6:6d/00:00:00:00:00/40 tag 27 ncq dma 512 in
[  194.290429]          res 40/00:e8:3e:c6:6d/00:00:00:00:00/40 Emask 0x2 (HSM violation)
[  194.291787] ata4.00: status: { DRDY }
[  194.292125] ata4.00: failed command: READ FPDMA QUEUED
[  194.292600] ata4.00: cmd 60/01:e0:3d:c6:6d/00:00:00:00:00/40 tag 28 ncq dma 512 in
[  194.292600]          res 40/00:e8:3e:c6:6d/00:00:00:00:00/40 Emask 0x2 (HSM violation)
[  194.293954] ata4.00: status: { DRDY }
[  194.294397] ata4.00: failed command: READ FPDMA QUEUED
[  194.294881] ata4.00: cmd 60/01:e8:3e:c6:6d/00:00:00:00:00/40 tag 29 ncq dma 512 in
[  194.294881]          res 40/00:e8:3e:c6:6d/00:00:00:00:00/40 Emask 0x2 (HSM violation)
[  194.296233] ata4.00: status: { DRDY }
[  194.296568] ata4.00: failed command: READ FPDMA QUEUED
[  194.297043] ata4.00: cmd 60/01:f0:3f:c6:6d/00:00:00:00:00/40 tag 30 ncq dma 512 in
[  194.297043]          res 40/00:e8:3e:c6:6d/00:00:00:00:00/40 Emask 0x2 (HSM violation)
[  194.298468] ata4.00: status: { DRDY }
[  194.298823] ata4: hard resetting link
[  194.774377] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  194.776810] ata4.00: configured for UDMA/133
[  194.777458] ata4: EH complete
[  195.010433] ata4: limiting SATA link speed to 3.0 Gbps
[  195.010919] ata4.00: exception Emask 0x2 SAct 0x3f000 SErr 0x400 action 0x6
[  195.011536] ata4.00: irq_stat 0x08000000
[  195.011894] ata4: SError: { Proto }
[  195.012216] ata4.00: failed command: READ FPDMA QUEUED
[  195.012693] ata4.00: cmd 60/01:60:aa:c6:6d/00:00:00:00:00/40 tag 12 ncq dma 512 in
[  195.012693]          res 40/00:78:ad:c6:6d/00:00:00:00:00/40 Emask 0x2 (HSM violation)
[  195.014048] ata4.00: status: { DRDY }
[  195.014422] ata4.00: failed command: READ FPDMA QUEUED
[  195.014900] ata4.00: cmd 60/01:68:ab:c6:6d/00:00:00:00:00/40 tag 13 ncq dma 512 in
[  195.014900]          res 40/00:78:ad:c6:6d/00:00:00:00:00/40 Emask 0x2 (HSM violation)
[  195.016253] ata4.00: status: { DRDY }
[  195.016587] ata4.00: failed command: READ FPDMA QUEUED
[  195.017063] ata4.00: cmd 60/01:70:ac:c6:6d/00:00:00:00:00/40 tag 14 ncq dma 512 in
[  195.017063]          res 40/00:78:ad:c6:6d/00:00:00:00:00/40 Emask 0x2 (HSM violation)
[  195.018440] ata4.00: status: { DRDY }
[  195.018778] ata4.00: failed command: READ FPDMA QUEUED
[  195.019254] ata4.00: cmd 60/01:78:ad:c6:6d/00:00:00:00:00/40 tag 15 ncq dma 512 in
[  195.019254]          res 40/00:78:ad:c6:6d/00:00:00:00:00/40 Emask 0x2 (HSM violation)
[  195.020607] ata4.00: status: { DRDY }
[  195.020942] ata4.00: failed command: READ FPDMA QUEUED
[  195.021418] ata4.00: cmd 60/01:80:ae:c6:6d/00:00:00:00:00/40 tag 16 ncq dma 512 in
[  195.021418]          res 40/00:78:ad:c6:6d/00:00:00:00:00/40 Emask 0x2 (HSM violation)
[  195.022900] ata4.00: status: { DRDY }
[  195.023240] ata4.00: failed command: READ FPDMA QUEUED
[  195.023715] ata4.00: cmd 60/01:88:af:c6:6d/00:00:00:00:00/40 tag 17 ncq dma 512 in
[  195.023715]          res 40/00:78:ad:c6:6d/00:00:00:00:00/40 Emask 0x2 (HSM violation)
[  195.025069] ata4.00: status: { DRDY }
[  195.025420] ata4: hard resetting link
[  195.506345] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  195.508591] ata4.00: configured for UDMA/133
[  195.509252] ata4: EH complete

 

 

At first the programming didn't seem to work (I used balenaEtcher without unpacking the file, it seemed to understand that it was compressed though so went ahead with it), there was no reboot either. Here's the output:

 

Spoiler


U-Boot 2020.07-armbian (Aug 19 2020 - 11:56:02 +0700)

SoC: Rockchip rk3399
Reset cause: POR
DRAM:  3.9 GiB
PMIC:  RK808 
SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB
MMC:   mmc@fe320000: 1, sdhci@fe330000: 0
Loading Environment from MMC... unable to select a mode
*** Warning - No block device, using default environment

In:    serial
Out:   serial
Err:   serial
Model: Helios64
Revision: 1.2 - 4GB non ECC
Net:   eth0: ethernet@fe300000
Hit any key to stop autoboot:  0 
unable to select a mode
unable to select a mode
starting USB...
Bus usb@fe380000: USB EHCI 1.00
Bus dwc3: usb maximum-speed not found
Register 2000140 NbrPorts 2
Starting the controller
USB XHCI 1.10
scanning bus usb@fe380000 for devices... 1 USB Device(s) found
scanning bus dwc3 for devices... 4 USB Device(s) found
       scanning usb for storage devices... 0 Storage Device(s) found

Device 0: unknown device
scanning bus for devices...

Device 0: unknown device
Speed: 1000, full duplex
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
BOOTP broadcast 4
DHCP client bound to address 192.168.1.106 (1932 ms)
*** ERROR: `serverip' not set
Cannot autoload with TFTPGET
missing environment variable: pxeuuid
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/01-64-62-66-d0-0b-12
Speed: 1000, full duplex
*** ERROR: `serverip' not set
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/C0A8016A
Speed: 1000, full duplex
*** ERROR: `serverip' not set
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/C0A8016
Speed: 1000, full duplex
*** ERROR: `serverip' not set
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/C0A801
Speed: 1000, full duplex
*** ERROR: `serverip' not set
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/C0A80
Speed: 1000, full duplex
*** ERROR: `serverip' not set
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/C0A8
Speed: 1000, full duplex
*** ERROR: `serverip' not set
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/C0A
Speed: 1000, full duplex
*** ERROR: `serverip' not set
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/C0
Speed: 1000, full duplex
*** ERROR: `serverip' not set
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/C
Speed: 1000, full duplex
*** ERROR: `serverip' not set
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/default-arm-rk3399-helios64
Speed: 1000, full duplex
*** ERROR: `serverip' not set
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/default-arm-rk3399
Speed: 1000, full duplex
*** ERROR: `serverip' not set
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/default-arm
Speed: 1000, full duplex
*** ERROR: `serverip' not set
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/default
Speed: 1000, full duplex
*** ERROR: `serverip' not set
Config file not found
Speed: 1000, full duplex
BOOTP broadcast 1
DHCP client bound to address 192.168.1.106 (53 ms)
*** ERROR: `serverip' not set
Cannot autoload with TFTPGET
Speed: 1000, full duplex
BOOTP broadcast 1
DHCP client bound to address 192.168.1.106 (23 ms)
*** ERROR: `serverip' not set
Cannot autoload with TFTPGET
=> 

 

 

Then I tried flashing it again, unpacked it first this time around and the flashing worked as described:

 

Spoiler


DDR Version 1.24 20191016
In
soft reset
SRX
channel 0
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
change freq to 416MHz 0,1
Channel 0: LPDDR4,416MHz
Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB
Channel 1: LPDDR4,416MHz
Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB
256B stride
channel 0
CS = 0
MR0=0x18
MR4=0x82
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
channel 0, cs 0, advanced training done
channel 1, cs 0, advanced training done
change freq to 856MHz 1,0
ch 0 ddrconfig = 0x101, ddrsize = 0x40
ch 1 ddrconfig = 0x101, ddrsize = 0x40
pmugrf_os_reg[2] = 0x32C1F2C1, stride = 0xD
ddr_set_rate to 328MHZ
ddr_set_rate to 666MHZ
ddr_set_rate to 928MHZ
channel 0, cs 0, advanced training done
channel 1, cs 0, advanced training done
ddr_set_rate to 416MHZ, ctl_index 0
ddr_set_rate to 856MHZ, ctl_index 1
support 416 856 328 666 928 MHz, current 856MHz
OUT
Boot1: 2019-03-14, version: 1.19
CPUId = 0x0
ChipType = 0x10, 322
SdmmcInit=2 0
BootCapSize=100000
UserCapSize=14910MB
FwPartOffset=2000 , 100000
mmc0:cmd5,20
SdmmcInit=0 0
BootCapSize=0
UserCapSize=15247MB
FwPartOffset=2000 , 0
StorageInit ok = 196695
SecureMode = 0
SecureInit read PBA: 0x4
SecureInit read PBA: 0x404
SecureInit read PBA: 0x804
SecureInit read PBA: 0xc04
SecureInit read PBA: 0x1004
SecureInit read PBA: 0x1404
SecureInit read PBA: 0x1804
SecureInit read PBA: 0x1c04
SecureInit ret = 0, SecureMode = 0
atags_set_bootdev: ret:(0)
GPT 0x3380ec0 signature is wrong
recovery gpt...
GPT 0x3380ec0 signature is wrong
recovery gpt fail!
LoadTrust Addr:0x4000
No find bl30.bin
No find bl32.bin
Load uboot, ReadLba = 2000
Load OK, addr=0x200000, size=0xdedec
RunBL31 0x40000
NOTICE:  BL31: v1.3(debug):42583b6
NOTICE:  BL31: Built : 07:55:13, Oct 15 2019
NOTICE:  BL31: Rockchip release version: v1.1
INFO:    GICv3 with legacy support detected. ARM GICV3 driver initialized in EL3
INFO:    Using opteed sec cpu_context!
INFO:    boot cpu mask: 0
INFO:    plat_rockchip_pmu_init(1190): pd status 3e
INFO:    BL31: Initializing runtime services
WARNING: No OPTEE provided by BL2 boot loader, Booting device without OPTEE initialization. SMC`s destined for OPTEE will return SMC_UNK
ERROR:   Error initializing runtime service opteed_fast
INFO:    BL31: Preparing for EL3 exit to normal world
INFO:    Entry point address = 0x200000
INFO:    SPSR = 0x3c9


U-Boot 2020.07-armbian (Aug 19 2020 - 11:56:02 +0700)

SoC: Rockchip rk3399
Reset cause: RST
DRAM:  3.9 GiB
PMIC:  RK808 
SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB
MMC:   mmc@fe320000: 1, sdhci@fe330000: 0
Loading Environment from MMC... *** Warning - !read failed, using default environment

In:    serial
Out:   serial
Err:   serial
Model: Helios64
Revision: 1.2 - 4GB non ECC
Net:   eth0: ethernet@fe300000
Hit any key to stop autoboot:  0 
switch to partitions #0, OK
mmc1 is current device
Scanning mmc 1:1...
Found U-Boot script /boot/boot.scr
876 bytes read in 6 ms (142.6 KiB/s)
## Executing script at 00500000
gpio: pin E24 (gpio 168) value is 1
SF: Detected w25x20cl with page size 256 Bytes, erase size 4 KiB, total 256 KiB
65568 bytes read in 10 ms (6.3 MiB/s)
device 0 offset 0x0, size 0x10020
SF: 65568 bytes @ 0x0 Read: OK
byte at 0x0000000002080060 (0x78) != byte at 0x0000000006000060 (0x24)
Total of 96 byte(s) were the same
Writing SATA Flash
SF: 262144 bytes @ 0x0 Erased: OK
device 0 offset 0x0, size 0x10020
   
Updating, 7% 38836 B/s   
Updating, 13% 39016 B/s   
Updating, 19% 39321 B/s   
Updating, 25% 39199 B/s   
Updating, 32% 39199 B/s   
Updating, 38% 39321 B/s   
Updating, 44% 39409 B/s   
Updating, 50% 39522 B/s   
Updating, 57% 39527 B/s   
Updating, 63% 39531 B/s   
Updating, 69% 39534 B/s   
Updating, 75% 39568 B/s   
Updating, 82% 39511 B/s   
Updating, 88% 39542 B/s   
Updating, 94% 39568 B/s   
Updating, 100% 39522 B/s
65568 bytes written, 0 bytes skipped in 1.802s, speed 37259 B/s
gpio: pin E24 (gpio 168) value is 0
gpio: pin C28 (gpio 108) value is 0
resetting ...
DDR Version 1.24 20191016
In
soft reset
SRX
channel 0
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
change freq to 416MHz 0,1
Channel 0: LPDDR4,416MHz
Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB
Channel 1: LPDDR4,416MHz
Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB
256B stride
channel 0
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
channel 0, cs 0, advanced training done
channel 1, cs 0, advanced training done
change freq to 856MHz 1,0
ch 0 ddrconfig = 0x101, ddrsize = 0x40
ch 1 ddrconfig = 0x101, ddrsize = 0x40
pmugrf_os_reg[2] = 0x32C1F2C1, stride = 0xD
ddr_set_rate to 328MHZ
ddr_set_rate to 666MHZ
ddr_set_rate to 928MHZ
channel 0, cs 0, advanced training done
channel 1, cs 0, advanced training done
ddr_set_rate to 416MHZ, ctl_index 0
ddr_set_rate to 856MHZ, ctl_index 1
support 416 856 328 666 928 MHz, current 856MHz
OUT
Boot1: 2019-03-14, version: 1.19
CPUId = 0x0
ChipType = 0x10, 323
SdmmcInit=2 0
BootCapSize=100000
UserCapSize=14910MB
FwPartOffset=2000 , 100000
mmc0:cmd5,20
SdmmcInit=0 0
BootCapSize=0
UserCapSize=15247MB
FwPartOffset=2000 , 0
StorageInit ok = 66832
SecureMode = 0
SecureInit read PBA: 0x4
SecureInit read PBA: 0x404
SecureInit read PBA: 0x804
SecureInit read PBA: 0xc04
SecureInit read PBA: 0x1004
SecureInit read PBA: 0x1404
SecureInit read PBA: 0x1804
SecureInit read PBA: 0x1c04
SecureInit ret = 0, SecureMode = 0
atags_set_bootdev: ret:(0)
GPT 0x3380ec0 signature is wrong
recovery gpt...
GPT 0x3380ec0 signature is wrong
recovery gpt fail!
LoadTrust Addr:0x4000
No find bl30.bin
No find bl32.bin
Load uboot, ReadLba = 2000
Load OK, addr=0x200000, size=0xdedec
RunBL31 0x40000
NOTICE:  BL31: v1.3(debug):42583b6
NOTICE:  BL31: Built : 07:55:13, Oct 15 2019
NOTICE:  BL31: Rockchip release version: v1.1
INFO:    GICv3 with legacy support detected. ARM GICV3 driver initialized in EL3
INFO:    Using opteed sec cpu_context!
INFO:    boot cpu mask: 0
INFO:    plat_rockchip_pmu_init(1190): pd status 3e
INFO:    BL31: Initializing runtime services
WARNING: No OPTEE provided by BL2 boot loader, Booting device without OPTEE initialization. SMC`s destined for OPTEE will return SMC_UNK
ERROR:   Error initializing runtime service opteed_fast
INFO:    BL31: Preparing for EL3 exit to normal world
INFO:    Entry point address = 0x200000
INFO:    SPSR = 0x3c9


U-Boot 2020.07-armbian (Aug 19 2020 - 11:56:02 +0700)

SoC: Rockchip rk3399
Reset cause: RST
DRAM:  3.9 GiB
PMIC:  RK808 
SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB
MMC:   mmc@fe320000: 1, sdhci@fe330000: 0
Loading Environment from MMC... unable to select a mode
*** Warning - No block device, using default environment

In:    serial
Out:   serial
Err:   serial
Model: Helios64
Revision: 1.2 - 4GB non ECC
Net:   eth0: ethernet@fe300000
Hit any key to stop autoboot:  0 
switch to partitions #0, OK
mmc1 is current device
Scanning mmc 1:1...
Found U-Boot script /boot/boot.scr
876 bytes read in 6 ms (142.6 KiB/s)
## Executing script at 00500000
gpio: pin E24 (gpio 168) value is 1
SF: Detected w25x20cl with page size 256 Bytes, erase size 4 KiB, total 256 KiB
65568 bytes read in 10 ms (6.3 MiB/s)
device 0 offset 0x0, size 0x10020
SF: 65568 bytes @ 0x0 Read: OK
Total of 65568 byte(s) were the same
Continue boot
gpio: pin E24 (gpio 168) value is 0
Power off system
Setting bus to 0

 

 

Link to post
Share on other sites

@ShadowDance

 

Someone from our helpdesk mentioned today, they encounter a warning message from Synology DSM after a drive failure remplacement. The NAS was loaded with EFRX series. Here is what you can find in Synology compatibility list now. The 4TB has even a lower number recommended.

Quote

It is highly recommended to not configure RAID 5 or RAID 6 with 7 or more drives due to the possibility of a performance drop.

 

I wish I had a better

 

Oli

 

Link to post
Share on other sites

I am having the same issue: though I wanna mention that I use Seagate drives. So the issue is not restricted to WD. I can reliably get it using a zfs scrub. Currently 4 disks are in use and it always happens on /dev/sdd, so disk 4. Disk 5 is the empty slot.
I have read most of the comments but did anyone really succeed with a new power harness or the new firmware?

[16218.668509] ata4.00: exception Emask 0x10 SAct 0x1000084 SErr 0xb80100 action 0x6 frozen
[16218.668523] ata4.00: irq_stat 0x08000000
[16218.668530] ata4: SError: { UnrecovData 10B8B Dispar BadCRC LinkSeq }
[16218.668539] ata4.00: failed command: READ FPDMA QUEUED
[16218.668551] ata4.00: cmd 60/b8:10:e0:43:37/07:00:11:03:00/40 tag 2 ncq dma 1011712 in
                        res 40/00:38:98:4b:37/00:00:11:03:00/40 Emask 0x10 (ATA bus error)
[16218.668556] ata4.00: status: { DRDY }
[16218.668562] ata4.00: failed command: READ FPDMA QUEUED
[16218.668574] ata4.00: cmd 60/b8:38:98:4b:37/07:00:11:03:00/40 tag 7 ncq dma 1011712 in
                        res 40/00:38:98:4b:37/00:00:11:03:00/40 Emask 0x10 (ATA bus error)
[16218.668578] ata4.00: status: { DRDY }
[16218.668584] ata4.00: failed command: READ FPDMA QUEUED
[16218.668595] ata4.00: cmd 60/b8:c0:58:53:37/07:00:11:03:00/40 tag 24 ncq dma 1011712 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
[16218.668600] ata4.00: status: { DRDY }
[16218.668612] ata4: hard resetting link
[16219.144463] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[16219.163619] ata4.00: configured for UDMA/133
[16219.164082] ata4.00: device reported invalid CHS sector 0
[16219.164110] sd 3:0:0:0: [sdd] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=31s
[16219.164116] sd 3:0:0:0: [sdd] tag#2 Sense Key : 0x5 [current] 
[16219.164121] sd 3:0:0:0: [sdd] tag#2 ASC=0x21 ASCQ=0x4 
[16219.164127] sd 3:0:0:0: [sdd] tag#2 CDB: opcode=0x88 88 00 00 00 00 03 11 37 43 e0 00 00 07 b8 00 00
[16219.164133] blk_update_request: I/O error, dev sdd, sector 13173736416 op 0x0:(READ) flags 0x700 phys_seg 125 prio class 0
[16219.164148] zio pool=r2pool vdev=/dev/disk/by-id/ata-ST10000VN0004-1ZD101_$serial-part1 error=5 type=1 offset=6744951996416 size=1011712 flags=40080cb0
[16219.164209] sd 3:0:0:0: [sdd] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=31s
[16219.164213] sd 3:0:0:0: [sdd] tag#7 Sense Key : 0x5 [current] 
[16219.164218] sd 3:0:0:0: [sdd] tag#7 ASC=0x21 ASCQ=0x4 
[16219.164222] sd 3:0:0:0: [sdd] tag#7 CDB: opcode=0x88 88 00 00 00 00 03 11 37 4b 98 00 00 07 b8 00 00
[16219.164227] blk_update_request: I/O error, dev sdd, sector 13173738392 op 0x0:(READ) flags 0x700 phys_seg 142 prio class 0
[16219.164235] zio pool=r2pool vdev=/dev/disk/by-id/ata-ST10000VN0004-1ZD101_$serial-part1 error=5 type=1 offset=6744953008128 size=1011712 flags=40080cb0
[16219.164286] sd 3:0:0:0: [sdd] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=31s
[16219.164291] sd 3:0:0:0: [sdd] tag#24 Sense Key : 0x5 [current] 
[16219.164295] sd 3:0:0:0: [sdd] tag#24 ASC=0x21 ASCQ=0x4 
[16219.164299] sd 3:0:0:0: [sdd] tag#24 CDB: opcode=0x88 88 00 00 00 00 03 11 37 53 58 00 00 07 b8 00 00
[16219.164304] blk_update_request: I/O error, dev sdd, sector 13173740376 op 0x0:(READ) flags 0x700 phys_seg 127 prio class 0
[16219.164311] zio pool=r2pool vdev=/dev/disk/by-id/ata-ST10000VN0004-1ZD101_$serial-part1 error=5 type=1 offset=6744954023936 size=1011712 flags=40080cb0
[16219.164339] ata4: EH complete

 

Link to post
Share on other sites

...now i got the same ....

5 SDD SanDisk SDSSDH3 2T00

but yet only with one disk. 

 

no zfs but snapraid with daily check and scrub 

 

[Sat May  8 20:16:25 2021] ata4.00: exception Emask 0x10 SAct 0x80000 SErr 0xb80100 action 0x6
[Sat May  8 20:16:25 2021] ata4.00: irq_stat 0x08000000
[Sat May  8 20:16:25 2021] ata4: SError: { UnrecovData 10B8B Dispar BadCRC LinkSeq }
[Sat May  8 20:16:25 2021] ata4.00: failed command: READ FPDMA QUEUED
[Sat May  8 20:16:25 2021] ata4.00: cmd 60/00:98:40:0d:fa/04:00:56:00:00/40 tag 19 ncq dma 524288 in
                                    res 40/00:9c:40:0d:fa/00:00:56:00:00/40 Emask 0x10 (ATA bus error)
[Sat May  8 20:16:25 2021] ata4.00: status: { DRDY }
[Sat May  8 20:16:25 2021] ata4: hard resetting link
[Sat May  8 20:16:25 2021] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Sat May  8 20:16:25 2021] ata4.00: configured for UDMA/133
[Sat May  8 20:16:25 2021] sd 3:0:0:0: [sdd] tag#19 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[Sat May  8 20:16:25 2021] sd 3:0:0:0: [sdd] tag#19 Sense Key : 0x5 [current]
[Sat May  8 20:16:25 2021] sd 3:0:0:0: [sdd] tag#19 ASC=0x21 ASCQ=0x4
[Sat May  8 20:16:25 2021] sd 3:0:0:0: [sdd] tag#19 CDB: opcode=0x28 28 00 56 fa 0d 40 00 04 00 00
[Sat May  8 20:16:25 2021] blk_update_request: I/O error, dev sdd, sector 1459227968 op 0x0:(READ) flags 0x80700 phys_seg 127 prio class 0
[Sat May  8 20:16:25 2021] ata4: EH complete
 

Link to post
Share on other sites

Sadly, I seem to be having the same problem. I noticed this recently.
I was running a RAID5 with 4 WD HDDs in ext4.
I haven't looked at the logs for a while, but I'm surprised to see that this error seems to occur frequently in my situation.

It seems to occur when frequent reading and writing occurs, perhaps due to virus checking.
Has anyone had any success with this?

 

May 22 00:10:44 helios64 kernel: [84914.565614] ata4: limiting SATA link speed to 3.0 Gbps
May 22 00:10:44 helios64 kernel: [84914.565655] ata4.00: irq_stat 0x08000000
May 22 00:10:44 helios64 kernel: [84914.565672] ata4: SError: { UnrecovData 10B8B Dispar BadCRC LinkSeq }
May 22 00:10:44 helios64 kernel: [84914.565692] ata4.00: failed command: READ FPDMA QUEUED
May 22 00:10:44 helios64 kernel: [84914.565722] ata4.00: cmd 60/00:00:00:94:51/01:00:67:01:00/40 tag 0 ncq dma 131072 in
May 22 00:10:44 helios64 kernel: [84914.565722]          res 40/00:70:e0:09:c4/00:00:f9:01:00/40 Emask 0x14 (ATA bus error)
May 22 00:10:44 helios64 kernel: [84914.565735] ata4.00: status: { DRDY }
May 22 00:10:44 helios64 kernel: [84914.565750] ata4.00: failed command: READ FPDMA QUEUED
May 22 00:10:44 helios64 kernel: [84914.565778] ata4.00: cmd 60/00:08:00:95:51/03:00:67:01:00/40 tag 1 ncq dma 393216 in
May 22 00:10:44 helios64 kernel: [84914.565778]          res 40/00:70:e0:09:c4/00:00:f9:01:00/40 Emask 0x14 (ATA bus error)
May 22 00:10:44 helios64 kernel: [84914.565791] ata4.00: status: { DRDY }
May 22 00:10:44 helios64 kernel: [84914.565806] ata4.00: failed command: READ FPDMA QUEUED
May 22 00:10:44 helios64 kernel: [84914.565834] ata4.00: cmd 60/00:f8:00:90:51/04:00:67:01:00/40 tag 31 ncq dma 524288 in
May 22 00:10:44 helios64 kernel: [84914.565834]          res 40/00:f8:00:90:51/00:00:67:01:00/40 Emask 0x10 (ATA bus error)
May 22 00:10:44 helios64 kernel: [84914.565846] ata4.00: status: { DRDY }
May 22 00:10:44 helios64 kernel: [84914.565866] ata4: hard resetting link
May 22 00:10:45 helios64 kernel: [84915.817574] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
May 22 00:10:45 helios64 kernel: [84915.818847] ata4.00: configured for UDMA/133
May 22 00:10:45 helios64 kernel: [84915.819248] sd 3:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=31s
May 22 00:10:45 helios64 kernel: [84915.819270] sd 3:0:0:0: [sdc] tag#0 Sense Key : 0x5 [current] 
May 22 00:10:45 helios64 kernel: [84915.819288] sd 3:0:0:0: [sdc] tag#0 ASC=0x21 ASCQ=0x4 
May 22 00:10:45 helios64 kernel: [84915.819308] sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x88 88 00 00 00 00 01 67 51 94 00 00 00 01 00 00 00
May 22 00:10:45 helios64 kernel: [84915.819329] blk_update_request: I/O error, dev sdc, sector 6028366848 op 0x0:(READ) flags 0x80700 phys_seg 16 prio class 0
May 22 00:10:45 helios64 kernel: [84915.819458] sd 3:0:0:0: [sdc] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=31s
May 22 00:10:45 helios64 kernel: [84915.819477] sd 3:0:0:0: [sdc] tag#1 Sense Key : 0x5 [current] 
May 22 00:10:45 helios64 kernel: [84915.819494] sd 3:0:0:0: [sdc] tag#1 ASC=0x21 ASCQ=0x4 
May 22 00:10:45 helios64 kernel: [84915.819512] sd 3:0:0:0: [sdc] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 67 51 95 00 00 00 03 00 00 00
May 22 00:10:45 helios64 kernel: [84915.819531] blk_update_request: I/O error, dev sdc, sector 6028367104 op 0x0:(READ) flags 0x80700 phys_seg 28 prio class 0
May 22 00:10:45 helios64 kernel: [84915.819626] sd 3:0:0:0: [sdc] tag#31 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=31s
May 22 00:10:45 helios64 kernel: [84915.819643] sd 3:0:0:0: [sdc] tag#31 Sense Key : 0x5 [current] 
May 22 00:10:45 helios64 kernel: [84915.819660] sd 3:0:0:0: [sdc] tag#31 ASC=0x21 ASCQ=0x4 
May 22 00:10:45 helios64 kernel: [84915.819678] sd 3:0:0:0: [sdc] tag#31 CDB: opcode=0x88 88 00 00 00 00 01 67 51 90 00 00 00 04 00 00 00
May 22 00:10:45 helios64 kernel: [84915.819695] blk_update_request: I/O error, dev sdc, sector 6028365824 op 0x0:(READ) flags 0x80700 phys_seg 30 prio class 0
May 22 00:10:45 helios64 kernel: [84915.819767] ata4: EH complete
May 22 00:14:06 helios64 kernel: [85116.967097] ata5.00: irq_stat 0x08000000
May 22 00:14:06 helios64 kernel: [85116.967101] ata5: SError: { Proto }
May 22 00:14:06 helios64 kernel: [85116.967108] ata5.00: failed command: READ FPDMA QUEUED
May 22 00:14:06 helios64 kernel: [85116.967116] ata5.00: cmd 60/00:c8:00:76:bb/02:00:02:01:00/40 tag 25 ncq dma 262144 in
May 22 00:14:06 helios64 kernel: [85116.967116]          res 40/00:c8:00:76:bb/00:00:02:01:00/40 Emask 0x2 (HSM violation)
May 22 00:14:06 helios64 kernel: [85116.967119] ata5.00: status: { DRDY }
May 22 00:14:06 helios64 kernel: [85116.967123] ata5.00: failed command: READ FPDMA QUEUED
May 22 00:14:06 helios64 kernel: [85116.967129] ata5.00: cmd 60/00:d0:00:7c:bb/02:00:02:01:00/40 tag 26 ncq dma 262144 in
May 22 00:14:06 helios64 kernel: [85116.967129]          res 40/00:c8:00:76:bb/00:00:02:01:00/40 Emask 0x2 (HSM violation)
May 22 00:14:06 helios64 kernel: [85116.967132] ata5.00: status: { DRDY }
May 22 00:14:06 helios64 kernel: [85116.967136] ata5.00: failed command: READ FPDMA QUEUED
May 22 00:14:06 helios64 kernel: [85116.967142] ata5.00: cmd 60/00:d8:00:7e:bb/02:00:02:01:00/40 tag 27 ncq dma 262144 in
May 22 00:14:06 helios64 kernel: [85116.967142]          res 40/00:c8:00:76:bb/00:00:02:01:00/40 Emask 0x2 (HSM violation)
May 22 00:14:06 helios64 kernel: [85116.967145] ata5.00: status: { DRDY }
May 22 00:14:06 helios64 kernel: [85116.967152] ata5: hard resetting link
May 22 00:14:06 helios64 kernel: [85117.443071] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 22 00:14:06 helios64 kernel: [85117.444095] ata5.00: configured for UDMA/133
May 22 00:14:06 helios64 kernel: [85117.444254] sd 4:0:0:0: [sdd] tag#25 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=1s
May 22 00:14:06 helios64 kernel: [85117.444259] sd 4:0:0:0: [sdd] tag#25 Sense Key : 0x5 [current] 
May 22 00:14:06 helios64 kernel: [85117.444264] sd 4:0:0:0: [sdd] tag#25 ASC=0x21 ASCQ=0x4 
May 22 00:14:06 helios64 kernel: [85117.444270] sd 4:0:0:0: [sdd] tag#25 CDB: opcode=0x88 88 00 00 00 00 01 02 bb 76 00 00 00 02 00 00 00
May 22 00:14:06 helios64 kernel: [85117.444275] blk_update_request: I/O error, dev sdd, sector 4340807168 op 0x0:(READ) flags 0x80700 phys_seg 43 prio class 0
May 22 00:14:06 helios64 kernel: [85117.444320] sd 4:0:0:0: [sdd] tag#26 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=1s
May 22 00:14:06 helios64 kernel: [85117.444325] sd 4:0:0:0: [sdd] tag#26 Sense Key : 0x5 [current] 
May 22 00:14:06 helios64 kernel: [85117.444329] sd 4:0:0:0: [sdd] tag#26 ASC=0x21 ASCQ=0x4 
May 22 00:14:06 helios64 kernel: [85117.444334] sd 4:0:0:0: [sdd] tag#26 CDB: opcode=0x88 88 00 00 00 00 01 02 bb 7c 00 00 00 02 00 00 00
May 22 00:14:06 helios64 kernel: [85117.444338] blk_update_request: I/O error, dev sdd, sector 4340808704 op 0x0:(READ) flags 0x80700 phys_seg 64 prio class 0
May 22 00:14:06 helios64 kernel: [85117.444362] sd 4:0:0:0: [sdd] tag#27 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=1s
May 22 00:14:06 helios64 kernel: [85117.444366] sd 4:0:0:0: [sdd] tag#27 Sense Key : 0x5 [current] 
May 22 00:14:06 helios64 kernel: [85117.444370] sd 4:0:0:0: [sdd] tag#27 ASC=0x21 ASCQ=0x4 
May 22 00:14:06 helios64 kernel: [85117.444375] sd 4:0:0:0: [sdd] tag#27 CDB: opcode=0x88 88 00 00 00 00 01 02 bb 7e 00 00 00 02 00 00 00
May 22 00:14:06 helios64 kernel: [85117.444379] blk_update_request: I/O error, dev sdd, sector 4340809216 op 0x0:(READ) flags 0x80700 phys_seg 54 prio class 0
May 22 00:14:06 helios64 kernel: [85117.444397] ata5: EH complete

 

Link to post
Share on other sites

Hi,

 

I have nothing "new" for this thread but I can add some feedback as I have a similar issue (as the one originally reported)

 

I've a raid 1+0 array with 4 WD30EFAX, connected to ata1-4 ports.

The system was running "Armbian 21.02.2 Buster with Linux 5.10.16-rockchip64" on emmc at the time I reported the issue to Kobol support. I have the enclosure kit, so I use the provided SATA harness.

 

A file transfer (read ~15G) from an NFS client was enough to trigger a lot of occurrences of:

 

[78707.655206] ata4.00: failed command: READ FPDMA QUEUED
[78707.655231] ata4.00: cmd 60/80:f8:00:cc:bb/00:00:4f:01:00/40 tag 31 ncq dma 65536 in
res 40/00:18:80:cf:e3/00:00:48:01:00/40 Emask 0x14 (ATA bus error)
[78707.655242] ata4.00: status: { DRDY }
[78707.655263] ata4: hard resetting link
[78708.545911] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[78708.547161] ata4.00: configured for UDMA/133
[78708.548485] ata4: EH complete


 

Happens on ata4 mostly, and also on ata3 but less, and never on ata1 and ata2. I also added a ssd on ata5 at one point and saw the issue on that port as well.

 

Kobol sent me a new harness which looked very similar to the one I already had and I did the swap. Thought it was fixed because the file transfer was not triggering the issue anymore, but after the next occurence of the periodic scrub, it was back, mainly on ata4 and a few ata3 "hard resetting link".

 

Since then I am using `extraargs=libata.force=3.0` with no issue (which is fine for my usage).

 

I didn't try to use an external PSU + "standard" SATA cables but from what I've read here I'm confident it would fix the issue.

 

I supposed it could be crosstalk in the harness, but could also be on the board when using the onboard HDD power plug for the disks as @ShadowDance was saying that it is still happening with standard SATA cables + power from the "mutilated" harness.

Link to post
Share on other sites

My Helios froze today, no contact with SSH nor USB serial console. Blue heartbeat was off and red led was on, can't remember if it pulsed/blinked... Had to reboot. Now my system has been resilvering since (3 disk zfs mirror pool) with 2 of the disks being resilvered. I'll copy zfs status output here:

 

root@helios64:~# zpool status
  pool: export
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jun  4 14:32:34 2021
    1.52T scanned at 161M/s, 1.34T issued at 143M/s, 1.85T total
    207M resilvered, 72.47% done, 01:02:29 to go
config:

    NAME                                          STATE     READ WRITE CKSUM
    export                                        DEGRADED     0     0     0
      mirror-0                                    DEGRADED     0     0     0
        ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4RP989R  UNAVAIL      5   591     0  (resilvering)
        ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4TH65ZV  FAULTED      4   455     0  too many errors  (resilvering)
        wwn-0x5000c500c7771926                    ONLINE       0     0     0

errors: No known data errors
root@helios64:~# 
 

Those 2 disks that are resilvering are new disks, the last one is older. 

 

Do we have a solution of some sort? I've just requested the HDD wires via email...

 

Link to post
Share on other sites

@0utc45t for me helped to force sata 3.0

 

 

 

Quote

root@helios64:~# cat /boot/armbianEnv.txt 
verbosity=7
console=serial
extraargs=earlyprintk ignore_loglevel libata.force=3.0
 

 

Quote

root@helios64:~# zpool status
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 06:10:03 with 0 errors on Sun May  9 06:34:04 2021
config:

    NAME           STATE     READ WRITE CKSUM
    data           ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        sda-crypt  ONLINE       0     0     0
        sdb-crypt  ONLINE       0     0     0
    cache
      sdc2         ONLINE       0     0     0

errors: No known data errors
root@helios64:~# uptime
 20:06:09 up 11 days,  5:26,  1 user,  load average: 0.09, 0.12, 0.14
root@helios64:~# uname -a
Linux helios64 5.10.35-rockchip64 #21.05.1 SMP PREEMPT Fri May 7 13:53:11 UTC 2021 aarch64 GNU/Linux
root@helios64:~# 
 

 

Link to post
Share on other sites

Same here :(

I have a ZFS pool with 4 HDD WD40EFRX in raidz1 connected to ATA 2, 3, 4 and 5

At the first time I performed a scrub I got a degraded pool with a disk faulted. The faulted disk can vary, usually between ata 3, 4 of 5 but (so far) never ata 2.

Forcing sata to 3.0 did not solve this issue for me.

 

Quote

Helios64 | 21.05.1 | arm64 | aarch64 | 5.10.35-rockchip64

 

Quote

zpool status backup
  pool: backup
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
    repaired.
  scan: scrub in progress since Thu Jun 10 10:29:17 2021
    4.22T scanned at 357M/s, 4.02T issued at 340M/s, 4.50T total
    8.04M repaired, 89.18% done, 00:25:03 to go
config:

    NAME                                      STATE     READ WRITE CKSUM
    backup                                    DEGRADED     0     0     0
      raidz1-0                                DEGRADED   219     0     0
        9c797e9c-a74c-4c09-b7eb-6a46cf311e42  DEGRADED    88     0    25  too many errors  (repairing)
        e3dc2517-08de-4b66-ac98-70ae0a09b4c7  FAULTED    130     0     0  too many errors  (repairing)
        bd2eeb62-0179-4212-a5cb-6b1dca10c8b5  DEGRADED    82     0    14  too many errors  (repairing)
        ccfbf253-e6d9-4af5-8d34-0dda95d64d56  DEGRADED   113     0     0  too many errors  (repairing)

errors: No known data errors

 

Quote

[   37.461980] sd 1:0:0:0: [sdb] tag#19 Sense Key : 0x5 [current] 
[   37.462004] sd 1:0:0:0: [sdb] tag#19 ASC=0x21 ASCQ=0x4 
[   37.462032] sd 1:0:0:0: [sdb] tag#19 CDB: opcode=0x88 88 00 00 00 00 00 03 b5 ec a0 00 00 02 60 00 00
[   37.462086] zio pool=backup vdev=/dev/disk/by-partuuid/9c797e9c-a74c-4c09-b7eb-6a46cf311e42 error=5 type=1 offset=31873122304 size=311296 flags=40080ca8
[   37.462198] ata2: EH complete
[  130.971066] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[  130.974422] Bridge firewalling registered
[  131.596227] Initializing XFRM netlink socket
[  143.974397] softdog: initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=0)
[  143.974411] softdog:              soft_reboot_cmd=<not set> soft_active_on_boot=0
[ 8975.203030] ata3: hard resetting link
[ 8975.678955] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 8975.680077] ata3.00: configured for UDMA/133
[ 8975.680145] sd 2:0:0:0: [sdc] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 8975.680151] sd 2:0:0:0: [sdc] tag#5 Sense Key : 0x5 [current] 
[ 8975.680155] sd 2:0:0:0: [sdc] tag#5 ASC=0x21 ASCQ=0x4 
[ 8975.680161] sd 2:0:0:0: [sdc] tag#5 CDB: opcode=0x88 88 00 00 00 00 00 39 22 c0 80 00 00 01 58 00 00
[ 8975.680180] zio pool=backup vdev=/dev/disk/by-partuuid/e3dc2517-08de-4b66-ac98-70ae0a09b4c7 error=5 type=1 offset=490791305216 size=176128 flags=40080cb0
[ 8975.680227] sd 2:0:0:0: [sdc] tag#17 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 8975.680232] sd 2:0:0:0: [sdc] tag#17 Sense Key : 0x5 [current] 
[ 8975.680237] sd 2:0:0:0: [sdc] tag#17 ASC=0x21 ASCQ=0x4 
[ 8975.680241] sd 2:0:0:0: [sdc] tag#17 CDB: opcode=0x88 88 00 00 00 00 00 39 22 c1 e0 00 00 00 50 00 00
[ 8975.680252] zio pool=backup vdev=/dev/disk/by-partuuid/e3dc2517-08de-4b66-ac98-70ae0a09b4c7 error=5 type=1 offset=490791485440 size=40960 flags=1808b0
[ 8975.680281] sd 2:0:0:0: [sdc] tag#31 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 8975.680286] sd 2:0:0:0: [sdc] tag#31 Sense Key : 0x5 [current] 
[ 8975.680290] sd 2:0:0:0: [sdc] tag#31 ASC=0x21 ASCQ=0x4 
[ 8975.680294] sd 2:0:0:0: [sdc] tag#31 CDB: opcode=0x88 88 00 00 00 00 00 39 22 c2 38 00 00 00 a8 00 00
[ 8975.680306] zio pool=backup vdev=/dev/disk/by-partuuid/e3dc2517-08de-4b66-ac98-70ae0a09b4c7 error=5 type=1 offset=490791530496 size=86016 flags=40080cb0
[ 8975.680329] ata3: EH complete
[ 8984.118720] ata4: hard resetting link
[ 8984.594635] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 8984.595763] ata4.00: configured for UDMA/133
[ 8984.595908] sd 3:0:0:0: [sdd] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 8984.595914] sd 3:0:0:0: [sdd] tag#5 Sense Key : 0x5 [current] 
[ 8984.595919] sd 3:0:0:0: [sdd] tag#5 ASC=0x21 ASCQ=0x4 
[ 8984.595924] sd 3:0:0:0: [sdd] tag#5 CDB: opcode=0x88 88 00 00 00 00 00 28 2d a4 a0 00 00 00 b0 00 00
[ 8984.595943] zio pool=backup vdev=/dev/disk/by-partuuid/bd2eeb62-0179-4212-a5cb-6b1dca10c8b5 error=5 type=1 offset=345127862272 size=90112 flags=40080cb0
[ 8984.595990] sd 3:0:0:0: [sdd] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 8984.595994] sd 3:0:0:0: [sdd] tag#6 Sense Key : 0x5 [current] 
[ 8984.595999] sd 3:0:0:0: [sdd] tag#6 ASC=0x21 ASCQ=0x4 
[ 8984.596003] sd 3:0:0:0: [sdd] tag#6 CDB: opcode=0x88 88 00 00 00 00 00 28 2d a5 50 00 00 04 78 00 00
[ 8984.596015] zio pool=backup vdev=/dev/disk/by-partuuid/bd2eeb62-0179-4212-a5cb-6b1dca10c8b5 error=5 type=1 offset=345127952384 size=585728 flags=40080cb0
[ 8984.596042] sd 3:0:0:0: [sdd] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 8984.596047] sd 3:0:0:0: [sdd] tag#15 Sense Key : 0x5 [current] 
[ 8984.596050] sd 3:0:0:0: [sdd] tag#15 ASC=0x21 ASCQ=0x4 
[ 8984.596055] sd 3:0:0:0: [sdd] tag#15 CDB: opcode=0x88 88 00 00 00 00 00 28 2d a1 e0 00 00 02 c0 00 00
[ 8984.596066] zio pool=backup vdev=/dev/disk/by-partuuid/bd2eeb62-0179-4212-a5cb-6b1dca10c8b5 error=5 type=1 offset=345127501824 size=360448 flags=40080cb0
[ 8984.596089] ata4: EH complete
[ 8991.190477] ata5: hard resetting link
[ 8991.670405] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 8991.671546] ata5.00: configured for UDMA/133
[ 8991.671591] sd 4:0:0:0: [sde] tag#10 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 8991.671597] sd 4:0:0:0: [sde] tag#10 Sense Key : 0x5 [current] 
[ 8991.671601] sd 4:0:0:0: [sde] tag#10 ASC=0x21 ASCQ=0x4 
[ 8991.671607] sd 4:0:0:0: [sde] tag#10 CDB: opcode=0x88 88 00 00 00 00 00 3b 8a c3 d8 00 00 00 50 00 00
[ 8991.671625] zio pool=backup vdev=/dev/disk/by-partuuid/ccfbf253-e6d9-4af5-8d34-0dda95d64d56 error=5 type=1 offset=511461273600 size=40960 flags=1808b0
[ 8991.671669] sd 4:0:0:0: [sde] tag#11 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 8991.671674] sd 4:0:0:0: [sde] tag#11 Sense Key : 0x5 [current] 
[ 8991.671678] sd 4:0:0:0: [sde] tag#11 ASC=0x21 ASCQ=0x4 
[ 8991.671683] sd 4:0:0:0: [sde] tag#11 CDB: opcode=0x88 88 00 00 00 00 00 3b 8a c4 88 00 00 00 50 00 00
[ 8991.671694] zio pool=backup vdev=/dev/disk/by-partuuid/ccfbf253-e6d9-4af5-8d34-0dda95d64d56 error=5 type=1 offset=511461363712 size=40960 flags=1808b0
[ 8991.671720] sd 4:0:0:0: [sde] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 8991.671724] sd 4:0:0:0: [sde] tag#15 Sense Key : 0x5 [current] 
[ 8991.671728] sd 4:0:0:0: [sde] tag#15 ASC=0x21 ASCQ=0x4 
[ 8991.671733] sd 4:0:0:0: [sde] tag#15 CDB: opcode=0x88 88 00 00 00 00 00 3b 8a c4 30 00 00 00 50 00 00
[ 8991.671745] zio pool=backup vdev=/dev/disk/by-partuuid/ccfbf253-e6d9-4af5-8d34-0dda95d64d56 error=5 type=1 offset=511461318656 size=40960 flags=1808b0
[ 8991.671767] ata5: EH complete
[ 9009.517945] ata3: hard resetting link
[ 9009.993781] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9009.995008] ata3.00: configured for UDMA/133
[ 9009.995238] sd 2:0:0:0: [sdc] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9009.995248] sd 2:0:0:0: [sdc] tag#13 Sense Key : 0x5 [current] 
[ 9009.995255] sd 2:0:0:0: [sdc] tag#13 ASC=0x21 ASCQ=0x4 
[ 9009.995265] sd 2:0:0:0: [sdc] tag#13 CDB: opcode=0x88 88 00 00 00 00 00 3d 2f ee e8 00 00 00 b0 00 00
[ 9009.995291] zio pool=backup vdev=/dev/disk/by-partuuid/e3dc2517-08de-4b66-ac98-70ae0a09b4c7 error=5 type=1 offset=525593333760 size=90112 flags=40080cb0
[ 9009.995376] sd 2:0:0:0: [sdc] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9009.995384] sd 2:0:0:0: [sdc] tag#14 Sense Key : 0x5 [current] 
[ 9009.995392] sd 2:0:0:0: [sdc] tag#14 ASC=0x21 ASCQ=0x4 
[ 9009.995400] sd 2:0:0:0: [sdc] tag#14 CDB: opcode=0x88 88 00 00 00 00 00 3d 2f f6 78 00 00 00 58 00 00
[ 9009.995417] zio pool=backup vdev=/dev/disk/by-partuuid/e3dc2517-08de-4b66-ac98-70ae0a09b4c7 error=5 type=1 offset=525594324992 size=45056 flags=1808b0
[ 9009.995467] sd 2:0:0:0: [sdc] tag#23 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9009.995475] sd 2:0:0:0: [sdc] tag#23 Sense Key : 0x5 [current] 
[ 9009.995482] sd 2:0:0:0: [sdc] tag#23 ASC=0x21 ASCQ=0x4 
[ 9009.995490] sd 2:0:0:0: [sdc] tag#23 CDB: opcode=0x88 88 00 00 00 00 00 3d 2f ef 98 00 00 00 b0 00 00
[ 9009.995507] zio pool=backup vdev=/dev/disk/by-partuuid/e3dc2517-08de-4b66-ac98-70ae0a09b4c7 error=5 type=1 offset=525593423872 size=90112 flags=40080cb0
[ 9009.995575] sd 2:0:0:0: [sdc] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9009.995584] sd 2:0:0:0: [sdc] tag#24 Sense Key : 0x5 [current] 
[ 9009.995591] sd 2:0:0:0: [sdc] tag#24 ASC=0x21 ASCQ=0x4 
[ 9009.995599] sd 2:0:0:0: [sdc] tag#24 CDB: opcode=0x88 88 00 00 00 00 00 3d 2f f0 48 00 00 06 30 00 00
[ 9009.995616] zio pool=backup vdev=/dev/disk/by-partuuid/e3dc2517-08de-4b66-ac98-70ae0a09b4c7 error=5 type=1 offset=525593513984 size=811008 flags=40080cb0
[ 9009.995664] ata3: EH complete
[ 9014.121760] ata3: hard resetting link
[ 9014.597636] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9014.598887] ata3.00: configured for UDMA/133
[ 9014.599233] sd 2:0:0:0: [sdc] tag#10 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9014.599243] sd 2:0:0:0: [sdc] tag#10 Sense Key : 0x5 [current] 
[ 9014.599250] sd 2:0:0:0: [sdc] tag#10 ASC=0x21 ASCQ=0x4 
[ 9014.599260] sd 2:0:0:0: [sdc] tag#10 CDB: opcode=0x88 88 00 00 00 00 00 41 b8 a8 38 00 00 01 b0 00 00
[ 9014.599287] zio pool=backup vdev=/dev/disk/by-partuuid/e3dc2517-08de-4b66-ac98-70ae0a09b4c7 error=5 type=1 offset=564540764160 size=221184 flags=40080cb0
[ 9014.599372] sd 2:0:0:0: [sdc] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9014.599380] sd 2:0:0:0: [sdc] tag#12 Sense Key : 0x5 [current] 
[ 9014.599388] sd 2:0:0:0: [sdc] tag#12 ASC=0x21 ASCQ=0x4 
[ 9014.599396] sd 2:0:0:0: [sdc] tag#12 CDB: opcode=0x88 88 00 00 00 00 00 41 b8 98 c0 00 00 07 88 00 00
[ 9014.599416] zio pool=backup vdev=/dev/disk/by-partuuid/e3dc2517-08de-4b66-ac98-70ae0a09b4c7 error=5 type=1 offset=564538736640 size=987136 flags=40080cb0
[ 9014.599483] sd 2:0:0:0: [sdc] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9014.599492] sd 2:0:0:0: [sdc] tag#13 Sense Key : 0x5 [current] 
[ 9014.599499] sd 2:0:0:0: [sdc] tag#13 ASC=0x21 ASCQ=0x4 
[ 9014.599507] sd 2:0:0:0: [sdc] tag#13 CDB: opcode=0x88 88 00 00 00 00 00 41 b8 a0 50 00 00 07 e0 00 00
[ 9014.599525] zio pool=backup vdev=/dev/disk/by-partuuid/e3dc2517-08de-4b66-ac98-70ae0a09b4c7 error=5 type=1 offset=564539727872 size=1032192 flags=40080cb0
[ 9014.599566] ata3: EH complete
[ 9016.421607] ata5: hard resetting link
[ 9016.897545] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9016.898703] ata5.00: configured for UDMA/133
[ 9016.898786] sd 4:0:0:0: [sde] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9016.898793] sd 4:0:0:0: [sde] tag#2 Sense Key : 0x5 [current] 
[ 9016.898798] sd 4:0:0:0: [sde] tag#2 ASC=0x21 ASCQ=0x4 
[ 9016.898803] sd 4:0:0:0: [sde] tag#2 CDB: opcode=0x88 88 00 00 00 00 00 74 aa 63 b0 00 00 00 50 00 00
[ 9016.898822] zio pool=backup vdev=/dev/disk/by-partuuid/ccfbf253-e6d9-4af5-8d34-0dda95d64d56 error=5 type=1 offset=1002148683776 size=40960 flags=1808b0
[ 9016.898869] sd 4:0:0:0: [sde] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9016.898874] sd 4:0:0:0: [sde] tag#3 Sense Key : 0x5 [current] 
[ 9016.898878] sd 4:0:0:0: [sde] tag#3 ASC=0x21 ASCQ=0x4 
[ 9016.898882] sd 4:0:0:0: [sde] tag#3 CDB: opcode=0x88 88 00 00 00 00 00 74 aa 64 08 00 00 02 b8 00 00
[ 9016.898894] zio pool=backup vdev=/dev/disk/by-partuuid/ccfbf253-e6d9-4af5-8d34-0dda95d64d56 error=5 type=1 offset=1002148728832 size=356352 flags=40080cb0
[ 9016.898942] sd 4:0:0:0: [sde] tag#4 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9016.898946] sd 4:0:0:0: [sde] tag#4 Sense Key : 0x5 [current] 
[ 9016.898950] sd 4:0:0:0: [sde] tag#4 ASC=0x21 ASCQ=0x4 
[ 9016.898954] sd 4:0:0:0: [sde] tag#4 CDB: opcode=0x88 88 00 00 00 00 00 74 aa 66 c8 00 00 00 50 00 00
[ 9016.898965] zio pool=backup vdev=/dev/disk/by-partuuid/ccfbf253-e6d9-4af5-8d34-0dda95d64d56 error=5 type=1 offset=1002149089280 size=40960 flags=1808b0
[ 9016.898990] ata5: EH complete
[ 9017.961554] ata4: hard resetting link
[ 9018.437481] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9018.438610] ata4.00: configured for UDMA/133
[ 9018.438665] sd 3:0:0:0: [sdd] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9018.438671] sd 3:0:0:0: [sdd] tag#5 Sense Key : 0x5 [current] 
[ 9018.438675] sd 3:0:0:0: [sdd] tag#5 ASC=0x21 ASCQ=0x4 
[ 9018.438681] sd 3:0:0:0: [sdd] tag#5 CDB: opcode=0x88 88 00 00 00 00 00 58 90 3e 90 00 00 00 58 00 00
[ 9018.438698] zio pool=backup vdev=/dev/disk/by-partuuid/bd2eeb62-0179-4212-a5cb-6b1dca10c8b5 error=5 type=1 offset=760753233920 size=45056 flags=1808b0
[ 9018.438752] sd 3:0:0:0: [sdd] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9018.438757] sd 3:0:0:0: [sdd] tag#8 Sense Key : 0x5 [current] 
[ 9018.438761] sd 3:0:0:0: [sdd] tag#8 ASC=0x21 ASCQ=0x4 
[ 9018.438765] sd 3:0:0:0: [sdd] tag#8 CDB: opcode=0x88 88 00 00 00 00 00 58 90 3d 30 00 00 00 b0 00 00
[ 9018.438778] zio pool=backup vdev=/dev/disk/by-partuuid/bd2eeb62-0179-4212-a5cb-6b1dca10c8b5 error=5 type=1 offset=760753053696 size=90112 flags=40080cb0
[ 9018.438805] sd 3:0:0:0: [sdd] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9018.438809] sd 3:0:0:0: [sdd] tag#9 Sense Key : 0x5 [current] 
[ 9018.438813] sd 3:0:0:0: [sdd] tag#9 ASC=0x21 ASCQ=0x4 
[ 9018.438817] sd 3:0:0:0: [sdd] tag#9 CDB: opcode=0x88 88 00 00 00 00 00 58 90 3d e0 00 00 00 b0 00 00
[ 9018.438827] zio pool=backup vdev=/dev/disk/by-partuuid/bd2eeb62-0179-4212-a5cb-6b1dca10c8b5 error=5 type=1 offset=760753143808 size=90112 flags=40080cb0
[ 9018.438847] ata4: EH complete
[ 9025.249274] ata3.00: NCQ disabled due to excessive errors
[ 9025.249386] ata3: hard resetting link
[ 9025.725255] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9025.726491] ata3.00: configured for UDMA/133
[ 9025.726778] sd 2:0:0:0: [sdc] tag#4 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9025.726787] sd 2:0:0:0: [sdc] tag#4 Sense Key : 0x5 [current] 
[ 9025.726795] sd 2:0:0:0: [sdc] tag#4 ASC=0x21 ASCQ=0x4 
[ 9025.726805] sd 2:0:0:0: [sdc] tag#4 CDB: opcode=0x88 88 00 00 00 00 00 83 87 9d 78 00 00 00 50 00 00
[ 9025.726831] zio pool=backup vdev=/dev/disk/by-partuuid/e3dc2517-08de-4b66-ac98-70ae0a09b4c7 error=5 type=1 offset=1129830871040 size=40960 flags=1808b0
[ 9025.726919] sd 2:0:0:0: [sdc] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9025.726928] sd 2:0:0:0: [sdc] tag#16 Sense Key : 0x5 [current] 
[ 9025.726935] sd 2:0:0:0: [sdc] tag#16 ASC=0x21 ASCQ=0x4 
[ 9025.726943] sd 2:0:0:0: [sdc] tag#16 CDB: opcode=0x88 88 00 00 00 00 00 83 87 90 10 00 00 07 e0 00 00
[ 9025.726962] zio pool=backup vdev=/dev/disk/by-partuuid/e3dc2517-08de-4b66-ac98-70ae0a09b4c7 error=5 type=1 offset=1129829113856 size=1032192 flags=40080cb0
[ 9025.727013] sd 2:0:0:0: [sdc] tag#19 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9025.727022] sd 2:0:0:0: [sdc] tag#19 Sense Key : 0x5 [current] 
[ 9025.727029] sd 2:0:0:0: [sdc] tag#19 ASC=0x21 ASCQ=0x4 
[ 9025.727037] sd 2:0:0:0: [sdc] tag#19 CDB: opcode=0x88 88 00 00 00 00 00 83 87 97 f8 00 00 05 78 00 00
[ 9025.727056] zio pool=backup vdev=/dev/disk/by-partuuid/e3dc2517-08de-4b66-ac98-70ae0a09b4c7 error=5 type=1 offset=1129830150144 size=716800 flags=40080cb0
[ 9025.727095] ata3: EH complete
[ 9040.788826] ata2: hard resetting link
[ 9041.264683] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9041.265809] ata2.00: configured for UDMA/133
[ 9041.265962] sd 1:0:0:0: [sdb] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9041.265968] sd 1:0:0:0: [sdb] tag#7 Sense Key : 0x5 [current] 
[ 9041.265972] sd 1:0:0:0: [sdb] tag#7 ASC=0x21 ASCQ=0x4 
[ 9041.265978] sd 1:0:0:0: [sdb] tag#7 CDB: opcode=0x88 88 00 00 00 00 00 28 0e 84 f8 00 00 00 b0 00 00
[ 9041.265996] zio pool=backup vdev=/dev/disk/by-partuuid/9c797e9c-a74c-4c09-b7eb-6a46cf311e42 error=5 type=1 offset=344083525632 size=90112 flags=40080cb0
[ 9041.266044] sd 1:0:0:0: [sdb] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9041.266049] sd 1:0:0:0: [sdb] tag#18 Sense Key : 0x5 [current] 
[ 9041.266053] sd 1:0:0:0: [sdb] tag#18 ASC=0x21 ASCQ=0x4 
[ 9041.266057] sd 1:0:0:0: [sdb] tag#18 CDB: opcode=0x88 88 00 00 00 00 00 28 0e 75 28 00 00 07 e8 00 00
[ 9041.266070] zio pool=backup vdev=/dev/disk/by-partuuid/9c797e9c-a74c-4c09-b7eb-6a46cf311e42 error=5 type=1 offset=344081453056 size=1036288 flags=40080cb0
[ 9041.266112] ata2: EH complete
[ 9055.964324] ata2: hard resetting link
[ 9056.440182] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9056.441407] ata2.00: configured for UDMA/133
[ 9056.441656] sd 1:0:0:0: [sdb] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9056.441665] sd 1:0:0:0: [sdb] tag#2 Sense Key : 0x5 [current] 
[ 9056.441673] sd 1:0:0:0: [sdb] tag#2 ASC=0x21 ASCQ=0x4 
[ 9056.441683] sd 1:0:0:0: [sdb] tag#2 CDB: opcode=0x88 88 00 00 00 00 00 3a bc 70 80 00 00 02 68 00 00
[ 9056.441710] zio pool=backup vdev=/dev/disk/by-partuuid/9c797e9c-a74c-4c09-b7eb-6a46cf311e42 error=5 type=1 offset=504538136576 size=315392 flags=40080cb0
[ 9056.441792] sd 1:0:0:0: [sdb] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9056.441800] sd 1:0:0:0: [sdb] tag#3 Sense Key : 0x5 [current] 
[ 9056.441808] sd 1:0:0:0: [sdb] tag#3 ASC=0x21 ASCQ=0x4 
[ 9056.441816] sd 1:0:0:0: [sdb] tag#3 CDB: opcode=0x88 88 00 00 00 00 00 3a bc 76 00 00 00 03 18 00 00
[ 9056.441833] zio pool=backup vdev=/dev/disk/by-partuuid/9c797e9c-a74c-4c09-b7eb-6a46cf311e42 error=5 type=1 offset=504538857472 size=405504 flags=40080cb0
[ 9056.441885] sd 1:0:0:0: [sdb] tag#4 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9056.441893] sd 1:0:0:0: [sdb] tag#4 Sense Key : 0x5 [current] 
[ 9056.441900] sd 1:0:0:0: [sdb] tag#4 ASC=0x21 ASCQ=0x4 
[ 9056.441908] sd 1:0:0:0: [sdb] tag#4 CDB: opcode=0x88 88 00 00 00 00 00 3a bc 72 e8 00 00 03 18 00 00
[ 9056.441926] zio pool=backup vdev=/dev/disk/by-partuuid/9c797e9c-a74c-4c09-b7eb-6a46cf311e42 error=5 type=1 offset=504538451968 size=405504 flags=40080cb0
[ 9056.441965] ata2: EH complete
[ 9063.512007] ata4: hard resetting link
[ 9063.987915] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9063.989140] ata4.00: configured for UDMA/133
[ 9063.989216] sd 3:0:0:0: [sdd] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9063.989225] sd 3:0:0:0: [sdd] tag#1 Sense Key : 0x5 [current] 
[ 9063.989233] sd 3:0:0:0: [sdd] tag#1 ASC=0x21 ASCQ=0x4 
[ 9063.989243] sd 3:0:0:0: [sdd] tag#1 CDB: opcode=0x88 88 00 00 00 00 00 3c 93 1d 78 00 00 00 58 00 00
[ 9063.989269] zio pool=backup vdev=/dev/disk/by-partuuid/bd2eeb62-0179-4212-a5cb-6b1dca10c8b5 error=5 type=1 offset=520331390976 size=45056 flags=1808b0
[ 9063.989350] sd 3:0:0:0: [sdd] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9063.989358] sd 3:0:0:0: [sdd] tag#2 Sense Key : 0x5 [current] 
[ 9063.989366] sd 3:0:0:0: [sdd] tag#2 ASC=0x21 ASCQ=0x4 
[ 9063.989373] sd 3:0:0:0: [sdd] tag#2 CDB: opcode=0x88 88 00 00 00 00 00 3c 93 1d d0 00 00 00 b0 00 00
[ 9063.989392] zio pool=backup vdev=/dev/disk/by-partuuid/bd2eeb62-0179-4212-a5cb-6b1dca10c8b5 error=5 type=1 offset=520331436032 size=90112 flags=40080cb0
[ 9063.989436] sd 3:0:0:0: [sdd] tag#26 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9063.989444] sd 3:0:0:0: [sdd] tag#26 Sense Key : 0x5 [current] 
[ 9063.989452] sd 3:0:0:0: [sdd] tag#26 ASC=0x21 ASCQ=0x4 
[ 9063.989460] sd 3:0:0:0: [sdd] tag#26 CDB: opcode=0x88 88 00 00 00 00 00 3c 93 1e 80 00 00 00 58 00 00
[ 9063.989477] zio pool=backup vdev=/dev/disk/by-partuuid/bd2eeb62-0179-4212-a5cb-6b1dca10c8b5 error=5 type=1 offset=520331526144 size=45056 flags=1808b0
[ 9063.989516] ata4: EH complete
[ 9071.639690] ata4.00: NCQ disabled due to excessive errors
[ 9071.639803] ata4: hard resetting link
[ 9072.115654] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9072.116901] ata4.00: configured for UDMA/133
[ 9072.117249] sd 3:0:0:0: [sdd] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9072.117259] sd 3:0:0:0: [sdd] tag#16 Sense Key : 0x5 [current] 
[ 9072.117266] sd 3:0:0:0: [sdd] tag#16 ASC=0x21 ASCQ=0x4 
[ 9072.117276] sd 3:0:0:0: [sdd] tag#16 CDB: opcode=0x88 88 00 00 00 00 00 77 b4 fa e0 00 00 01 60 00 00
[ 9072.117303] zio pool=backup vdev=/dev/disk/by-partuuid/bd2eeb62-0179-4212-a5cb-6b1dca10c8b5 error=5 type=1 offset=1028273848320 size=180224 flags=40080cb0
[ 9072.117409] sd 3:0:0:0: [sdd] tag#19 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9072.117418] sd 3:0:0:0: [sdd] tag#19 Sense Key : 0x5 [current] 
[ 9072.117425] sd 3:0:0:0: [sdd] tag#19 ASC=0x21 ASCQ=0x4 
[ 9072.117433] sd 3:0:0:0: [sdd] tag#19 CDB: opcode=0x88 88 00 00 00 00 00 77 b5 01 10 00 00 07 e8 00 00
[ 9072.117452] zio pool=backup vdev=/dev/disk/by-partuuid/bd2eeb62-0179-4212-a5cb-6b1dca10c8b5 error=5 type=1 offset=1028274659328 size=1036288 flags=40080cb0
[ 9072.117523] sd 3:0:0:0: [sdd] tag#23 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9072.117531] sd 3:0:0:0: [sdd] tag#23 Sense Key : 0x5 [current] 
[ 9072.117538] sd 3:0:0:0: [sdd] tag#23 ASC=0x21 ASCQ=0x4 
[ 9072.117547] sd 3:0:0:0: [sdd] tag#23 CDB: opcode=0x88 88 00 00 00 00 00 77 b4 fc 40 00 00 04 d0 00 00
[ 9072.117565] zio pool=backup vdev=/dev/disk/by-partuuid/bd2eeb62-0179-4212-a5cb-6b1dca10c8b5 error=5 type=1 offset=1028274028544 size=630784 flags=40080cb0
[ 9072.117635] ata4: EH complete
[ 9101.714682] ata5: hard resetting link
[ 9102.190600] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9102.191729] ata5.00: configured for UDMA/133
[ 9102.192188] sd 4:0:0:0: [sde] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9102.192194] sd 4:0:0:0: [sde] tag#5 Sense Key : 0x5 [current] 
[ 9102.192199] sd 4:0:0:0: [sde] tag#5 ASC=0x21 ASCQ=0x4 
[ 9102.192205] sd 4:0:0:0: [sde] tag#5 CDB: opcode=0x88 88 00 00 00 00 00 5b 16 49 90 00 00 07 38 00 00
[ 9102.192223] zio pool=backup vdev=/dev/disk/by-partuuid/ccfbf253-e6d9-4af5-8d34-0dda95d64d56 error=5 type=1 offset=782430838784 size=946176 flags=40080cb0
[ 9102.192275] sd 4:0:0:0: [sde] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9102.192280] sd 4:0:0:0: [sde] tag#14 Sense Key : 0x5 [current] 
[ 9102.192284] sd 4:0:0:0: [sde] tag#14 ASC=0x21 ASCQ=0x4 
[ 9102.192288] sd 4:0:0:0: [sde] tag#14 CDB: opcode=0x88 88 00 00 00 00 00 5b 16 39 c0 00 00 07 e8 00 00
[ 9102.192301] zio pool=backup vdev=/dev/disk/by-partuuid/ccfbf253-e6d9-4af5-8d34-0dda95d64d56 error=5 type=1 offset=782428766208 size=1036288 flags=40080cb0
[ 9102.192333] sd 4:0:0:0: [sde] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9102.192338] sd 4:0:0:0: [sde] tag#16 Sense Key : 0x5 [current] 
[ 9102.192341] sd 4:0:0:0: [sde] tag#16 ASC=0x21 ASCQ=0x4 
[ 9102.192346] sd 4:0:0:0: [sde] tag#16 CDB: opcode=0x88 88 00 00 00 00 00 5b 16 41 a8 00 00 07 e8 00 00
[ 9102.192357] zio pool=backup vdev=/dev/disk/by-partuuid/ccfbf253-e6d9-4af5-8d34-0dda95d64d56 error=5 type=1 offset=782429802496 size=1036288 flags=40080cb0
[ 9102.192379] ata5: EH complete
[ 9118.562142] ata2: hard resetting link
[ 9119.038032] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9119.039164] ata2.00: configured for UDMA/133
[ 9119.039327] sd 1:0:0:0: [sdb] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9119.039333] sd 1:0:0:0: [sdb] tag#15 Sense Key : 0x5 [current] 
[ 9119.039338] sd 1:0:0:0: [sdb] tag#15 ASC=0x21 ASCQ=0x4 
[ 9119.039344] sd 1:0:0:0: [sdb] tag#15 CDB: opcode=0x88 88 00 00 00 00 00 57 be c6 b0 00 00 04 78 00 00
[ 9119.039364] zio pool=backup vdev=/dev/disk/by-partuuid/9c797e9c-a74c-4c09-b7eb-6a46cf311e42 error=5 type=1 offset=753724645376 size=585728 flags=40080cb0
[ 9119.039416] sd 1:0:0:0: [sdb] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9119.039421] sd 1:0:0:0: [sdb] tag#16 Sense Key : 0x5 [current] 
[ 9119.039425] sd 1:0:0:0: [sdb] tag#16 ASC=0x21 ASCQ=0x4 
[ 9119.039429] sd 1:0:0:0: [sdb] tag#16 CDB: opcode=0x88 88 00 00 00 00 00 57 be c2 38 00 00 04 78 00 00
[ 9119.039442] zio pool=backup vdev=/dev/disk/by-partuuid/9c797e9c-a74c-4c09-b7eb-6a46cf311e42 error=5 type=1 offset=753724059648 size=585728 flags=40080cb0
[ 9119.039483] ata2: EH complete
[ 9127.437719] ata2.00: NCQ disabled due to excessive errors
[ 9127.437776] ata2: hard resetting link
[ 9127.913702] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9127.914827] ata2.00: configured for UDMA/133
[ 9127.914876] sd 1:0:0:0: [sdb] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9127.914882] sd 1:0:0:0: [sdb] tag#8 Sense Key : 0x5 [current] 
[ 9127.914886] sd 1:0:0:0: [sdb] tag#8 ASC=0x21 ASCQ=0x4 
[ 9127.914892] sd 1:0:0:0: [sdb] tag#8 CDB: opcode=0x88 88 00 00 00 00 00 5d 20 6e 18 00 00 00 58 00 00
[ 9127.914909] zio pool=backup vdev=/dev/disk/by-partuuid/9c797e9c-a74c-4c09-b7eb-6a46cf311e42 error=5 type=1 offset=799951040512 size=45056 flags=1808b0
[ 9127.914953] sd 1:0:0:0: [sdb] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9127.914957] sd 1:0:0:0: [sdb] tag#9 Sense Key : 0x5 [current] 
[ 9127.914961] sd 1:0:0:0: [sdb] tag#9 ASC=0x21 ASCQ=0x4 
[ 9127.914965] sd 1:0:0:0: [sdb] tag#9 CDB: opcode=0x88 88 00 00 00 00 00 5d 20 6e 70 00 00 01 08 00 00
[ 9127.914978] zio pool=backup vdev=/dev/disk/by-partuuid/9c797e9c-a74c-4c09-b7eb-6a46cf311e42 error=5 type=1 offset=799951085568 size=135168 flags=40080cb0
[ 9127.915002] ata2: EH complete
[ 9217.346637] ata5.00: NCQ disabled due to excessive errors
[ 9217.346705] ata5: hard resetting link
[ 9217.822632] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9217.823852] ata5.00: configured for UDMA/133
[ 9217.824064] sd 4:0:0:0: [sde] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9217.824074] sd 4:0:0:0: [sde] tag#0 Sense Key : 0x5 [current] 
[ 9217.824081] sd 4:0:0:0: [sde] tag#0 ASC=0x21 ASCQ=0x4 
[ 9217.824091] sd 4:0:0:0: [sde] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 49 8c 22 78 00 00 07 88 00 00
[ 9217.824118] zio pool=backup vdev=/dev/disk/by-partuuid/ccfbf253-e6d9-4af5-8d34-0dda95d64d56 error=5 type=1 offset=631766315008 size=987136 flags=40080cb0
[ 9217.824219] sd 4:0:0:0: [sde] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9217.824227] sd 4:0:0:0: [sde] tag#15 Sense Key : 0x5 [current] 
[ 9217.824234] sd 4:0:0:0: [sde] tag#15 ASC=0x21 ASCQ=0x4 
[ 9217.824242] sd 4:0:0:0: [sde] tag#15 CDB: opcode=0x88 88 00 00 00 00 00 49 8c 21 18 00 00 01 58 00 00
[ 9217.824260] zio pool=backup vdev=/dev/disk/by-partuuid/ccfbf253-e6d9-4af5-8d34-0dda95d64d56 error=5 type=1 offset=631766134784 size=176128 flags=40080cb0
[ 9217.824322] sd 4:0:0:0: [sde] tag#21 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[ 9217.824332] sd 4:0:0:0: [sde] tag#21 Sense Key : 0x5 [current] 
[ 9217.824339] sd 4:0:0:0: [sde] tag#21 ASC=0x21 ASCQ=0x4 
[ 9217.824346] sd 4:0:0:0: [sde] tag#21 CDB: opcode=0x88 88 00 00 00 00 00 49 8c 2a 08 00 00 00 50 00 00
[ 9217.824364] zio pool=backup vdev=/dev/disk/by-partuuid/ccfbf253-e6d9-4af5-8d34-0dda95d64d56 error=5 type=1 offset=631767306240 size=40960 flags=1808b0
[ 9217.824418] ata5: EH complete

 

I'll appreciated any advice on this :)

Link to post
Share on other sites

@usefulnoise I'd start by trying all different suggestions in this thread, e.g. limiting speed, disabling ncq, if you're not using raw disks (i.e. partitions or dm-crypt), make sure you've disabled io schedulers on the disks, etc.

 

Example: libata.force=3.0G,noncq,noncqtrim

 

Disabling ncqtrim is probably unnecessary, but doesn't give any benefit with spinning disks anyway.

 

If none of this helps, and you're sure the disks aren't actually faulty, I'd recommend trying the SATA controller firmware update (it didn't help me) or possibly experimenting with removing noise. Hook the PSU to a grounded wall socket, use 3rd party SATA cables, or try rerouting them.

 

Possibly, if you're desperate, try removing the metal clips from the SATA cables (the clip that hooks into the motherboard socket), it shouldn't be a problem, but could perhaps function as an antenna for noise.

Link to post
Share on other sites

This seems to be the best technical discussion of the issue that I've found so far, so I'm jumping in even though I'm using a different platform. I'm having the same issue with these drives, but I have a slightly larger sample set to play with an I've found a pattern in my case. I'm using a Synology NAS with several Western Digital Red Pro drives. I'm seeing that - with flawless consistency - only certain a model number exhibits this problem. I'm also seeing that this model number is explicitly not listed in Synology's hardware compatibility list, even though other Western Digital Red Pro drives of the same capacity are. So I'm strongly suspecting that there is either a hardware or firmware issue with certain models. In my case, I have:

 

WD6002FFWX-68TZ4N0 - NOT Reliable

WD6003FFBX-68MU3N0 - Reliable

WD6003FRYZ-01F0DB0 - Reliable

 

I'm also noticing that the issue occurs on very hot days. The room temperature only varies by a few degrees (yay air conditioning), but again it's a very consistent pattern. We don't have any wall outlet under-voltage issues; our utility power is extremely clean here. We also monitor it with our UPS and ATS systems.

 

Hope this helps.

Edited by ErikC
Fixed typo.
Link to post
Share on other sites

@usefulnoise

@ShadowDance

Maybe this is interesting for you: As I had the same issue but did not wanna mess around too much and some where successful with 3rd party sata cables, I was looking for a way to get this done w/o loosing the HDD slides into the SATA slot option.

I found this (hope this does not count as advertisement):

https://www.aliexpress.com/item/1005002595620272.html

which seems to be the general harness of a Lenovo X3100 M5 Server. I was also looking to get some screw-able SATA mounts, but could not find any that would not require me doing the cable works and soldering too.

However the Lenovo cables looked promising and as it turns out, you just use this as a drop in replacement. The screws fit. However you have to mount them from the front side instead the back, but the case has enough space to do that.

IMG_20210609_225431581_HDR_small.thumb.jpg.4ad97badeb49645478471f5785449cc2.jpg

 

You do however need to push very hard, to get the screws to grip, but even the threads match.

I did a bit of grinding on the metal frame though to reduce the tension on the plastic of the connector cause it is a bit thicker. Not sure that is required though (I think not).

For the fifth slot some grinding will be inevitable though. Since that slot is not populated for me right now (and has to wait a bit on current pricing unless I find a bargain) and one cable set has 4 cables, this was no issue. Another two cable sets are on the way though (have another Helios64).

I do not know how you cut the connectors of the original harness to get power only, but I used a separation disk on a Dremel like tool. Worked like a charm.

The cables are all similar length so you need to do try a bit until you find a position that works and does not twist the cable too much but gets it to the ports on the board.
Here what I ended up with (sorry for the sub-optimal quality but only saw this now on the PC and I do not want to disassemble the case again)

IMG_20210609_225412376_small.thumb.jpg.78f75a906846593a79671b4422799354.jpgIMG_20210609_230048525_HDR_small.thumb.jpg.38fc8421f0367a5dada3b064a363bda1.jpg

 

Hope that is helpful for anyone. I am super happy with this solution :)

I have not had any issues with my ZFS setup anymore since.

 

Cheers

 

Link to post
Share on other sites

@meymarce: How did you do the power? Don't know if the cable on Ali is still the same, for me it's this one:

image.png.52136830cbc89f6e497a7c15cf655741.png

 

After my box has been going strong since November, this morning I got a mail that a backup failed, because the target filesystem was read-only. A look in dmesg told it was the dreaded READ FPDMA QUEUED:

[Sun Jul 25 00:57:14 2021] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[Sun Jul 25 00:58:55 2021] ata4.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x0
[Sun Jul 25 00:58:55 2021] ata4.00: irq_stat 0x40000000
[Sun Jul 25 00:58:55 2021] ata4.00: failed command: READ FPDMA QUEUED
[Sun Jul 25 00:58:55 2021] ata4.00: cmd 60/08:28:00:0d:c0/01:00:83:00:00/40 tag 5 ncq dma 135168 in
                                    res 43/40:08:00:0d:c0/00:01:83:00:00/40 Emask 0x409 (media error) <F>
[Sun Jul 25 00:58:55 2021] ata4.00: status: { DRDY SENSE ERR }
[Sun Jul 25 00:58:55 2021] ata4.00: error: { UNC }
[Sun Jul 25 00:58:55 2021] ata4.00: configured for UDMA/100
[Sun Jul 25 00:58:55 2021] sd 3:0:0:0: [sdb] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=3s
[Sun Jul 25 00:58:55 2021] sd 3:0:0:0: [sdb] tag#5 Sense Key : 0x3 [current] 
[Sun Jul 25 00:58:55 2021] sd 3:0:0:0: [sdb] tag#5 ASC=0x11 ASCQ=0x4 
[Sun Jul 25 00:58:55 2021] sd 3:0:0:0: [sdb] tag#5 CDB: opcode=0x88 88 00 00 00 00 00 83 c0 0d 00 00 00 01 08 00 00
[Sun Jul 25 00:58:55 2021] blk_update_request: I/O error, dev sdb, sector 2210401536 op 0x0:(READ) flags 0x80700 phys_seg 33 prio class 0
<snip>
[Sun Jul 25 00:59:42 2021] Aborting journal on device sdb1-8.
[Sun Jul 25 00:59:42 2021] EXT4-fs (sdb1): Remounting filesystem read-only
[Sun Jul 25 00:59:42 2021] EXT4-fs error (device sdb1) in ext4_reserve_inode_write:5751: IO failure
[Sun Jul 25 00:59:42 2021] EXT4-fs error (device sdb1): ext4_ext_tree_init:828: inode #69077613: comm rsnapshot: mark_inode_dirty error
[Sun Jul 25 00:59:44 2021] EXT4-fs error (device sdb1) in ext4_reserve_inode_write:5751: Journal has aborted
[Sun Jul 25 00:59:44 2021] EXT4-fs error (device sdb1): __ext4_new_inode:1335: inode #69077613: comm rsnapshot: mark_inode_dirty error
[Sun Jul 25 00:59:44 2021] EXT4-fs error (device sdb1) in __ext4_new_inode:1337: Journal has aborted
[Sun Jul 25 00:59:44 2021] EXT4-fs error (device sdb1) in ext4_evict_inode:255: Journal has aborted

[Sun Jul 25 00:58:55 2021] ata4: EH complete

 

After repairing the filesystem I retried, and got the same result. Now I hotplugged the disk from slot 4 to slot 3, and the backup succeeded without problems.

 

I am running Armbian 21.05.1 Buster with Linux 5.10.35-rockchip64, from a sata disk in the M.2 slot, and I had two 4TB disks in slot 4 and 5. The disk in slot 4 is normally not mounted, the backup script mounts it, and unmounts it after backup is done, once a week. Both disks have a simple ext4 filesystem without raid.

 

 

Link to post
Share on other sites

 Share

12 12