Jens Bauer Posted July 22, 2020

I just experienced this old problem with Focal: "WD drives inaccessible starting with kernel 4.13". As that topic is now closed, I'd like to add some information.

My drive is a 2.5" 1TB WD10SPZX (WD BLUE). It was connected via a Startech.com port-multiplier. Normally I use only WD RED, but I had to move some large files, so I attached it temporarily.

Here's stack-trace #1:

[94740.634921] ata2.00: failed to read SCR 1 (Emask=0x40)
[94740.634938] ata2.01: failed to read SCR 1 (Emask=0x40)
[94740.634945] ata2.02: failed to read SCR 1 (Emask=0x40)
[94740.634951] ata2.03: failed to read SCR 1 (Emask=0x40)
[94740.634973] ata2.04: failed to read SCR 1 (Emask=0x40)
[94740.694172] Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP
[94740.699046] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter zram zsmalloc bridge stp llc sch_fq_codel ip_tables x_tables mv88e6xxx dsa_core
[94740.714197] Process scsi_eh_1 (pid: 124, stack limit = 0x00000000d275a4fb)
[94740.721262] CPU: 1 PID: 124 Comm: scsi_eh_1 Not tainted 4.19.56-mvebu64 #5.89
[94740.728601] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[94740.735251] pstate: 80000005 (Nzcv daif -PAN -UAO)
[94740.740173] pc : ahci_scr_read+0x40/0x78
[94740.744199] lr : sata_scr_read+0x64/0x78
[94740.748223] sp : ffffff8009b0bb10
[94740.751630] x29: ffffff8009b0bb10 x28: ffffffc07f2423e0
[94740.757093] x27: 0000000000000000 x26: ffffffc07f240000
[94740.762560] x25: ffffffc07ee65828 x24: ffffffc07f2439b8
[94740.768025] x23: 0000000000000000 x22: ffffff8008f89000
[94740.773490] x21: 0000000000003008 x20: ffffffc07f242040
[94740.778957] x19: 0000000000000000 x18: ffffffffffffffff
[94740.784421] x17: 0000000000000000 x16: 0000000000000000
[94740.789888] x15: ffffff8008fa2000 x14: 00000000fffffff0
[94740.795355] x13: ffffff8009044cd2 x12: ffffff8008fa2000
[94740.800820] x11: 0000000000000000 x10: ffffff8009044000
[94740.806284] x9 : 0000000000000000 x8 : 0000000000000003
[94740.811750] x7 : ffffff8009b0bbb4 x6 : 0000000000000001
[94740.817216] x5 : ffffffc07f242040 x4 : ffffff8008c505d0
[94740.822680] x3 : 0000000000000180 x2 : ffffff8009b0bbb4
[94740.828148] x1 : ffffff80090c21b0 x0 : ffffff80090c2000
[94740.833612] Call trace:
[94740.836126] ahci_scr_read+0x40/0x78
[94740.839801] ata_eh_link_autopsy+0x84/0xa20
[94740.844100] ata_eh_autopsy+0xd4/0xe0
[94740.847867] sata_pmp_error_handler+0x48/0x920
[94740.852434] ahci_error_handler+0x3c/0x78
[94740.856555] ata_scsi_port_error_handler+0x198/0x680
[94740.861661] ata_scsi_error+0x94/0xd0
[94740.865429] scsi_error_handler+0x98/0x370
[94740.869639] kthread+0x128/0x130
[94740.872953] ret_from_fork+0x10/0x1c
[94740.876629] Code: b8615881 34000161 8b21c061 8b010001 (b9400021)
[94740.882899] ---[ end trace a517129638be4c55 ]---

Stack-trace #2:

[14633.530216] ata2.00: failed to read SCR 1 (Emask=0x40)
[14633.530232] ata2.01: failed to read SCR 1 (Emask=0x40)
[14633.530239] ata2.02: failed to read SCR 1 (Emask=0x40)
[14633.530246] ata2.03: failed to read SCR 1 (Emask=0x40)
[14633.530267] ata2.04: failed to read SCR 1 (Emask=0x40)
[14633.582564] Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP
[14633.587434] Modules linked in: zram zsmalloc bridge stp llc sch_fq_codel ip_tables x_tables mv88e6xxx dsa_core
[14633.597740] Process scsi_eh_1 (pid: 124, stack limit = 0x00000000fb61dcc8)
[14633.604819] CPU: 1 PID: 124 Comm: scsi_eh_1 Not tainted 4.19.56-mvebu64 #5.89
[14633.612156] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[14633.618807] pstate: 80000005 (Nzcv daif -PAN -UAO)
[14633.623730] pc : ahci_scr_read+0x40/0x78
[14633.627752] lr : sata_scr_read+0x64/0x78
[14633.631780] sp : ffffff8009aebb10
[14633.635185] x29: ffffff8009aebb10 x28: ffffffc07f2463e0
[14633.640649] x27: 0000000000000000 x26: ffffffc07f244000
[14633.646115] x25: ffffffc07ee65828 x24: ffffffc07f2479b8
[14633.651581] x23: 0000000000000000 x22: ffffff8008f89000
[14633.657047] x21: 0000000000003008 x20: ffffffc07f246040
[14633.662512] x19: 0000000000000000 x18: ffffffffffffffff
[14633.667977] x17: 0000000000000000 x16: 0000000000000000
[14633.673442] x15: ffffff8008fa2000 x14: 00000000fffffff0
[14633.678910] x13: ffffff8009044cd2 x12: ffffff8008fa2000
[14633.684376] x11: 0000000000000000 x10: ffffff8009044000
[14633.689840] x9 : 0000000000000000 x8 : 0000000000000003
[14633.695306] x7 : ffffff8009aebbb4 x6 : 0000000000000001
[14633.700771] x5 : ffffffc07f246040 x4 : ffffff8008c505d0
[14633.706236] x3 : 0000000000000180 x2 : ffffff8009aebbb4
[14633.711702] x1 : ffffff80090c21b0 x0 : ffffff80090c2000
[14633.717168] Call trace:
[14633.719682] ahci_scr_read+0x40/0x78
[14633.723357] ata_eh_link_autopsy+0x84/0xa20
[14633.727655] ata_eh_autopsy+0xd4/0xe0
[14633.731422] sata_pmp_error_handler+0x48/0x920
[14633.735990] ahci_error_handler+0x3c/0x78
[14633.740110] ata_scsi_port_error_handler+0x198/0x680
[14633.745218] ata_scsi_error+0x94/0xd0
[14633.748981] scsi_error_handler+0x98/0x370
[14633.753194] kthread+0x128/0x130
[14633.756509] ret_from_fork+0x10/0x1c
[14633.760182] Code: b8615881 34000161 8b21c061 8b010001 (b9400021)
[14633.766455] ---[ end trace 5867ea8f72bf6a7f ]---

I tried reading a large tar-file several times from the drive, but Focal kept crashing. After that, I tried switching to a WD BLUE (Slim) I had some data on already, and the copy from the tar archive completed without crashing.

Here's a list of 3 different types of drives (connected via the same port multiplier):

$ lsscsi
[1:3:0:0] disk ATA WDC WD10SPCX-08S 1A01 /dev/sdd
[1:4:0:0] disk ATA WDC WD10JFCX-68N 0A82 /dev/sde
$ lsscsi
[1:3:0:0] disk ATA WDC WD10SPZX-00Z 1A01 /dev/sdg

WD10JFCX-68N is a WD RED (these work very well).
WD10SPCX-08S is a WD BLUE (known as 'Slim'; this works).
WD10SPZX-00Z is a WD BLUE; this is the drive that causes the crash.

The tar archive I was reading from is very large (around 300 GB), and the file I was reading was around 2.3 GB. A size of 2.3 GB does not fit in a signed 32-bit integer (the maximum is 2^31 - 1 bytes, about 2.15 GB), so it would wrap to a negative number, but I don't think that is a problem here. All the partitions I work with are BTRFS partitions, but this shouldn't matter, since the problem seems to be in the driver. Oh, I used dist-upgrade from Bionic to get to Focal. Please let me know if there's any further useful information I could supply.
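To illustrate the 32-bit remark above, here is a minimal Python sketch of the wrap-around. It is only an illustration, not something taken from the crash itself: the exact byte count (2,300,000,000) is an assumed round figure for "around 2.3 GB", and `as_int32` is a hypothetical helper name.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def as_int32(n: int) -> int:
    """Reinterpret the low 32 bits of n as a signed 32-bit integer."""
    # Pack as unsigned 32-bit, then unpack the same bytes as signed 32-bit.
    return struct.unpack("<i", struct.pack("<I", n & 0xFFFFFFFF))[0]


# Assumption: a round figure standing in for the "around 2.3 GB" file.
file_size = 2_300_000_000

print(file_size > INT32_MAX)   # the size exceeds the signed 32-bit range
print(as_int32(file_size))     # the value it would wrap to
```

Whether any code path involved here actually stores the size in a signed 32-bit field is not known from the traces; the sketch only shows the arithmetic.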
Werner Posted July 22, 2020

Quote: Oh, I used dist-upgrade from Bionic to get to Focal.

That could be an issue. This kind of upgrade is neither tested nor supported. If you have a chance, try with a fresh image.

And always handy to have: armbianmonitor -u
Jens Bauer Posted July 24, 2020 Author

On 7/22/2020 at 9:09 PM, Werner said: That could be an issue. This kind of upgrade is neither tested nor supported. If you have a chance try with a fresh image. And always handy to have: armbianmonitor -u

Thank you for your advice, I'll take it with lots of appreciation! :)

I'm currently in the process of restoring my server, so I'll have to wait a few days, but I will try with a fresh installation of the most recent image from the download page. Unfortunately I only have one SPZX drive, so I can't test with several drives of this type - it would have been very useful to know whether the crash really depends on this drive type.

Just in case someone reading this should be tempted: I think it's a bad idea to use BLUE (desktop drives) for a server - RED are much more stable.