
SATA disks not spinning up after apt update


nettings


Running a home NAS using Armbian/meson bullseye and openmediavault.

After apt upgrade today, my SATA disks are no longer spinning up or showing up in "fdisk -l". The system is on an SD card and still boots fine.

 

This is what I'm seeing:

[    2.270633] scsi host0: ahci
[    2.272861] scsi host1: ahci
[    2.273074] ata1: SATA max UDMA/133 abar m512@0xfc700000 port 0xfc700100 irq 27
[    2.273090] ata2: SATA max UDMA/133 abar m512@0xfc700000 port 0xfc700180 irq 27
...
[    2.584971] ata1: SATA link down (SStatus 0 SControl 300)
[    2.896948] ata2: SATA link down (SStatus 0 SControl 300)
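For anyone comparing their own logs: the two values in the `SATA link down` lines are the port's SStatus and SControl registers, printed in hexadecimal. In the AHCI register layout, the lowest four bits of SStatus (the DET field) report whether a device was detected. Below is a minimal Python sketch that decodes that field; the function name `decode_link_down` and the wording of the descriptions are my own, for illustration only.

```python
import re

# Meaning of the DET field (bits 3:0 of SStatus) in the AHCI register layout.
DET_MEANING = {
    0x0: "no device detected, no PHY communication",
    0x1: "device detected, but no PHY communication",
    0x3: "device detected, PHY communication established",
    0x4: "PHY offline",
}

# Matches the kernel's "SATA link down" message; both values are hexadecimal.
LINK_DOWN = re.compile(
    r"(ata\d+): SATA link down \(SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)\)"
)


def decode_link_down(line):
    """Return (port, det_meaning) for a 'SATA link down' line, else None."""
    match = LINK_DOWN.search(line)
    if match is None:
        return None
    port, sstatus_hex, _scontrol_hex = match.groups()
    det = int(sstatus_hex, 16) & 0xF  # keep only the DET bits
    return port, DET_MEANING.get(det, "reserved/unknown DET value")


log = """[    2.584971] ata1: SATA link down (SStatus 0 SControl 300)
[    2.896948] ata2: SATA link down (SStatus 0 SControl 300)"""

results = [decode_link_down(entry) for entry in log.splitlines()]
for port, meaning in results:
    print(port, "->", meaning)
```

With SStatus 0, the controller sees no device at all on either port, which fits a drive that never powers up better than a link negotiation failure.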

Both disks work fine in another system.

I have tried to upgrade the firmware, no change. Moved to nightly and booted into linux-image-edge-meson64 (6.1.0), same issue.

Next, I tried booting into kernel 5.19.17-meson64, but the problem persists. I wonder what has happened here? Could it be a userspace problem (since I didn't downgrade userspace here, only the kernel)?

Any hints very much welcome!

 


19 minutes ago, poulsen84 said:

Hoping there is an easy fix


Workaround? Yes: in armbian-config, go to the alternative kernels and choose the previous one (5.10.something). Fix? There is nothing we can do here but wait, like you, for Hardkernel to fix it.


I can add a few observations:

* The issue is apparently not specific to SSDs. I lost access to both my drives, and they are spinning rust (Toshiba 8TB).

* Interestingly, if I do hot-swap in an SSD (only for debugging and against the recommendations of the odroid folks, who say don't hotswap), the SSD is recognized in either SATA port.

* The issue is also unrelated to Samsung - I tested two SSDs successfully, and both were Samsungs.

* This is not a power supply issue (as was hinted at elsewhere in odroid forums, and which I was suspecting myself given that spindles fail where more energy-efficient SSDs succeed). I'm using the original power supply, and cross-checked with a more powerful one with same voltage and polarity.

* The issue persists in 5.19.17-0025 or 6.0.13-0047 from current, or 6.1.0-0064 from edge (example armbianmonitor output with 6.1 is at https://paste.armbian.com/ewabuparak, but the relevant SATA snippets are always the same).

* Downgrading to 5.10.16 also didn't solve the issue for me (same as reported by GuestMan in the original thread). For the record, here is the corresponding armbianmonitor output (largely identical): https://paste.armbian.com/ofumoyivoz

 

I haven't dared to downgrade to legacy 4.x kernels, but would do so if advised. I'm however pretty sure that I was running a 5.1x.y kernel before the apt upgrade and it worked, but unfortunately I didn't note the last-known-good version number. I usually do update regularly, so my guess is the previous, known-good kernel would have been no older than mid-September 2022.

 

Question: If I switch kernels using the "Stable/Nightly" and "Other" settings in armbian-config, will this always pull in all relevant firmware/device tree/bootloader components? Or is there something that persists from the initial apt upgrade which broke things? I don't see how any userspace components could play a role here, but I may be missing something...

 

Side note: can someone please remove the "one post per day" restriction for vetted accounts? I understand how spam is an issue, but I hope I have proved myself as a constructive user, and this limitation is just massively counter-productive when trying to collectively debug an issue here. Keep in mind that I might also be able to help other users with my modest expertise, which this restriction also effectively prevents.

And before Igor's fuse gets lit again: I'm not demanding anybody's time, and I'm willing to contribute my own resources to pinpoint this issue. Also, I did make a donation to Armbian when I started using it, I just refuse to buy myself out of posting restrictions with an additional subscription. That said, rest assured that the work of the Armbian maintainers is very much appreciated. I would love to see Armbian thrive with a sustainable business model, but taking frustration out on bug reporters is not the way to go. My 0.02 euros...

Correction: turns out the restriction is automatically lifted after the second post, but this was kinda non-obvious...

Edited by nettings
post moved by moderator

I have tried downgrading to various previous kernels (5.10.x, which used to work) without success.

Testing a few drives, there is one consistent pattern: all 3 1/2 inch drives, whether SSD or spindle, are powering up. None of the 5 1/4 inch drives (all spindles) do.

That would make me suspect a power issue, as large platters draw much larger currents than notebook drives or SSDs, if not for the fact that this was clearly induced by an update. Is there some setting on the controller that would limit current, or a kind of negotiation phase with the drive electronics before the power lines are even switched on?

Next I'm going to play with the controller power management knobs...

The controller is an ASMedia Technology Inc. ASM1061 SATA IDE Controller (rev 02). Looking elsewhere on the internet, it seems it has a long history of quirky and unreliable behaviour... I wonder if it's best to give up on the HC4...


Maybe it is a power issue after all, but one related to the onboard voltage regulators. This issue came up on a different hardware platform, and it looks somewhat related:

The corresponding DTD looks completely different though.

 

I'm now comparing two HC4 with two large 5 1/4" drives each. Both are at 5.19.17-meson64.

The broken one runs Armbian bullseye and was upgraded two days ago. https://paste.armbian.com/fecepoxoda

The intact one runs Armbian buster and hasn't been upgraded in at least a month. http://paste.armbian.com/hayamolexi

 

The only difference I can see (besides the SATA issue) is that the broken one has issues with the RTC - IIUC that's because it's coupled to the OLED module, which only the buster one has.


Fresh installation from Armbian_22.11.1_Odroidhc4_bullseye_current_5.19.17.img, still no SATA. https://paste.armbian.com/awasopagay

Fresh installation from even older Armbian_22.08.7_Odroidhc4_bullseye_current_5.19.17.img, still no SATA. https://paste.armbian.com/qewucawuxu

This latter one is definitely older than the last-known-good configuration. I'm beginning to suspect that either some persistent firmware upgrade did indeed happen, or permanent damage has been done to the hardware. Still unable to get any large 5 1/4" platter to be spun up or even a link recognized.

 


No real surprise, but Armbian_22.05.4_Odroidhc4_bullseye_current_5.10.123.img does not work, either.

Tried both with original PSU (delivers 4A at 15V), and another one that does 5A. Even the first drive to be plugged in doesn't wake up, so I think I can rule out PSU issues.

 


Correction: when I said 5 1/4" drives above, I actually meant 3 1/2". And instead of 3 1/2", I meant notebook-sized two-and-a-half. So these are not some dinosaur drives... Guess I'm getting old (and I do have an 8" floppy lying around somewhere, but I can't find it :o])


I just restored petitboot, installed Hardkernel's Ubuntu 20.04.4 LTS running Linux odroid 4.9.277-82 #1 SMP PREEMPT Fri Feb 18 14:35:13 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux, and the disks are still dead:
 

[    5.481026] libata version 3.00 loaded.
[    5.483585] ahci 0000:01:00.0: version 3.0
[    5.483611] ahci 0000:01:00.0: enabling device (0000 -> 0003)
[    5.484107] ahci 0000:01:00.0: SSS flag set, parallel bus scan disabled
[    5.490856] ahci 0000:01:00.0: AHCI 0001.0200 32 slots 2 ports 6 Gbps 0x3 impl IDE mode
[    5.498893] ahci 0000:01:00.0: flags: 64bit ncq sntf stag led clo pmp pio slum part ccc sxs 
[    5.508783] ahci 0000:01:00.0: enabling bus mastering

[    5.512956] input: Generic USB Keyboard as /devices/platform/ff500000.dwc3/xhci-hcd.0.auto/usb1/1-2/1-2:1.0/0003:040B:2000.0001/input/input1
[    5.515231] meson-spicc ffd13000.spi: registered master spi0
[    5.517738] spi spi0.0: setup mode 0, 8 bits/w, 100000000 Hz max --> 0
[    5.517876] meson-spicc ffd13000.spi: registered child spi0.0
[    5.520185] scsi host0: ahci
[    5.527998] scsi host1: ahci
[    5.528292] ata1: SATA max UDMA/133 abar m512@0xfc700000 port 0xfc700100 irq 69
[    5.533998] ata2: SATA max UDMA/133 abar m512@0xfc700000 port 0xfc700180 irq 69
[    5.580805] hid-generic 0003:040B:2000.0001: input,hidraw0: USB HID v1.10 Keyboard [Generic USB Keyboard] on usb-xhci-hcd.0.auto-2/input0
[    5.598057] input: Generic USB Keyboard as /devices/platform/ff500000.dwc3/xhci-hcd.0.auto/usb1/1-2/1-2:1.1/0003:040B:2000.0002/input/input2
[    5.664938] hid-generic 0003:040B:2000.0002: input,hidraw1: USB HID v1.10 Mouse [Generic USB Keyboard] on usb-xhci-hcd.0.auto-2/input1
[    5.854694] ata1: SATA link down (SStatus 0 SControl 300)
[    6.166670] ata2: SATA link down (SStatus 0 SControl 300)
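One detail in this log is worth spelling out. As far as I understand the kernel's ahci info line, the `0x3 impl` value is the bitmask of ports the controller implements. The following Python sketch decodes it; the helper name `implemented_ports` is hypothetical and the regular expression only covers the line format shown above.

```python
import re

# Matches the ahci info line: "<n> slots <n> ports <speed> Gbps 0x<mask> impl".
AHCI_INFO = re.compile(
    r"(\d+) slots (\d+) ports (\S+) Gbps 0x([0-9A-Fa-f]+) impl"
)


def implemented_ports(line):
    """Return the list of port indices set in the 'impl' bitmask, else None."""
    match = AHCI_INFO.search(line)
    if match is None:
        return None
    mask = int(match.group(4), 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


line = ("[    5.490856] ahci 0000:01:00.0: AHCI 0001.0200 32 slots 2 ports "
        "6 Gbps 0x3 impl IDE mode")
ports = implemented_ports(line)
print(ports)  # [0, 1]
```

Both ports are reported as implemented, so the controller itself is found and initialized. The failure only shows up later, at the link level, when neither port detects a drive.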

 

I'm beginning to believe there is permanent damage to the board when it comes to powering 3.5" spindles. Inserting a 2.5" notebook spindle, it powers up immediately.

 

But I've seen at least two other people who've observed this issue in connection with a distribution update. Hmmm. I have two more HC4 here, so it would be easy (if costly) to see what happens, but they are running in production (as was the dead one :o( )


I took down a second HC4, inserted the SD card from the system that first showed the problems. Now one SATA drive shows up, which fortunately lets me run manual backups of my critical systems. BUT:

When I booted back into the original SD card of the second HC4, THE SECOND SATA PORT IS LOST, apparently permanently.

 

So it looks like once you've had that problem, the hardware is altered (damaged?) permanently. EXPLETIVE!

 

 


Ok, it looks like dead hardware. If you are seeing something similar, check out this post where a user has a picture of a burnt 12v regulator on the SATA: https://forum.odroid.com/viewtopic.php?t=44517

Next, check the schematics kindly provided by Hardkernel: https://dn.odroid.com/S905X3/ODROID-HC4/odroid-hc4_rev1.0_20200807.pdf

You want to look for U21 and U24 in the index. As you can see, they lead to test pads TP10 and TP11, which are accessible on the underside of the board. You can measure them against ground.

As expected, on my board that doesn't run 3.5" drives any more in either port, both 12v lines are dead. No visible damage to the regulators though.

 

I'm guessing the EN (pin 4) line is something like an "on" switch.  So for a brief moment hope rises, maybe it's a software issue after all. But no: the EN line is shared with a number of other regulators, and they all provide output voltage.

Since [TheBug] on #armbian recommended it, I also measured the output pins directly - one is at input voltage, one at something around 3.3v (that would be EN), and all others are dead.
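The measurement step can be summarized as a small decision helper. This is only a sketch: the function name `triage_12v_rail`, the 10% tolerance window, the 0.5 V "dead" threshold and the wording of the verdicts are assumptions for illustration, not values taken from the Hardkernel schematics. It builds on the observation that 2.5" drives still power up (they generally run from the 5 V rail alone) while 3.5" drives, which also need 12 V, do not.

```python
def triage_12v_rail(measured_volts, nominal=12.0, tolerance=0.10):
    """Classify a voltage measured at a 12 V test pad (e.g. TP10 or TP11).

    The tolerance window and the 0.5 V threshold are assumed values,
    chosen for illustration only.
    """
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    if measured_volts < 0.5:
        return "dead: no regulator output, 3.5\" drives cannot spin up"
    if low <= measured_volts <= high:
        return "ok: 12 V rail present, look elsewhere (cable, drive, software)"
    return "out of range: rail present but outside the assumed window"


# Illustrative readings for a board where both 12 V lines are dead.
readings = {"TP10": 0.0, "TP11": 0.0}
for pad, volts in readings.items():
    print(pad, "->", triage_12v_rail(volts))
```

If both pads read close to 0 V while the board otherwise boots, reinstalling or downgrading the system is unlikely to bring 3.5" drives back.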

 

Case closed (for now).

 


Quote

But I've seen at least two other people who've observed this issue in connection with a distribution update.

 

I'm experiencing that too. Just rebooted since the motd bugged me to reboot and /dev/sdb is gone. -.-

Edited by Autic

I've got the same issue, updated to latest Bullseye Armbian on my HC4 and now 3.5 inch disks don't spin up. I've tried an alternative 15V PSU, older versions of Armbian, and even put petitboot back on and installed the version of Debian recommended by Hardkernel, but get nothing from my disks. (dmesg shows "ata1: SATA link down" and "ata2: SATA link down") My disks are a pair of 4TB WD green hdds, which work fine when installed in a PC, so it looks like I have the same issue. There's no visible damage to the PCB, but I haven't checked whether the 12v line is dead yet....


Obviously, the issue in this thread is something I definitely don't want to run into. Is it known which update causes the problems? Is there a specific kernel version that one should avoid? Or is it all kernels from some version onward that could do harm? (I'm on 6.1.11-meson64 and so far, the hard drives have kept spinning...)


No problems here and I reboot pretty frequently, now running 6.1.11. Using a 2TB 3.5 inch HDD and a 256mb SSD. Could it be unrelated to system upgrades and just be dodgy hardware that is likely to break at boot when there is a power surge? Anyway, good to know about the power regulator.

 

Edit: I have now found the other threads that make it seem like a kernel issue. Anyway, I'm using pretty old drives on my HC4, and it is also a quite old HC4 that I bought just after it was announced.

Edited by Nitrax

Dear Armbian Community!

I've run into the same issue and this thread put me on the right track. I've managed to resolve it meanwhile and I want to share my case with you here.

 

I've used my Odroid HC4 with one Toshiba 14 TB harddrive for a while without any issues (> 1 year).

Recently I purchased a Toshiba 16 TB HDD and replaced the 14 TB drive with it. It ran for a couple of days without an issue. Then, after I updated and rebooted my Nextcloud Pi installation, the drive was no longer recognized. I cannot remember the exact order of my subsequent steps, but since I own a second HC4, at one point I swapped the slots, and at another the whole HC4. After that I ended up with one dead drive slot on one HC4 and two dead slots on the other.

 

I downloaded the schematics and measured the output test points of the switching regulator circuits. At each defective drive slot, the regulator output read 0 V. I therefore concluded that the DC-DC converter ICs were defective in my case as well. (My current guess is that the larger Toshiba drives, namely the 16 and 18 TB ones, draw a higher inrush current at spin-up or system reboot. Maybe that is out of SATA spec, or maybe the HC4's regulators are just slightly under-dimensioned.)

 

I looked for a fully compatible converter IC with better specs, and fortunately I found one. It has the same maximum current of 2 A, but a stated peak current rating of 2.5 A. The maximum input voltage is higher as well: 24 V vs. 18 V.

This tells the professional or the ambitious hobbyist that a stronger switching FET is integrated in this IC.

 

I ordered them here https://www.mouser.com/ProductDetail/DIOO-Microcircuits/DIO6912ST6?qs=%2BEew9%2B0nqrDv2h%2BRFnw9pw%3D%3D and replaced both in each of my HC4s. It's been working for a few days now with the same 16 TB Toshiba hard drive. Hopefully for a long time.

 

(2 photos attached - 1st shows the desoldered converter ICs U24,U21, 2nd the new soldered ICs - you have to be very careful not to accidentally unsolder the tiny passive components very close by)

Desoldered Converters.jpg

New DC-DC Converters soldered in.jpg

Edited by Edgar
Minor changes in the wording

Okay, new discovery: this seems to be a drive-specific issue...

Experiencing the no-init-on-boot issue on a Samsung SSD 860 PRO 256GB.

 

I have another device working fine on boot, no issue: Samsung SSD 860 EVO 500GB.

I even tried moving both drives between different slots. The behavior was consistent.


On 11/18/2023 at 9:55 PM, lanefu said:

Okay, new discovery: this seems to be a drive-specific issue...

Experiencing the no-init-on-boot issue on a Samsung SSD 860 PRO 256GB.

 

User here had similar issues with EVO 860

 

Weirdest coincidental solution ever. Guessing the early console somehow gave the drive more time to init?

 

https://github.com/armbian/build/pull/6031


Thank you very much for your feedback, @pcaetano! Good to know that it worked for you as well! 🙂

 

P.S.: Mine are still working with my Toshiba 16 TB hard disk (which originally caused the failure). So it's a lasting fix with these ICs, apparently.

Edited by Edgar