
Trusty SATA boot hang (in mountall?)



Hi everyone,

 

I have a problem with my Cubietruck that I have now been able to debug partly, but I would really like input from those who know more about this setup.

On my Cubietruck (trusty, kernel 4.6.5-sunxi), I have a 120GB SSD (no-name brand) and added an ext4 partition and corresponding fstab line to mount it.

 

 

Booting and mounting this partition works well when I cold-start the system after it has been powered off for a while.

 

But when I do a warm start (sudo reboot), the system hangs on boot, quite reliably (it seems to occasionally work, though).

 

I changed my kernel command line and added 'init=/bin/sh', started a second bash shell before replacing the first one with upstart, and used 'initctl list' to figure out where upstart got stuck.

 

It looks like it got stuck waiting for a mountall which waited for something.
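A minimal sketch of how the 'initctl list' output can be narrowed down, assuming the usual "job goal/state" line format. The function name is made up for illustration; it only hides idle jobs so that whatever is still starting or running stands out.

```shell
# Read `initctl list` output on stdin and drop every job that is idle
# (stop/waiting). What remains is running or still in transition.
# active_upstart_jobs is a hypothetical helper name, not part of upstart.
active_upstart_jobs() {
    grep -v ' stop/waiting$'
}
```

From the second shell this would be run as `initctl list | active_upstart_jobs`.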

 

[sidenote: It would be great if upstart would log which process it is waiting on when that process takes more than a given amount of time, and mountall shouldn't hang (seemingly indefinitely) in this situation. I might file bugs with those packages as soon as I have understood the whole problem. Took me quite a while to figure this out.]

 

What I found was that my '/dev/sda*' devices/partitions are missing!

 

Additionally, my kernel log has this entry:

 

ata1: SATA link down (SStatus 0 SControl 300)

 

Compared to successful boots, the usual kernel messages reporting that my SATA drive was detected are missing as well.
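A quick way to check for this state from a script, as a sketch only: it reads kernel log text on stdin and prints the ports whose most recent link message says "link down". The function name is illustrative, and the pattern is based solely on the log lines quoted in this thread.

```shell
# Print every ataN port whose latest "SATA link up/down" message is "down".
# sata_ports_down is a hypothetical helper; feed it `dmesg` output.
sata_ports_down() {
    awk '
        /ata[0-9]+: SATA link (up|down)/ {
            # The port name (e.g. "ata1:") is the field right before "SATA".
            for (i = 1; i <= NF; i++) if ($i == "SATA") port = $(i - 1)
            sub(/:$/, "", port)
            state[port] = ($0 ~ /SATA link down/) ? "down" : "up"
        }
        END { for (p in state) if (state[p] == "down") print p }
    '
}
```

Usage: `dmesg | sata_ports_down`. Empty output means no port is currently reported as down.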

 

 

My system is quite customized by now, so I wasn't sure what was causing this hang (and I only noticed it after lots of customization). Removing the fstab mount entry makes it come up reliably now, though.

 

I found this and wonder whether it is related. All my packages should include the fixes, though?

 

Is this a problem with my SSD warm-starting? Might this be an initialization issue in the SATA driver? Is there a good way to work around this problem? Is this a known problem?

 

Thanks a lot in advance!

 

EDIT: This seems to be SSD specific. The built-in SSD delivered with the Cubietruck shows the above behavior, whereas an old Intel 'SSD 320 Series 40GB' comes up fine and without any problem, also after every warm start.

 

I would still like to know whether there is a way to work around this issue. ('echo "- - -" > .../scan' doesn't do anything; the link is still reported as down after that.)
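For anyone who wants to repeat the rescan attempt, here is a minimal sketch. It assumes the standard sysfs location /sys/class/scsi_host/host*/scan, which the post does not spell out, and SCSI_HOST_DIR is a hypothetical override so the loop can be tried on a scratch directory. As noted above, the rescan did not bring the link back on the affected SSD.

```shell
# Ask every SCSI host to rescan its bus; needs root on a real system.
# SCSI_HOST_DIR is an illustrative override, not a kernel or distro setting.
rescan_scsi_hosts() {
    dir=${SCSI_HOST_DIR:-/sys/class/scsi_host}
    for scan in "$dir"/host*/scan; do
        [ -e "$scan" ] || continue      # glob matched nothing
        echo "- - -" > "$scan"          # wildcard channel, target and LUN
        echo "rescanned ${scan%/scan}"
    done
}
```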

 

Any clues or ideas?

Link to comment
Share on other sites

To continue talking to myself here:

 

I found a feasible workaround, but I am still not 100% sure what is going on here.

 

In the meantime, I found this thread, where people say that the Cubietruck will power cycle the SATA disk when rebooting.

 

I suspected this might be related to my problem: my no-name SATA SSD might get power-cycled just quickly enough that its internal firmware crashes, but not for long enough to cleanly reset its controller during the next boot.

 

Problem with that explanation: I measured the dropout period during reboot, and it is about 500ms. When I unload and immediately reload the ahci_sunxi module, the 5V SSD power drops out for just ~80ms, but the SSD still comes up fine afterwards.

 

In any case, the following workaround seems to work reliably for me:

 

- Rebuild kernel to have ahci_sunxi as a loadable module

- Add logic to the upstart configuration to unload and reload the ahci_sunxi module, with a 1s sleep in-between. This will also power-cycle the SSD.
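The reload step of this workaround can be sketched as a small shell function. How it is hooked into upstart is not shown in the post, so only the reload itself is illustrated; MODPROBE and the function name are hypothetical, and the 1 s default mirrors the sleep mentioned above.

```shell
# Unload and reload the SATA driver with a pause in between.
# MODPROBE is an illustrative override (e.g. MODPROBE=echo for a dry run).
reload_ahci_sunxi() {
    modprobe_cmd=${MODPROBE:-modprobe}
    pause=${1:-1}                           # seconds between unload and load
    $modprobe_cmd -r ahci_sunxi || return 1 # unload; this also drops SSD power
    sleep "$pause"
    $modprobe_cmd ahci_sunxi                # load again; the link is re-probed
}
```

This only works when ahci_sunxi is built as a loadable module, and it would have to run before the fstab entry for the SSD is mounted.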

 

My problematic SSD identifies as (with hdparm -I /dev/sda):

 

/dev/sda:

ATA device, with non-removable media
        Model Number:       CJ325120JC                              
        Serial Number:      XXX  
        Firmware Revision:  1.094.12
        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0

 

 

I am still curious if anyone else has a similar problem.

Link to comment
Share on other sites

Hi Igor,

 

thanks for the hint!

 

I built the u-boot from your link, using tag v2016.07, as that seems to be the latest. Unfortunately, that doesn't change the problem.

 

For some reason it also seems to lack SATA support (though there are some flags set in the Cubietruck_config that seem to refer to AHCI specific things).

 

And cmd/sata.c seems to be included in the uboot binary only when CONFIG_CMD_SATA is set, and *none* of the configurations in configs/ seem to use that flag?

 

I tried to use the 'scsi' command, but that doesn't find any devices.

 

What is the reason that you go with mainline u-boot for cubietruck if this other one is available and has better support?

 

I also do not need SATA in u-boot; I want to boot from MMC, my SSD is going to be purely a data disk. I was figuring you suggested this because maybe the u-boot SATA initialization might help the kernel(?)

Link to comment
Share on other sites

I don't fiddle with u-boot much, so I can't give you an exact how-to off the top of my head.

We use the mainline u-boot source to narrow down the sources we have to deal with. That way all Allwinner boards are covered by one u-boot source, for both legacy and mainline kernels. It's also possible to boot some other boards directly from mainline u-boot with a minimal set of patches. Until recently we did not have many problems doing it this way.

Link to comment
Share on other sites

@Igor, thanks for the info!

 

@zador: Thanks for the insight. That's what I am hoping as well; I didn't have problems with an older Intel SSD, so I hope I can avoid this mess with another, better SSD. I still wonder whether there is a way to avoid the power cycling during reboot? Or is the A20 reset line always resetting all GPIOs as well?

Link to comment
Share on other sites

- Add logic to the upstart configuration to unload and reload the ahci_sunxi module, with a 1s sleep in-between. This will also power-cycle the SSD.

 

Power-cycling the connected disk on a Cubietruck should also be possible with

sunxi-pio -m PH12'<default><default><default><0>' # off
sunxi-pio -m PH12'<default><default><default><1>' # on

See last post here and please report back whether that works too.

Link to comment
Share on other sites

By the way, if you have a look at the specs, an SSD needs 5W in active state!

 

Pardon? This depends entirely on the SSD in question: some consume a lot of power and others don't. Some SSDs also let you trade performance against consumption, and some enter low-power states after a period of inactivity (check 'hdparm -B/-C'). Consumption while active (as opposed to idle/slumber consumption) also depends on capacity (higher capacity --> more consumption).

 

http://www.anandtech.com/show/8747/samsung-ssd-850-evo-review/10 

Link to comment
Share on other sites

Well, a (bad) SSD (may) need 5W in active state (what about init, journal replay or fsck?). When running on battery, the idle-state consumption is what matters. But one has to be cautious about how it is powered. I power mine directly (not through the board).

Link to comment
Share on other sites

Hey,

Power-cycling the connected disk on a Cubietruck should also be possible with

sunxi-pio -m PH12'<default><default><default><0>' # off
sunxi-pio -m PH12'<default><default><default><1>' # on

See last post here and please report back whether that works too.

Perfect! Thanks a lot. I was not aware of sunxi-pio.

 

After unloading ahci_sunxi.ko, I tried to do "echo 236 > /sys/class/gpio/export" but I only got:

 

-bash: echo: write error: Device or resource busy
 

So apparently something else in the kernel blocks it?

 

I guess sunxi-pio does direct memory mapped I/O, disregarding all kernel ownership protection of pins?
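For reference, the 236 used in the export attempt matches PH12 under the usual sunxi numbering, where the GPIO number is (port letter index, A = 0) × 32 + pin number, so H (7) × 32 + 12 = 236. A small sketch of that arithmetic, with a made-up function name:

```shell
# Convert a sunxi pin name such as PH12 to its sysfs GPIO number.
# sunxi_gpio_number is a hypothetical helper illustrating the arithmetic.
sunxi_gpio_number() {
    pin=$1
    port=$(printf '%s' "$pin" | cut -c2)     # port letter, e.g. H
    index=$(printf '%s' "$pin" | cut -c3-)   # pin within the port, e.g. 12
    letters=ABCDEFGHIJKLMNOPQRSTUVWXYZ
    before=${letters%%"$port"*}              # letters preceding the port
    echo $(( ${#before} * 32 + index ))
}
```

`sunxi_gpio_number PH12` prints 236, so the number itself was right; the "busy" error is about the pin being claimed, not about a wrong index.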

 

EDIT: I can actually kind of reproduce the boot up problem while the system is running with sunxi-pio. This appears to often get my SSD into a state where it doesn't answer anymore:

sunxi-pio -m PH12'<default><default><default><0>'; sleep .2; sunxi-pio -m PH12'<default><default><default><1>'

# dmesg|tail
[  727.291134] ata2: limiting SATA link speed to 1.5 Gbps
[  732.287274] ata2: hard resetting link
[  732.609558] ata2: SATA link down (SStatus 0 SControl 310)
[  732.609637] ata2.00: disabled
[  732.609751] ata2: EH complete
[  732.609854] ata2.00: detaching (SCSI 0:0:0:0)
[  732.614017] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  732.614731] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
[  732.614785] sd 0:0:0:0: [sda] Stopping disk
[  732.614995] sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00


Whereas this (longer sleep) seems to (mostly) succeed in getting the SSD to answer:

sunxi-pio -m PH12'<default><default><default><0>'; sleep 5; sunxi-pio -m PH12'<default><default><default><1>'

# dmesg|tail
[  863.051345] ata2.00: configured for UDMA/100
[  863.051430] ata2: EH complete
[  863.053219] scsi 0:0:0:0: Direct-Access     ATA      CJ325120JC       4.12 PQ: 0 ANSI: 5
[  863.130357] sd 0:0:0:0: Attached scsi generic sg0 type 0
[  863.134181] sd 0:0:0:0: [sda] 234441648 512-byte logical blocks: (120 GB/112 GiB)
[  863.135711] sd 0:0:0:0: [sda] Write Protect is off
[  863.135772] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[  863.136212] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  863.146318]  sda: sda1
[  863.155229] sd 0:0:0:0: [sda] Attached SCSI disk

So yes, it does look very much like short power drops will get my SSD stuck. The short drop during boot is very probably what got my SSD stuck.
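The two experiments above can be wrapped into one function with a configurable off-time, as a rough sketch. The pin string is written with four bracketed fields, which is an assumption about the intended sunxi-pio syntax. SUNXI_PIO, the function name, the 5 s default and the 10 s polling limit are all illustrative.

```shell
# Cut SATA power via PH12, wait, restore it, then poll for the disk node.
# Returns 0 once the device node exists, 1 if it never shows up.
sata_power_cycle() {
    pio=${SUNXI_PIO:-sunxi-pio}     # overridable for dry runs
    off_time=${1:-5}                # seconds with power removed
    dev=${2:-/dev/sda}
    $pio -m 'PH12<default><default><default><0>' || return 1   # power off
    sleep "$off_time"
    $pio -m 'PH12<default><default><default><1>' || return 1   # power on
    tries=0
    while [ "$tries" -lt 10 ]; do
        [ -e "$dev" ] && return 0
        sleep 1
        tries=$((tries + 1))
    done
    return 1
}
```

Calling `sata_power_cycle 0.2` would reproduce the short-drop case and `sata_power_cycle 5` the long one, which makes it easy to compare different SSDs.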

 

So my SSD probably lacks proper brown-out detection circuitry. Time to get a better one ...

 

Now I would be very curious for someone with a good brand SSD to try the sleep .2 variant and post the results :)

Link to comment
Share on other sites

After unloading ahci_sunxi.ko, I tried to do "echo 236 > /sys/class/gpio/export" but I only got:

 

-bash: echo: write error: Device or resource busy

 

Hmm... are GPIOs that are not defined as GPIO (but SATA power in this case) accessible through sysfs at all? I really don't know and will rely on sunxi-pio instead (that can also query pin states and some more stuff)

Link to comment
Share on other sites
