onno Posted July 30, 2016
Hi everyone,
I have a problem with my Cubietruck that I was able to partly debug, but I would really like input from those who know more about this setup.
On my Cubietruck (trusty, kernel 4.6.5-sunxi), I have a 120GB SSD (no-name brand). I added an ext4 partition and a corresponding fstab line to mount it. Booting and mounting this partition works well when I cold start the system after it has been powered off for a while. But when I do a warm start (sudo reboot), the system hangs on boot quite reliably (it occasionally works, though).
I changed my kernel command line to add 'init=/bin/sh', started a second bash terminal before replacing the shell with upstart, and tried to figure out where upstart got hung up with 'initctl list'. It looks like it got stuck waiting for mountall, which in turn waited for something. [Sidenote: It would be great if upstart logged which process it is waiting on when that process takes more than a given amount of time, and mountall shouldn't hang (seemingly indefinitely) in this situation. I might file bugs against those packages as soon as I have understood the whole problem. It took me quite a while to figure this out.]
What I found was that my '/dev/sda*' devices/partitions are missing! Additionally, my kernel log has this entry:
ata1: SATA link down (SStatus 0 SControl 300)
The usual messages that appear on a successful boot when the kernel detects my SATA drive are missing as well.
My system is quite customized by now, so I wasn't quite sure what was causing this hang (and I only noticed it after lots of customization). Removing the fstab mount entry makes it come up reliably now, though.
I found this and wonder whether it is related. All my packages should include the fixes, though?
Is this a problem with my SSD warm-starting? Might this be an initialization issue in the SATA driver? Is there a good way to work around this problem? Is this a known problem?
Thanks a lot in advance!
EDIT: This seems to be SSD specific. The built-in SSD delivered with the Cubietruck shows the above behavior, whereas an old Intel 'SSD 320 Series 40GB' comes up fine without any problem, also after every warm start. I would still like to know whether there is a way to work around this issue ('echo "- - -" > .../scan' doesn't do anything; the link is still reported as down after that). Any clues or ideas?
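A minimal sketch of how the symptom above could be checked from a script, assuming the kernel log uses the same "ataN: SATA link up/down" wording as the entry quoted in the post. The helper name sata_link_state is made up for illustration; it only reads log text on stdin and does not touch the hardware.

```shell
#!/bin/sh
# Sketch only: sata_link_state is a hypothetical helper, not part of any package.
# It reads kernel log text on stdin and prints "up", "down" or "unknown",
# based on the last "ataN: SATA link ..." message it finds.
sata_link_state() {
    # Later messages supersede earlier ones, so keep only the last match.
    last=$(grep -E 'ata[0-9]+: SATA link (up|down)' | tail -n 1)
    case "$last" in
        *"SATA link up"*)   echo up ;;
        *"SATA link down"*) echo down ;;
        *)                  echo unknown ;;
    esac
}

# Usage on a live system (commented out so the sketch has no side effects):
# dmesg | sata_link_state
```

A boot script could use the result to decide whether mounting the data partition is worth attempting, instead of letting mountall wait for a device that never appears.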
onno Posted July 30, 2016
To continue talking to myself here: I found a feasible workaround, but I am still not 100% sure what is going on.
In the meantime, I found this thread, where people say that the Cubietruck will power cycle the SATA disk when rebooting. I suspected this might be related to my problem: my no-name SATA SSD might get power cycled so briefly that its internal firmware crashes, without the outage being long enough to cleanly reset its controller for the next boot.
Problem with that explanation: I measured the drop-out period during reboot, and it is about 500ms. When I unload and immediately reload the ahci_sunxi module, I get a 5V SSD power drop-out time of just ~80ms, but the SSD still comes up fine afterwards.
In any case, the following workaround seems to work reliably for me:
- Rebuild the kernel to have ahci_sunxi as a loadable module
- Add logic to the upstart configuration to unload and reload the ahci_sunxi module, with a 1s sleep in-between. This will also power-cycle the SSD.
My problematic SSD identifies as (with hdparm -I /dev/sda):
/dev/sda:
ATA device, with non-removable media
Model Number: CJ325120JC
Serial Number: XXX
Firmware Revision: 1.094.12
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
I am still curious whether anyone else has a similar problem.
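The unload/sleep/reload step of that workaround could look roughly like the sketch below. It assumes ahci_sunxi was built as a loadable module, as described in the post. The function name and the MODPROBE and SATA_RELOAD_DELAY variables are hypothetical knobs added here so the sequence can be dry-run; the actual upstart job is not shown in the thread and is not reproduced here.

```shell
#!/bin/sh
# Sketch only: reload_sata_module is a hypothetical helper.
# MODPROBE and SATA_RELOAD_DELAY are made-up overrides for dry runs/testing.
reload_sata_module() {
    modprobe_cmd=${MODPROBE:-modprobe}
    delay=${SATA_RELOAD_DELAY:-1}   # the post uses a 1s sleep in-between

    # Unload the driver; per the post, this also drops power to the SSD.
    $modprobe_cmd -r ahci_sunxi || return 1
    sleep "$delay"
    # Load it again so the kernel re-probes the SATA link.
    $modprobe_cmd ahci_sunxi
}

# Dry run that only prints the module commands:
# MODPROBE=echo SATA_RELOAD_DELAY=0 reload_sata_module
```

This has to run as root and before anything tries to mount the SSD, which is presumably why the post hooks it into the upstart configuration.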
Igor Posted July 31, 2016
I don't know if it's related, but there were some regressions in mainline u-boot for Cubieboard 1 and 2. SATA can be enabled in u-boot, and by default it should be. It's also possible to boot from SATA within u-boot. Try building u-boot from this source: https://github.com/linux-sunxi/u-boot-sunxi
onno Posted July 31, 2016
Hi Igor, thanks for the hint! I built u-boot from your link, using tag v2016.07, as that seems to be the latest. Unfortunately, that doesn't change the problem. For some reason it also seems to lack SATA support (though there are some flags set in the Cubietruck_config that seem to refer to AHCI-specific things). cmd/sata.c seems to be included in the u-boot binary only when CONFIG_CMD_SATA is set, and *none* of the configurations in configs/ seem to use that flag. I tried the 'scsi' command, but it doesn't find any devices.
What is the reason that you go with mainline u-boot for the Cubietruck if this other one is available and has better support?
I also do not need SATA in u-boot; I want to boot from MMC, and my SSD is going to be purely a data disk. I figured you suggested this because the u-boot SATA initialization might help the kernel(?)
Igor Posted July 31, 2016
I don't fiddle with u-boot much, so I can't give you an exact how-to off the top of my head. We use the mainline u-boot source to narrow down the number of sources we rely on. That way, Allwinner boards are covered by one u-boot source, for both legacy and mainline kernels. It's also possible to boot some other boards directly from mainline u-boot with a minimal set of patches. Until recently we had few problems doing it this way.
zador.blood.stained Posted July 31, 2016
If the issue cannot be reproduced at every reboot, then it is most likely caused by the SSD and not by u-boot or the kernel, and the Cubieboard 1 and 2 issues are not related to this.
onno Posted July 31, 2016
@Igor, thanks for the info!
@zador: Thanks for the insight. That's what I am hoping as well; I didn't have problems with an older Intel SSD, so I hope I can avoid this mess with another, better SSD.
I still wonder whether there is a way to avoid the power cycling during reboot? Or is the A20 reset line always resetting all GPIOs as well?
zador.blood.stained Posted July 31, 2016
onno wrote: "I still wonder whether there is a way to avoid the power cycling during reboot? Or is the A20 reset line always resetting all GPIOs as well?"
I don't think it is possible without hardware modifications. Resetting the SoC should reset all GPIO ports to their default values.
tkaiser Posted July 31, 2016
onno wrote: "- Add logic to the upstart configuration to unload and reload the ahci_sunxi module, with a 1s sleep in-between. This will also power-cycle the SSD."
Power-cycling the connected disk on a Cubietruck should also be possible with
sunxi-pio -m PH12'<default><default<default><0>' # off
sunxi-pio -m PH12'<default><default<default><1>' # on
See last post here and please report back whether that works too.
arox Posted July 31, 2016
By the way, if you have a look at the specs, an SSD needs 5W in active state!
tkaiser Posted July 31, 2016
arox wrote: "By the way, if you have a look at the specs, an SSD need 5W on active state !"
Pardon? This absolutely depends on the SSD in question: some consume energy like hell and others don't. Also, some SSDs allow adjusting performance vs. consumption, and some enter low-power states after a period of inactivity (check 'hdparm -B/-C'). Consumption while active (as opposed to idle/slumber consumption) also depends on capacity (higher capacity --> more consumption). http://www.anandtech.com/show/8747/samsung-ssd-850-evo-review/10
arox Posted July 31, 2016
Well, a (bad) SSD (may) need 5W in active state (what about init, journal replay or fsck?). When running on battery, the idle consumption is what matters. But one has to be cautious about how the SSD is powered. I power mine directly (not through the board).
onno Posted August 1, 2016
Hey,
tkaiser wrote: "Power-cycling the connected disk on a Cubietruck should also be possible with sunxi-pio -m PH12'<default><default<default><0>' # off sunxi-pio -m PH12'<default><default<default><1>' # on See last post here and please report back whether that works too."
Perfect! Thanks a lot. I was not aware of sunxi-pio. After unloading ahci_sunxi.ko, I tried to do "echo 236 > /sys/class/gpio/export" but I only got:
-bash: echo: write error: Device or resource busy
So apparently something else in the kernel blocks it? I guess sunxi-pio does direct memory-mapped I/O, disregarding all kernel ownership protection of pins?
EDIT: I can actually kind of reproduce the boot-up problem while the system is running with sunxi-pio. This appears to often get my SSD into a state where it doesn't answer anymore:
sunxi-pio -m PH12'<default><default<default><0>'; sleep .2; sunxi-pio -m PH12'<default><default<default><1>'
# dmesg|tail
[ 727.291134] ata2: limiting SATA link speed to 1.5 Gbps
[ 732.287274] ata2: hard resetting link
[ 732.609558] ata2: SATA link down (SStatus 0 SControl 310)
[ 732.609637] ata2.00: disabled
[ 732.609751] ata2: EH complete
[ 732.609854] ata2.00: detaching (SCSI 0:0:0:0)
[ 732.614017] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 732.614731] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
[ 732.614785] sd 0:0:0:0: [sda] Stopping disk
[ 732.614995] sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
Whereas this (longer sleep) seems to (mostly) succeed in getting the SSD to answer:
sunxi-pio -m PH12'<default><default<default><0>'; sleep 5; sunxi-pio -m PH12'<default><default<default><1>'
# dmesg|tail
[ 863.051345] ata2.00: configured for UDMA/100
[ 863.051430] ata2: EH complete
[ 863.053219] scsi 0:0:0:0: Direct-Access ATA CJ325120JC 4.12 PQ: 0 ANSI: 5
[ 863.130357] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 863.134181] sd 0:0:0:0: [sda] 234441648 512-byte logical blocks: (120 GB/112 GiB)
[ 863.135711] sd 0:0:0:0: [sda] Write Protect is off
[ 863.135772] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 863.136212] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 863.146318] sda: sda1
[ 863.155229] sd 0:0:0:0: [sda] Attached SCSI disk
So yes, it does look very much like short power drops will get my SSD stuck. The short drop during boot is very probably what got my SSD stuck. So my SSD probably lacks proper brown-out detection circuitry. Time to get a better one ...
Now I would be very curious for someone with a good brand SSD to try the sleep .2 variant and post the results.
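For anyone who wants to repeat the short-vs-long power drop experiment, here is a rough sketch that wraps the two steps. The pin strings are copied verbatim from the commands quoted in this thread; the function names and the PIO override variable are hypothetical and only exist so the sequence can be dry-run. The wait helper checks for mere existence of the path, which is an assumption made for simplicity; a stricter version would test for a block device with -b.

```shell
#!/bin/sh
# Sketch only: helper names and the PIO override are made up for illustration.

# Switch the SATA power pin off, wait, then switch it back on.
# $1 = seconds to keep power off (the thread compares .2 and 5).
power_cycle_disk() {
    off_seconds=${1:-5}
    pio=${PIO:-sunxi-pio}
    # Pin strings copied verbatim from the thread.
    $pio -m "PH12<default><default<default><0>" || return 1   # off
    sleep "$off_seconds"
    $pio -m "PH12<default><default<default><1>"               # on
}

# Poll once per second until a path (e.g. /dev/sda) exists.
# $1 = path, $2 = maximum number of attempts (default 10).
wait_for_path() {
    path=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        [ -e "$path" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example (commented out; needs root and a Cubietruck):
# power_cycle_disk 5 && wait_for_path /dev/sda 15 && echo "disk is back"
```

Running it with .2 and then with 5 as the off time and comparing the results would reproduce the two cases shown in the logs above.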
tkaiser Posted August 1, 2016
onno wrote: "After unloading ahci_sunxi.ko, I tried to do "echo 236 > /sys/class/gpio/export" but I only got: -bash: echo: write error: Device or resource busy"
Hmm... are pins that are not defined as GPIO (but as SATA power in this case) accessible through sysfs at all? I really don't know and will rely on sunxi-pio instead (which can also query pin states and do some more stuff).
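As a side note on where the 236 in that sysfs attempt comes from: it matches the usual sunxi numbering, where the GPIO number is the bank index (A=0, B=1, ...) times 32 plus the pin number, so PH12 gives 7 * 32 + 12 = 236. A small sketch of that arithmetic, with a made-up function name and assuming pin names of the form P<bank letter><number>:

```shell
#!/bin/sh
# Sketch only: sunxi_gpio_number is a hypothetical helper.
# Converts a pin name such as PH12 into (bank index * 32 + pin number).
sunxi_gpio_number() {
    pin=$1
    bank=$(printf '%s' "$pin" | cut -c2)    # bank letter, e.g. H
    num=$(printf '%s' "$pin" | cut -c3-)    # pin number, e.g. 12
    # printf with a leading quote yields the character code; 65 is 'A'.
    idx=$(( $(printf '%d' "'$bank") - 65 ))
    echo $(( idx * 32 + num ))
}

sunxi_gpio_number PH12   # prints 236
```

This only computes the number; whether the export then succeeds depends on whether the kernel already has the pin claimed, as the "Device or resource busy" error above suggests.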