
RockPro64 - Boot fails if fstab has volumes on PCIe SSD


David Pottage


Hello,

I have a RockPro64 running Armbian buster version 20.02.1 (Linux kernel 5.4.26-rockchip64)

I have connected an M.2 SSD via an adapter in the PCIe slot, and have set up an LVM PV on it, with a number of ext4 and btrfs logical volumes.

All this worked fine when I set it up, but when I went to reboot the system, it never came back. On the serial console, I got a series of messages from the firmware and U-Boot, then "Starting kernel ..." and nothing more. (I waited about half an hour.)

 

I have increased the boot verbosity by raising the value in /boot/armbianEnv.txt from 1 to 7. If the boot is successful, I see lots of output after a delay of about a minute, but if it gets stuck, I see no output at all. I have also tried enabling earlyprintk using the guide at kernel.org, and again saw no output unless the boot succeeded. (That guide says you can only refer to /dev/ttyS0 or ttyS1 by name; for any other serial port you need the hardware address from /proc/tty/driver/serial. On the RockPro64, the default serial port is /dev/ttyS2, so I tried that by name anyway, and also "0xFF1A0000" from /proc/tty/driver/serial, and neither worked.)
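For reference, the relevant part of my /boot/armbianEnv.txt now looks roughly like this (a sketch; only the verbosity line is the change I describe, and the other entries are illustrative of the Armbian defaults on my board):

```
# /boot/armbianEnv.txt (sketch; exact contents vary by install)
verbosity=7
console=serial
```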

 

Suspecting the SSD, I powered off, disconnected it, and rebooted. After a delay of about 2 minutes I got "Give root password, or Ctrl-D to continue" (not sure of the exact wording), so I logged in and edited /etc/fstab to comment out, or add "noauto" to, the lines for all filesystems on the SSD. I was then able to reboot successfully.

Those volumes on the SSD will mount fine from the command line after boot, but it looks like they don't mount successfully during boot, and are preventing boot.

I suspect there is some sort of dependency issue in systemd. Perhaps the PCIe bus or the LVM2 device mapper is not ready when the system attempts to mount the filesystems, but why would that block the boot rather than just adding a delay?
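One mitigation worth noting (an untested sketch; the volume group name is from my logs below, the mount point is hypothetical) is to mark the SSD filesystems with "nofail" and a shorter systemd device timeout, so that a device that never appears degrades the boot instead of hanging it:

```
# /etc/fstab (sketch): nofail lets boot continue if the device never shows up;
# x-systemd.device-timeout shortens the default 90 s wait per device.
/dev/jupiter-vg1/data  /data  ext4  defaults,nofail,x-systemd.device-timeout=10s  0  2
/dev/jupiter-vg1/swap  none   swap  sw,nofail,x-systemd.device-timeout=10s       0  0
```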

 

On one occasion, I attempted to boot with just a swap volume on the SSD named in /etc/fstab, and saw this in syslog:

 

Mar 21 19:52:57 jupiter systemd[1]: dev-jupiter\x2dvg1-swap.device: Job dev-jupiter\x2dvg1-swap.device/start timed out.
Mar 21 19:52:57 jupiter systemd[1]: Timed out waiting for device /dev/jupiter-vg1/swap.
Mar 21 19:52:57 jupiter systemd[1]: Dependency failed for /dev/jupiter-vg1/swap.
Mar 21 19:52:57 jupiter systemd[1]: dev-jupiter\x2dvg1-swap.swap: Job dev-jupiter\x2dvg1-swap.swap/start failed with result 'dependency'.
Mar 21 19:52:57 jupiter systemd[1]: dev-jupiter\x2dvg1-swap.device: Job dev-jupiter\x2dvg1-swap.device/start failed with result 'timeout'.


Does anyone have an idea how I can fix this, or how to investigate further, given that I don't see any output on the serial console or the HDMI monitor when the boot fails?

Thanks, David

 


Hi, I am trying to investigate a booting issue with PCIe disk mounts, and it is difficult because I am not seeing any error messages during boot.

 

I have already increased the boot verbosity by raising the value in /boot/armbianEnv.txt from 1 to 7, but that is not helping much. It looks like the issue happens much earlier in the boot process. When I described my problem on the debian-arm mailing list, one reply suggested that I enable earlyprintk, and gave me a link to the Linux kernel command line docs.

 

I tried following those instructions, but I can't get any more output during boot on the RockPro64 serial port /dev/ttyS2. How do I get that working?
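For completeness, the variants I tried looked roughly like this, appended to the kernel command line via the extraargs entry in /boot/armbianEnv.txt (a sketch; the 1500000 baud rate is the RockPro64 serial console default, and the MMIO address is the one reported in /proc/tty/driver/serial):

```
# /boot/armbianEnv.txt (sketch) - by serial port name:
extraargs=earlyprintk=serial,ttyS2,1500000

# ...or by MMIO address from /proc/tty/driver/serial:
extraargs=earlyprintk=serial,0xFF1A0000,1500000
```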

 

Thanks, David.


To update:

 

The issue appears to be with warm reboots. They don't work if /etc/fstab mentions anything on the PCIe SSD. On the serial console it just gets stuck at "Starting kernel ..." and nothing more.

 

A cold boot from poweroff works fine, and will successfully mount the filesystems on the PCIe SSD. (I think I was confused before because, every time the boot got stuck, I pulled the power before disconnecting the PCIe device, so the subsequent successful boot was a cold one.)

 

I have tried to bypass the issue by configuring my system to reboot via kexec [1], but that does not work for some reason: the system does not kexec, and instead goes through the steps of a full system reboot, which fails.
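The kexec-based warm reboot I attempted looks roughly like this (a sketch; the kernel and initrd paths are illustrative placeholders, so check your own /boot, and note that kexec wants a raw initrd image, not a U-Boot-wrapped one):

```shell
#!/bin/sh
# Sketch of a kexec warm reboot. KERNEL and INITRD are illustrative
# placeholders; adjust them to the actual files in your /boot.
KERNEL=/boot/Image
INITRD=/boot/initrd.img
# --reuse-cmdline boots the new kernel with the current kernel's command line.
CMD="kexec -l $KERNEL --initrd=$INITRD --reuse-cmdline"
echo "$CMD"   # dry run: print the load command instead of running it
# To actually switch kernels (needs root and the kexec-tools package):
#   $CMD && systemctl kexec
```

The point of going through kexec is that the new kernel starts without a firmware reset, which would have sidestepped whatever the warm-reboot path leaves uninitialized on the PCIe bus.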

 

At this point I can only assume that there is something about the PCIe bus or the device on it that is not correctly initialized during a warm boot, but without working early printk or another way to get some logging from the failed boot, I have no more avenues of investigation.

 

On the other hand, I have a fairly simple workaround. It would be nice to be able to reboot my system remotely, but I can deal with that.

 

[1] https://wiki.debian.org/BootProcessSpeedup#Using_kexec_for_warm_reboots

 


I have done some more investigation, and discovered this:

https://blog.printk.io/2015/10/linux-earlyprintkearlycon-support-on-arm/

 

It appears that for ARM there is a newer mechanism for early logging that makes use of a stdout channel set up by the bootloader. The kernel command-line parameter to turn it on is "earlycon" (without any additional parameters).
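Concretely, the change is just one extra kernel parameter, which on Armbian can be passed through the extraargs entry in /boot/armbianEnv.txt (a sketch of my file; the verbosity line is the one I had already raised):

```
# /boot/armbianEnv.txt (sketch)
verbosity=7
extraargs=earlycon
```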

 

I tried it on my RockPro64 board, and I got additional output, though sadly not enough to resolve the PCIe SSD mounting problem.

 

I think it would be useful to add earlycon as an option in the /boot/boot.cmd script, so that users can turn it on when required to diagnose issues that they find. I will create a pull request to illustrate my proposed change.
