Thank you @aprayoga, that was helpful.
TL;DR
My problem was that Docker was starting BEFORE my mergerfs volume was mounted. One of my containers mounts a directory from that volume. Because Docker does not see that path on disk, it "helpfully" creates it as an empty directory. Now mergerfs cannot mount, because there are directories created by Docker in that location.
Assumptions:
You have a computer connected via a USB serial cable and you can see the Helios console.
Inline code, commands and important variables are wrapped in backticks (I am used to markdown).
I assume you know that `vim` and `nano` are text editors and that you know how to exit `vim`.
How I found out and investigated:
1. I followed the advice above to boot from an SD card (balenaEtcher + the latest stable Armbian Buster).
2. I ran `fsck` on the eMMC: no issues.
3. I mounted the eMMC, chrooted into it and changed the root password. Now I can log in past the `Give root password for maintenance (or press Control-D to continue):` prompt.
4. I still cannot start services and the system is dead: `journalctl` has no useful information and nothing stands out in `/var/log/messages`.
5. I found instructions to edit `/boot/armbianEnv.txt`. While still chrooted into the eMMC install, I raised the verbosity to `9` (make sure you only change `verbosity` and leave everything else in your armbianEnv.txt as it is):
`vim /boot/armbianEnv.txt`
```
verbosity=9
bootlogo=false
overlay_prefix=rockchip
rootdev=UUID=b31229b9-40ab-441c-95be-66666
rootfstype=ext4
console=serial
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u
```
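If you prefer not to open an editor, the single `verbosity=` line can be changed in place. Below is a minimal sketch of that. The helper name `set_verbosity` is hypothetical, and it assumes GNU `sed` (as found on Armbian) and the `key=value` layout shown above. It rewrites only the `verbosity=` line, or appends one if the line is missing, so the other settings stay untouched.

```shell
# Hypothetical helper: set (or add) the verbosity= line in an
# armbianEnv.txt-style key=value file, leaving every other line untouched.
# Assumes GNU sed (for in-place editing with -i).
set_verbosity() {
  local env_file=$1 level=$2
  if grep -q '^verbosity=' "$env_file"; then
    # Rewrite only the existing verbosity line.
    sed -i "s/^verbosity=.*/verbosity=$level/" "$env_file"
  else
    # No verbosity line yet: append one at the end of the file.
    printf 'verbosity=%s\n' "$level" >> "$env_file"
  fi
}

# Usage (inside the chroot, as root):
# set_verbosity /boot/armbianEnv.txt 9
```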
6. I reboot and notice this in the boot log (I can see it because I am still connected via the USB-C serial cable):
```
[FAILED] Failed to mount /srv/f95ca…b-439d-450e-b700-4444.
See 'systemctl status "srv-f95ca73b\\x2…0\\x2d4444.mount"' for details.
```
After this the system "hangs" with the message we saw before:
```
Starting kernel ...
Give root password for maintenance
(or press Control-D to continue):
```
7. Because I changed the password, I can get into recovery mode on the eMMC install.
I run `cat /etc/fstab` and notice that `f95ca…b-439d-450e-b700-4444` is my mergerfs volume.
I run `ls -alsht /srv/f95ca…b-439d-450e-b700-4444` and see some directories there. Both cannot be true: either the volume is mounted and I see my files, or it is NOT mounted and there are NO directories.
These directories match the volume mounts I specified for one of my Docker containers.
8. I run `systemctl stop docker` and `systemctl disable docker` so Docker does not make a mess again (for now). I run `du -hs /srv/f95ca…b-439d-450e-b700-4444` to make sure it holds only the empty directories created by Docker and not my actual data. The result shows only empty directories (I would expect gigabytes if my data were there). Only AFTER verifying there is no data to lose do I run `rm -rf /srv/f95ca…b-439d-450e-b700-4444`.
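The two checks before the `rm -rf` (is the path mounted, and is there anything other than empty directories in it) can be scripted. Below is a minimal sketch. The helper name `only_empty_dirs` is hypothetical, and the sketch makes three assumptions:

* GNU `find` is available (for `-quit`).
* The path has no spaces and no trailing slash.
* `/proc/mounts` is the source of truth for what is currently mounted.

```shell
# Hypothetical helper: succeed only if DIR is not currently a mount point and
# contains nothing but (possibly nested) directories.
# Assumes a path without spaces or a trailing slash, and GNU find (-quit).
only_empty_dirs() {
  local dir=$1
  # /proc/mounts lists every active mount; field 2 is the mount point.
  if awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' /proc/mounts; then
    echo "$dir is a mount point, refusing" >&2
    return 1
  fi
  # Any file, symlink, socket, etc. below DIR means there may be real data.
  if [ -n "$(find "$dir" -mindepth 1 ! -type d -print -quit)" ]; then
    echo "$dir contains non-directory entries, refusing" >&2
    return 1
  fi
  return 0
}

# Usage (as root, with Docker stopped; POOL is your mergerfs mount point):
# only_empty_dirs "$POOL" && rm -rf "$POOL"
```

Chaining the two with `&&` means the `rm -rf` only runs when both checks pass.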
9. Now I need mergerfs to mount the volume BEFORE Docker starts. I run `systemctl list-units --type=mount`, which shows ALL the mounts. For simplicity I am only including the drives we care about:
```
srv-dev\x2ddisk\x2dby\x2dlabel\x2dsda.mount loaded active mounted /srv/dev-disk-by-label-sda
srv-dev\x2ddisk\x2dby\x2dlabel\x2dsdb.mount loaded active mounted /srv/dev-disk-by-label-sdb
srv-dev\x2ddisk\x2dby\x2dlabel\x2dsdc.mount loaded active mounted /srv/dev-disk-by-label-sdc
srv-dev\x2ddisk\x2dby\x2dlabel\x2dsdd.mount loaded active mounted /srv/dev-disk-by-label-sdd
srv-f95ca73b\x2d439d\x2d450e\x2db700\x2d4444.mount loaded active mounted /srv/f95ca73b-439d-450e-b700-4444
```
These are the underlying disks and the mergerfs volume built on them; the last unit is the mount that failed in the boot log.
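You do not have to type the escaped unit names by hand. systemd ships `systemd-escape`, and running `systemd-escape --path --suffix=mount /srv/dev-disk-by-label-sda` prints the matching unit name. The escaping rules can also be sketched in shell:

* Strip the leading and trailing `/`.
* Turn each remaining `/` into `-`.
* Hex-escape every character outside `[A-Za-z0-9:_.]`, so `-` becomes `\x2d`.
* Also escape a leading `.`.

The function name is hypothetical. This is an approximation for plain ASCII, already-normalized paths only, not a replacement for the real tool.

```shell
# Hypothetical sketch of `systemd-escape --path --suffix=mount` for plain
# ASCII, already-normalized absolute paths (no "//", "." or ".." components).
path_to_mount_unit() {
  local path=$1 out='' first=1 rest ch
  # Strip leading and trailing slashes; "/" itself maps to "-.mount".
  while [ "${path#/}" != "$path" ]; do path=${path#/}; done
  while [ "${path%/}" != "$path" ]; do path=${path%/}; done
  if [ -z "$path" ]; then
    printf '%s\n' '-.mount'
    return
  fi
  rest=$path
  while [ -n "$rest" ]; do
    ch=${rest%"${rest#?}"}   # first remaining character
    rest=${rest#?}           # everything after it
    case $ch in
      /) out="$out-" ;;      # path separator becomes "-"
      [A-Za-z0-9:_.])
        if [ "$first" = 1 ] && [ "$ch" = . ]; then
          out="$out\\x2e"    # a leading "." is escaped
        else
          out="$out$ch"
        fi ;;
      # Everything else, including "-", becomes \xNN (e.g. "-" -> \x2d).
      *) out="$out$(printf '\\x%02x' "'$ch")" ;;
    esac
    first=0
  done
  printf '%s.mount\n' "$out"
}

# Usage:
# path_to_mount_unit /srv/dev-disk-by-label-sda
```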
10. Now I create a systemd override for Docker with `systemctl edit docker` and add this block:
```
[Unit]
After=srv-dev\x2ddisk\x2dby\x2dlabel\x2dsda.mount srv-dev\x2ddisk\x2dby\x2dlabel\x2dsdb.mount srv-dev\x2ddisk\x2dby\x2dlabel\x2dsdc.mount srv-dev\x2ddisk\x2dby\x2dlabel\x2dsdd.mount srv-f95ca73b\x2d439d\x2d450e\x2db700\x2d4444.mount
```
I save and exit nano.
To check that systemd sees my changes, I run `systemctl cat docker`.
The output shows the override at the end: the last three lines are the drop-in file, and its `After=` line makes Docker wait until the mounts are done.
```
# /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3
# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
[Install]
WantedBy=multi-user.target
# /etc/systemd/system/docker.service.d/mount-disks-before-docker.conf
[Unit]
After=srv-dev\x2ddisk\x2dby\x2dlabel\x2dsda.mount srv-dev\x2ddisk\x2dby\x2dlabel\x2dsdb.mount srv-dev\x2ddisk\x2dby\x2dlabel\x2dsdc.mount srv-dev\x2ddisk\x2dby\x2dlabel\x2dsdd.mount srv-f95ca73b\x2d439d\x2d450e\x2db700\x2d4444.mount
```
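A note on `After=`: it only orders Docker after those mounts when they are being started anyway. It does not make them a requirement. systemd also offers `RequiresMountsFor=`, which takes plain absolute paths (no escaping needed) and adds both `Requires=` and `After=` on the mount units needed to reach them.

Below is a sketch of writing that variant of the same drop-in non-interactively. This is an alternative to the `After=` line above, not what I used. The helper name is hypothetical, and writing to the same file name replaces the `After=` version.

```shell
# Hypothetical helper: write a docker.service drop-in using systemd's
# RequiresMountsFor= instead of a hand-escaped After= list.
write_mount_dropin() {
  local dropin_dir=$1
  shift
  mkdir -p "$dropin_dir"
  {
    printf '[Unit]\n'
    # With the default IFS, "$*" joins the remaining arguments with spaces.
    printf 'RequiresMountsFor=%s\n' "$*"
  } > "$dropin_dir/mount-disks-before-docker.conf"
}

# Usage (as root), then reload systemd so it picks up the file:
# write_mount_dropin /etc/systemd/system/docker.service.d \
#   /srv/dev-disk-by-label-sda /srv/dev-disk-by-label-sdb \
#   /srv/dev-disk-by-label-sdc /srv/dev-disk-by-label-sdd \
#   /srv/f95ca73b-439d-450e-b700-4444
# systemctl daemon-reload
```

The trade-off is that with `Requires=`, a failed mount keeps Docker from starting at all. For this problem that is arguably preferable to Docker starting and creating directories on an unmounted path.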
11. I start and re-enable Docker with `systemctl start docker` and `systemctl enable docker`, then reboot, and everything works. My filesystems now mount BEFORE Docker starts, so Docker no longer creates its volume directories on a filesystem that is not mounted yet.
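After a reboot, you can check the ordering systemd actually loaded (not just the file contents) with `systemctl show docker.service --property=After --value`. That command prints the unit's `After=` dependencies on one space-separated line. `systemd-analyze critical-chain docker.service` gives a more visual view.

Below is a small sketch that checks that line for the expected mount units. The helper name `check_after_deps` is hypothetical.

```shell
# Hypothetical helper: given docker.service's After= list (one space-separated
# line), print and fail on any expected mount unit that is missing from it.
check_after_deps() {
  local after_list=$1 unit missing=0
  shift
  for unit in "$@"; do
    # One dependency per line; grep -F -x compares whole lines literally,
    # so the backslashes in "\x2d" are not treated as regex syntax.
    if ! printf '%s\n' "$after_list" | tr ' ' '\n' | grep -Fqx -- "$unit"; then
      printf 'missing: %s\n' "$unit"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_after_deps "$(systemctl show docker.service --property=After --value)" \
#   'srv-dev\x2ddisk\x2dby\x2dlabel\x2dsda.mount' \
#   'srv-f95ca73b\x2d439d\x2d450e\x2db700\x2d4444.mount'
```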