Dennboy
Posted September 23, 2020

Armbianmonitor: http://ix.io/2yuZ

Dear maintainers,

I have my sensors configured to reboot every night via a user cronjob (0 0 * * * /sbin/reboot); 14 sensors do this without a problem. I fixed the nanopi neo+2 reboot from NAND some months ago (by re-using the friendlyarm first-stage u-boot).

I just stumbled upon a failed reboot on one of my nanopi neo+2 nodes, after two successful reboots. Looking at /var/log.hdd/syslog, it got stuck in the shutdown procedure when the watchdog reported a failure. The /var/log.hdd/syslog.1 extracts below show the watchdog starting, and then the watchdog stopping and failing. After the failure the system did not come up again and needed a power cycle, which is quite inconvenient since it is installed at a hard-to-access remote location.

Sep 18 00:03:26 EnexisVT2-1 systemd[1]: Starting watchdog daemon...
Sep 18 00:03:26 EnexisVT2-1 systemd[1]: Reached target Graphical Interface.
Sep 18 00:03:26 EnexisVT2-1 systemd[1]: Starting Update UTMP about System Runlevel Changes...
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: starting daemon (5.15):
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: int=1s realtime=yes sync=no load=0,0,0 soft=no
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: memory not checked
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: ping: no machine to check
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: file: no file to check
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: pidfile: no server process to check
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: interface: no interface to check
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: temperature: no sensors to check
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: no test binary files
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: no repair binary files
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: error retry time-out = 60 seconds
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: repair attempts = 1
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: alive=[none] heartbeat=[none] to=root no_act=no force=no
Sep 18 00:03:26 EnexisVT2-1 systemd[1]: Started watchdog daemon.
...
Sep 19 00:00:01 EnexisVT2-1 CRON[6188]: (dennis) CMD (/sbin/reboot)
...
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopping Authorization Manager...
...
Sep 19 00:00:02 EnexisVT2-1 watchdog[2212]: stopping daemon (5.15)
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopping watchdog daemon...
...
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: watchdog.service: Control process exited, code=exited, status=1/FAILURE
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: watchdog.service: Failed with result 'exit-code'.
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopped watchdog daemon.
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: watchdog.service: Triggering OnFailure= dependencies.
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Requested transaction contradicts existing jobs: Transaction for wd_keepalive.service/start is destructive (armbian-zram-config.service has 'stop' job queued, but 'start' is included in transaction).
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: watchdog.service: Failed to enqueue OnFailure= job, ignoring: Transaction for wd_keepalive.service/start is destructive (armbian-zram-config.service has 'stop' job queued, but 'start' is included in transaction).
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopped target Multi-User System.
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopping rng-tools.service...
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopping OpenBSD Secure Shell server...
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopping LSB: Start or stop stunnel 4.x (TLS tunnel for network daemons)...
Sep 19 00:00:02 EnexisVT2-1 ntpd[1396]: ntpd exiting on signal 15 (Terminated)
... cold reboot
Sep 19 00:00:09 EnexisVT2-1 kernel: [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
Sep 19 00:00:09 EnexisVT2-1 fake-hwclock[406]: Sat 19 Sep 2020 12:00:03 AM UTC

After this the system didn't boot anymore, and we had to cold-boot it manually.

So I've stopped and disabled the watchdog for now. I also had to set run_wd_keepalive=0 in /etc/default/watchdog, since the watchdog also failed to stop from the command line (on other systems as well):

Sep 23 11:33:59 EnexisVT2-1 systemd[1]: Starting watchdog daemon...
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: starting daemon (5.15):
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: int=1s realtime=yes sync=no load=0,0,0 soft=no
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: memory not checked
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: ping: no machine to check
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: file: no file to check
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: pidfile: no server process to check
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: interface: no interface to check
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: temperature: no sensors to check
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: no test binary files
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: no repair binary files
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: error retry time-out = 60 seconds
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: repair attempts = 1
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: alive=[none] heartbeat=[none] to=root no_act=no force=no
Sep 23 11:33:59 EnexisVT2-1 systemd[1]: Started watchdog daemon.
...
Sep 23 11:34:03 EnexisVT2-1 watchdog[3236]: stopping daemon (5.15)
Sep 23 11:34:03 EnexisVT2-1 systemd[1]: Stopping watchdog daemon...
Sep 23 11:34:03 EnexisVT2-1 systemd[1]: watchdog.service: Control process exited, code=exited, status=1/FAILURE
Sep 23 11:34:03 EnexisVT2-1 systemd[1]: watchdog.service: Failed with result 'exit-code'.
Sep 23 11:34:03 EnexisVT2-1 systemd[1]: Stopped watchdog daemon.
Sep 23 11:34:03 EnexisVT2-1 systemd[1]: watchdog.service: Triggering OnFailure= dependencies.

Note that I froze the armbian upgrades on all these sensors at armbian 20.02.7, to avoid having to recompile my kernel modules on every upstream update.

I noticed that the systemd package got an update recently, but I am unsure whether this update mitigates the problem.
systemd-sysv/stable 241-7~deb10u4 arm64 [upgradable from: 241-7~deb10u3]
systemd/stable 241-7~deb10u4 arm64 [upgradable from: 241-7~deb10u3]

dennis@EnexisVT2-1:~$ dpkg -l "*current*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                     Version      Architecture Description
+++-========================================-============-============-============================================================
ii  linux-buster-root-current-nanopineoplus2 20.02.1      arm64        Armbian tweaks for buster on nanopineoplus2 (current branch)
hi  linux-dtb-current-sunxi64                20.02.7      arm64        Linux DTB, version 5.4.28-sunxi64
hi  linux-headers-current-sunxi64            20.02.7      arm64        Linux kernel headers for 5.4.28-sunxi64 on arm64
hi  linux-image-current-sunxi64              20.02.7      arm64        Linux kernel, version 5.4.28-sunxi64
hi  linux-u-boot-nanopineoplus2-current      20.02.1      arm64        Uboot loader 2019.10
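
The failure signature from the syslog extracts in this post (the watchdog.service stop failing, followed by systemd refusing the OnFailure= job) could be searched for on the other nodes before they hit the same hang. Below is a minimal Python sketch, not a tested tool: the function names are made up for illustration, and the two patterns are copied from the log lines quoted here, so they may not match other watchdog or systemd versions.

```python
import re

# Patterns copied from the syslog lines quoted in this post.
# Assumption: other systemd/watchdog versions may word these messages differently.
STOP_FAILED = re.compile(r"watchdog\.service: Failed with result 'exit-code'")
ONFAILURE_REFUSED = re.compile(r"watchdog\.service: Failed to enqueue OnFailure= job")


def scan_watchdog_failures(lines):
    """Split matching log lines into (failed stops, refused OnFailure= jobs).

    Hypothetical helper: accepts any iterable of syslog lines.
    """
    stop_failures = []
    refused = []
    for line in lines:
        line = line.rstrip("\n")
        if STOP_FAILED.search(line):
            stop_failures.append(line)
        elif ONFAILURE_REFUSED.search(line):
            refused.append(line)
    return stop_failures, refused


def scan_file(path):
    """Run the scan over one syslog file, tolerating undecodable bytes."""
    with open(path, errors="replace") as fh:
        return scan_watchdog_failures(fh)
```

For example, calling scan_file("/var/log.hdd/syslog.1") on a node would return the two lists of matching lines. Empty lists would only mean that this particular file does not contain the messages, not that the node is unaffected.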