Dennboy
Posted September 23, 2020

Armbianmonitor: http://ix.io/2yuZ

Dear maintainers,

I have my sensors configured to reboot every night via a user cronjob (0 0 * * * /sbin/reboot). 14 sensors do this without a problem. Some months ago I fixed the NanoPi NEO Plus2 reboot from NAND (by re-using the FriendlyARM first-stage u-boot).

I just stumbled upon a failed reboot on one of my NanoPi NEO Plus2 nodes, after two successful reboots. According to /var/log.hdd/syslog, it got stuck in the shutdown procedure when the watchdog reported a failure. The /var/log.hdd/syslog.1 extracts below show the start of the watchdog, then its stop and the failure. After the failure the system did not come up again and needed a power cycle, which is quite inconvenient since it is installed at a hard-to-access remote location.

Sep 18 00:03:26 EnexisVT2-1 systemd[1]: Starting watchdog daemon...
Sep 18 00:03:26 EnexisVT2-1 systemd[1]: Reached target Graphical Interface.
Sep 18 00:03:26 EnexisVT2-1 systemd[1]: Starting Update UTMP about System Runlevel Changes...
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: starting daemon (5.15):
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: int=1s realtime=yes sync=no load=0,0,0 soft=no
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: memory not checked
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: ping: no machine to check
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: file: no file to check
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: pidfile: no server process to check
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: interface: no interface to check
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: temperature: no sensors to check
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: no test binary files
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: no repair binary files
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: error retry time-out = 60 seconds
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: repair attempts = 1
Sep 18 00:03:26 EnexisVT2-1 watchdog[2212]: alive=[none] heartbeat=[none] to=root no_act=no force=no
Sep 18 00:03:26 EnexisVT2-1 systemd[1]: Started watchdog daemon.
...
Sep 19 00:00:01 EnexisVT2-1 CRON[6188]: (dennis) CMD (/sbin/reboot)
...
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopping Authorization Manager...
...
Sep 19 00:00:02 EnexisVT2-1 watchdog[2212]: stopping daemon (5.15)
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopping watchdog daemon...
...
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: watchdog.service: Control process exited, code=exited, status=1/FAILURE
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: watchdog.service: Failed with result 'exit-code'.
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopped watchdog daemon.
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: watchdog.service: Triggering OnFailure= dependencies.
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Requested transaction contradicts existing jobs: Transaction for wd_keepalive.service/start is destructive (armbian-zram-config.service has 'stop' job queued, but 'start' is included in transaction).
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: watchdog.service: Failed to enqueue OnFailure= job, ignoring: Transaction for wd_keepalive.service/start is destructive (armbian-zram-config.service has 'stop' job queued, but 'start' is included in transaction).
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopped target Multi-User System.
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopping rng-tools.service...
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopping OpenBSD Secure Shell server...
Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopping LSB: Start or stop stunnel 4.x (TLS tunnel for network daemons)...
Sep 19 00:00:02 EnexisVT2-1 ntpd[1396]: ntpd exiting on signal 15 (Terminated)
... cold reboot
Sep 19 00:00:09 EnexisVT2-1 kernel: [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
Sep 19 00:00:09 EnexisVT2-1 fake-hwclock[406]: Sat 19 Sep 2020 12:00:03 AM UTC

After this the system didn't boot anymore, and we had to manually cold-boot it.

So, I've stopped and disabled the watchdog for now. I also had to set run_wd_keepalive=0 in /etc/default/watchdog, since the watchdog also failed to stop from the command line (also on other systems):

Sep 23 11:33:59 EnexisVT2-1 systemd[1]: Starting watchdog daemon...
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: starting daemon (5.15):
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: int=1s realtime=yes sync=no load=0,0,0 soft=no
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: memory not checked
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: ping: no machine to check
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: file: no file to check
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: pidfile: no server process to check
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: interface: no interface to check
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: temperature: no sensors to check
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: no test binary files
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: no repair binary files
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: error retry time-out = 60 seconds
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: repair attempts = 1
Sep 23 11:33:59 EnexisVT2-1 watchdog[3236]: alive=[none] heartbeat=[none] to=root no_act=no force=no
Sep 23 11:33:59 EnexisVT2-1 systemd[1]: Started watchdog daemon.
...
Sep 23 11:34:03 EnexisVT2-1 watchdog[3236]: stopping daemon (5.15)
Sep 23 11:34:03 EnexisVT2-1 systemd[1]: Stopping watchdog daemon...
Sep 23 11:34:03 EnexisVT2-1 systemd[1]: watchdog.service: Control process exited, code=exited, status=1/FAILURE
Sep 23 11:34:03 EnexisVT2-1 systemd[1]: watchdog.service: Failed with result 'exit-code'.
Sep 23 11:34:03 EnexisVT2-1 systemd[1]: Stopped watchdog daemon.
Sep 23 11:34:03 EnexisVT2-1 systemd[1]: watchdog.service: Triggering OnFailure= dependencies.

Note that I froze the Armbian upgrades on all these sensors at Armbian 20.02.7, to avoid having to recompile my kernel modules on every upstream update. I noticed that the systemd package got an update recently, but I am unsure whether this update mitigates the problem.
systemd-sysv/stable 241-7~deb10u4 arm64 [upgradable from: 241-7~deb10u3]
systemd/stable 241-7~deb10u4 arm64 [upgradable from: 241-7~deb10u3]

dennis@EnexisVT2-1:~$ dpkg -l "*current*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                     Version      Architecture Description
+++-========================================-============-============-============================================================
ii  linux-buster-root-current-nanopineoplus2 20.02.1      arm64        Armbian tweaks for buster on nanopineoplus2 (current branch)
hi  linux-dtb-current-sunxi64                20.02.7      arm64        Linux DTB, version 5.4.28-sunxi64
hi  linux-headers-current-sunxi64            20.02.7      arm64        Linux kernel headers for 5.4.28-sunxi64 on arm64
hi  linux-image-current-sunxi64              20.02.7      arm64        Linux kernel, version 5.4.28-sunxi64
hi  linux-u-boot-nanopineoplus2-current      20.02.1      arm64        Uboot loader 2019.10
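
For anyone who wants to check whether their other nodes hit the same failed watchdog stop, here is a minimal sketch in Python that scans syslog lines for the systemd message shown in the extracts above. The function name find_failed_watchdog_stops is hypothetical, and the pattern only assumes the classic syslog timestamp format seen in this post. It is an illustration, not a tested tool.

```python
import re
from typing import Iterable, List, Tuple

# Pattern for the systemd message logged when the watchdog stop command fails.
# The message text is copied from the syslog extracts in this post; the
# timestamp/host prefix assumes the traditional "Mon DD HH:MM:SS host" layout.
FAILED_STOP = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"systemd\[1\]:\s+watchdog\.service: Control process exited, "
    r"code=exited, status=1/FAILURE"
)


def find_failed_watchdog_stops(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, hostname) for every failed watchdog stop found."""
    hits = []
    for line in lines:
        match = FAILED_STOP.match(line)
        if match:
            hits.append((match.group("stamp"), match.group("host")))
    return hits


# Small in-memory demonstration using lines quoted from the report above.
sample_lines = [
    "Sep 19 00:00:02 EnexisVT2-1 watchdog[2212]: stopping daemon (5.15)",
    "Sep 19 00:00:02 EnexisVT2-1 systemd[1]: watchdog.service: "
    "Control process exited, code=exited, status=1/FAILURE",
    "Sep 19 00:00:02 EnexisVT2-1 systemd[1]: Stopped watchdog daemon.",
]
for stamp, host in find_failed_watchdog_stops(sample_lines):
    print(f"{host}: watchdog stop failed at {stamp}")
```

To run it against a real log, pass an open file object instead of the sample list, for example the /var/log.hdd/syslog.1 file mentioned above (reading it may require root).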