edupv Posted January 29, 2018

I think 16 seconds is the maximum watchdog timeout on H2/H3. However, I cannot set it to 16 seconds.

When watchdog-timeout = 16 in /etc/watchdog.conf, the timeout is 11 seconds:

/etc/watchdog.conf

    max-load-1 = 24
    watchdog-device = /dev/watchdog
    realtime = yes
    priority = 1
    watchdog-timeout = 16
    interval = 3

    root@orangepizero:~# systemctl status watchdog.service
    ● watchdog.service - watchdog daemon
       Loaded: loaded (/lib/systemd/system/watchdog.service; enabled; vendor preset: enabled)
       Active: active (running) since Mon 2018-01-29 11:10:27 HKT; 3min 31s ago
      Process: 851 ExecStart=/bin/sh -c [ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options (code=exited, status=0/SUCCESS)
      Process: 835 ExecStartPre=/bin/sh -c [ -z "${watchdog_module}" ] || [ "${watchdog_module}" = "none" ] || /sbin/modprobe $watchdog_module (code=exited, status=0/SUCCESS)
     Main PID: 855 (watchdog)
       CGroup: /system.slice/watchdog.service
               └─855 /usr/sbin/watchdog

    Jan 29 11:10:27 orangepizero watchdog[855]: int=3s realtime=yes sync=no soft=no mla=24 mem=0
    Jan 29 11:10:27 orangepizero watchdog[855]: ping: no machine to check
    Jan 29 11:10:27 orangepizero watchdog[855]: file: no file to check
    Jan 29 11:10:27 orangepizero systemd[1]: Started watchdog daemon.
    Jan 29 11:10:27 orangepizero watchdog[855]: pidfile: no server process to check
    Jan 29 11:10:27 orangepizero watchdog[855]: interface: no interface to check
    Jan 29 11:10:27 orangepizero watchdog[855]: temperature: no sensors to check
    Jan 29 11:10:27 orangepizero watchdog[855]: test=none(0) repair=none(0) alive=/dev/watchdog heartbeat=none to=root no_act=no force=no
    Jan 29 11:10:27 orangepizero watchdog[855]: watchdog now set to 11 seconds
    Jan 29 11:10:27 orangepizero watchdog[855]: hardware watchdog identity: sunxi_wdt
    root@orangepizero:~#

When watchdog-timeout = 7 in /etc/watchdog.conf, the timeout becomes 6 seconds:

/etc/watchdog.conf

    max-load-1 = 24
    watchdog-device = /dev/watchdog
    realtime = yes
    priority = 1
    watchdog-timeout = 7
    interval = 3

    root@orangepizero:~# systemctl status watchdog.service
    ● watchdog.service - watchdog daemon
       Loaded: loaded (/lib/systemd/system/watchdog.service; enabled; vendor preset: enabled)
       Active: active (running) since Mon 2018-01-29 11:16:49 HKT; 1min 12s ago
      Process: 846 ExecStart=/bin/sh -c [ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options (code=exited, status=0/SUCCESS)
      Process: 823 ExecStartPre=/bin/sh -c [ -z "${watchdog_module}" ] || [ "${watchdog_module}" = "none" ] || /sbin/modprobe $watchdog_module (code=exited, status=0/SUCCESS)
     Main PID: 849 (watchdog)
       CGroup: /system.slice/watchdog.service
               └─849 /usr/sbin/watchdog

    Jan 29 11:16:49 orangepizero watchdog[849]: int=3s realtime=yes sync=no soft=no mla=24 mem=0
    Jan 29 11:16:49 orangepizero watchdog[849]: ping: no machine to check
    Jan 29 11:16:49 orangepizero watchdog[849]: file: no file to check
    Jan 29 11:16:49 orangepizero systemd[1]: Started watchdog daemon.
    Jan 29 11:16:49 orangepizero watchdog[849]: pidfile: no server process to check
    Jan 29 11:16:49 orangepizero watchdog[849]: interface: no interface to check
    Jan 29 11:16:49 orangepizero watchdog[849]: temperature: no sensors to check
    Jan 29 11:16:49 orangepizero watchdog[849]: test=none(0) repair=none(0) alive=/dev/watchdog heartbeat=none to=root no_act=no force=no
    Jan 29 11:16:49 orangepizero watchdog[849]: watchdog now set to 6 seconds
    Jan 29 11:16:49 orangepizero watchdog[849]: hardware watchdog identity: sunxi_wdt
    root@orangepizero:~#

How can I set the watchdog timeout to 16 seconds?

Thanks.

chrisf Posted January 29, 2018

Have you tested it? According to this:

https://github.com/torvalds/linux/blob/master/drivers/watchdog/sunxi_wdt.c#L71

there is no option for 11 seconds; the value 11 (0xB) maps to 16 seconds.
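
To make the point about register values concrete, here is a minimal Python sketch of the lookup. The table is transcribed from memory of the linked mainline sunxi_wdt.c and should be treated as an assumption, and the names `SECONDS_TO_CODE`, `code_for` and `seconds_for` are hypothetical helpers, not part of any driver. It only illustrates how a reported "11" can be read as a register code for 16 seconds.

```python
# Assumed interval table: timeout in seconds -> register code.
# Transcribed from the mainline sunxi_wdt.c timeout map as best recalled;
# verify against the driver source for your kernel before relying on it.
SECONDS_TO_CODE = {
    1: 0x1, 2: 0x2, 3: 0x3, 4: 0x4, 5: 0x5, 6: 0x6,
    8: 0x7, 10: 0x8, 12: 0x9, 14: 0xA, 16: 0xB,
}
CODE_TO_SECONDS = {code: secs for secs, code in SECONDS_TO_CODE.items()}


def code_for(seconds):
    """Return the register code for an exactly supported timeout, else None."""
    return SECONDS_TO_CODE.get(seconds)


def seconds_for(code):
    """Return the timeout in seconds that a register code stands for, else None."""
    return CODE_TO_SECONDS.get(code)


print(code_for(16))      # 11
print(seconds_for(0xB))  # 16
print(code_for(11))      # None: 11 seconds is not a supported interval
```

The sketch deliberately says nothing about how the driver rounds an unsupported request such as 7; that behaviour depends on the kernel in use.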

edupv Posted January 30, 2018 (Author)

7 hours ago, chrisf said: "Have you tested it? According to this https://github.com/torvalds/linux/blob/master/drivers/watchdog/sunxi_wdt.c#L71 There is no option for 11 seconds, the value 11 (0xB) maps to 16 seconds."

Thanks for your reply. I don't know C, so I seldom check the source code. After reading your reply, I tested the timeout with a script. Yes, it really is 16 seconds.

Thanks again for your help.
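
The test script is not shown in the thread. One way such a measurement could be done is sketched below in Python, as an illustration only: arm the watchdog, stop feeding it, and keep writing the elapsed time to disk so that the last value survives the reset. The function name `arm_and_log`, the log path and the parameters are all hypothetical. It assumes the watchdog daemon is stopped first (otherwise the device is busy) and that the driver honours the usual "magic close" character; neither is guaranteed on every kernel.

```python
import os
import time


def arm_and_log(device="/dev/watchdog", log_path="/root/wdt_elapsed.log",
                tick=0.5, max_seconds=60.0):
    """Arm the watchdog once, never feed it again, and log elapsed seconds.

    If the board resets, the last value left in log_path approximates the
    real timeout. Returns the elapsed time if no reset happened.
    """
    wdt = open(device, "wb", buffering=0)  # opening the device arms the timer
    wdt.write(b"\0")                       # one keepalive write, then silence
    start = time.monotonic()
    elapsed = 0.0
    try:
        with open(log_path, "w") as log:
            while elapsed < max_seconds:
                time.sleep(tick)
                elapsed = time.monotonic() - start
                log.seek(0)
                log.write(f"{elapsed:.1f}\n")
                log.truncate()
                log.flush()
                os.fsync(log.fileno())  # push to storage before a possible reset
    finally:
        # Assumption: the driver supports magic close, so writing "V" before
        # close() disarms the timer if we got this far without a reset.
        wdt.write(b"V")
        wdt.close()
    return elapsed
```

To use it, stop the watchdog service, call `arm_and_log()` as root, and read the log file after the board comes back up. The result is only as precise as `tick`, and the board will reboot without a clean shutdown, so this is best tried on a system where that is acceptable.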

rufik Posted September 19, 2018

How is your watchdog doing? Does it work reliably over time?

edupv Posted September 20, 2018 (Author)

10 hours ago, rufik said: "How is your watchdog doing? Does it work reliable over the time?"

I think it is working fine. However, the watchdog only acts when the system hangs; it does nothing during normal operation. That is why I said "I think".

rufik Posted September 20, 2018

My OPi PC (mainline kernel) just freezes from time to time. It responds to ping, but I cannot ssh into it (it waits forever for a session, then disconnects), and services do not respond either. It looks like OOM or a similar problem. I cannot check it via the serial console because it is at a remote location. So I thought a watchdog would be nice there, just to reset the board in such cases.

edupv Posted September 20, 2018 (Author)

42 minutes ago, rufik said: "My OPI PC (mainline kernel) just freezes from time to time, it respond to ping but I cannot ssh into it (waiting forever for session then disconnects), services does not respond also. It looks like some OOM or similar problems, I cannot check it via serial console because it's remote location. So I thought that watchdog would be nice there, just to reset board in such cases."

If your OPi PC can respond to ping, then it is not frozen, and the watchdog will not normally reset it. I think you have to check, for example, whether sshd is listening on the correct port and interface, the firewall rules, and so on. If your OPi PC is not directly connected to the internet, you should also check the port forwarding rules on your router.

rufik Posted September 20, 2018

I've already checked. The firewall is disabled all the time, because the OPi PC is inside my LAN. Nmap shows open ports 22, 8123 (HomeAssistant), 3306 (MySQL) and so on. But every service accepts the TCP connection and then does not respond at all, terminating the connection after some timeout. Sshd accepts the connection, asks for the password and hangs until the timeout. Ping works.

So it looks like some OS-internal problem, maybe with memory and spawning processes or threads? That's why I'd like to try out the watchdog.
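
If the suspicion is that the system can no longer spawn processes, one option is to have a health check test exactly that and return a non-zero exit code on failure. The Python sketch below is a hedged illustration: `can_spawn` and `main` are hypothetical names, and wiring the script into the watchdog daemon (for example through the `test-binary` option described in the watchdog.conf man page) is an assumption that should be checked against the installed daemon version. Nothing in the thread shows that this check would catch the fault described above.

```python
import subprocess
import sys


def can_spawn(cmd=None, timeout=5.0):
    """Return True if a trivial child process starts and exits cleanly in time."""
    if cmd is None:
        # Default probe: start a fresh interpreter that does nothing.
        cmd = [sys.executable, "-c", "pass"]
    try:
        result = subprocess.run(cmd, timeout=timeout,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the process could not be created at all.
        return False
    return result.returncode == 0


def main():
    # Exit code 0 means healthy; anything else signals a failed check.
    return 0 if can_spawn() else 1
```

When saved as a standalone script, it would end with `raise SystemExit(main())`. A child stuck in uninterruptible I/O may not be killable, so the check itself can hang; in that case the daemon's own timeout for test programs would have to catch it.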

rufik Posted October 11, 2018

I'm just getting an error when starting the watchdog service on OPI2 (Ubuntu Bionic, kernel 4.14.70): cannot open /dev/watchdog (errno = 16 = 'Device or resource busy').

    rufik@farmer:~$ sudo systemctl status watchdog
    ● watchdog.service - watchdog daemon
       Loaded: loaded (/lib/systemd/system/watchdog.service; enabled; vendor preset: enabled)
       Active: active (running) since Thu 2018-10-11 10:52:08 CEST; 15s ago
      Process: 17410 ExecStopPost=/bin/sh -c [ $run_wd_keepalive != 1 ] || false (code=exited, status=1/FAILURE)
      Process: 17436 ExecStart=/bin/sh -c [ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options (code=exited, status=0/SUCCESS)
      Process: 17433 ExecStartPre=/bin/sh -c [ -z "${watchdog_module}" ] || [ "${watchdog_module}" = "none" ] || /sbin/modprobe $watchdog_module (code=exited, status=
     Main PID: 17438 (watchdog)
       CGroup: /system.slice/watchdog.service
               └─17438 /usr/sbin/watchdog

    Oct 11 10:52:08 farmer watchdog[17438]: starting daemon (5.14):
    Oct 11 10:52:08 farmer watchdog[17438]: int=3s realtime=yes sync=no soft=no mla=0 mem=0
    Oct 11 10:52:08 farmer watchdog[17438]: ping: no machine to check
    Oct 11 10:52:08 farmer watchdog[17438]: file: no file to check
    Oct 11 10:52:08 farmer watchdog[17438]: pidfile: no server process to check
    Oct 11 10:52:08 farmer watchdog[17438]: interface: no interface to check
    Oct 11 10:52:08 farmer watchdog[17438]: temperature: no sensors to check
    Oct 11 10:52:08 farmer watchdog[17438]: test=none(0) repair=none(0) alive=/dev/watchdog heartbeat=none to=root no_act=no force=no
    Oct 11 10:52:08 farmer watchdog[17438]: cannot open /dev/watchdog (errno = 16 = 'Device or resource busy')
    Oct 11 10:52:08 farmer systemd[1]: Started watchdog daemon.

But /dev/watchdog does not seem to be opened by anything:

    rufik@farmer:~$ sudo fuser -v /dev/watchdog
    rufik@farmer:~$ sudo lsof /dev/watchdog

I have disabled the wd_keepalive daemon. Is it really required to run, or is it mutually exclusive with the watchdog daemon?