watchdog
Hello,
I hope my general questions about the watchdog package belong on this list.
1) Is it really intended that /etc/init.d/watchdog starts wd_keepalive when
the watchdog daemon is stopped? If the goal is to keep the system from
rebooting when the watchdog process terminates, doesn't it suffice to close
/dev/watchdog as documented in the manual page? Starting wd_keepalive makes
sense if the kernel is compiled with CONFIG_WATCHDOG_NOWAYOUT, but otherwise
it does not. (The capabilities could be queried with the WDIOC_GETSUPPORT
ioctl AFAIK.)
From my point of view, when the system administrator explicitly sets
CONFIG_WATCHDOG_NOWAYOUT or provides "nowayout" to the kernel module, he/she
wants the system to reboot if something happens, including an accidental or
intentional stop of the watchdog daemon.
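To illustrate the WDIOC_GETSUPPORT query mentioned under 1), here is a minimal
C sketch, not a tested patch. The helper names supports_magic_close() and
query_watchdog() are made up for this example. As far as I can tell, the
options mask returned by the ioctl reports WDIOF_MAGICCLOSE but carries no
dedicated "nowayout" bit, so this only shows whether the driver claims to
honour the magic close character at all. Be aware that opening the device arms
the timer, and that with nowayout in effect the final write of 'V' will not
disarm it.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper: 1 if the driver advertises magic close, else 0. */
int supports_magic_close(unsigned int options)
{
    return (options & WDIOF_MAGICCLOSE) != 0;
}

/*
 * Hypothetical helper: fill *info via WDIOC_GETSUPPORT.
 * Returns 0 on success, -1 on any error.
 * WARNING: opening the device starts the watchdog timer.
 */
int query_watchdog(const char *path, struct watchdog_info *info)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    int rc = ioctl(fd, WDIOC_GETSUPPORT, info);

    /*
     * Magic close: write 'V' right before close() to ask the driver
     * to stop the timer. Drivers without magic close support, or a
     * kernel with nowayout set, will keep the timer running.
     */
    if (write(fd, "V", 1) != 1)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;

    return rc == 0 ? 0 : -1;
}
```

A caller would run query_watchdog("/dev/watchdog", &info) and then check
supports_magic_close(info.options) before deciding whether wd_keepalive is
needed at all.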
2) The way the watchdog package currently works, it will not always reboot an
unresponsive system. This is related to my comment on bug #499796. For
example, when the system enters rc6 and watchdog is terminated by the init
script, wd_keepalive will seemingly keep the system from rebooting even if the
kernel hangs.
Wouldn't it be better to run the init script (stop watchdog but do not start
wd_keepalive) just before calling reboot or halt? That way, the watchdog
daemon will be able to trigger a reboot until the last moment. Unfortunately,
there are some issues when the monitored event happens (e.g. process is killed
in rc6 or hd is unmounted) more than 60s before the watchdog is terminated.
Regards,
Bastian