watchdog

To: Debian Developers <debian-devel@lists.debian.org>
Subject: watchdog
From: Bastian Blywis <blywis@zedat.fu-berlin.de>
Date: Thu, 14 Apr 2011 14:47:16 +0200
Message-id: <201104141447.17091.blywis@zedat.fu-berlin.de>
Reply-to: blywis@inf.fu-berlin.de

Hello,

I hope my general questions about the watchdog package belong on this list.

1) Is it really the desired behavior that /etc/init.d/watchdog starts 
wd_keepalive when the watchdog daemon is stopped? If the goal is to keep the 
system from rebooting once the watchdog process terminates, does it not 
suffice to close /dev/watchdog, as documented in the manual page? Starting 
wd_keepalive makes sense if the kernel is compiled with 
CONFIG_WATCHDOG_NOWAYOUT, but otherwise it does not. (The capabilities could 
be queried with the WDIOC_GETSUPPORT ioctl AFAIK.)
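A minimal sketch of the capability query mentioned above, in Python on Linux. Assumptions: the asm-generic ioctl encoding (x86, ARM; some architectures such as PowerPC and MIPS encode the direction bits differently), a subset of the WDIOF_* flags from <linux/watchdog.h>, and a watchdog file descriptor that is already open. The helper names (decode_options, query_support, magic_close) are hypothetical. Opening /dev/watchdog arms the timer, so the sketch deliberately never opens the device itself.

```python
import fcntl
import os
import struct

# struct watchdog_info { __u32 options; __u32 firmware_version; __u8 identity[32]; }
WATCHDOG_INFO = struct.Struct("=II32s")  # 40 bytes, no padding

_IOC_READ = 2  # asm-generic value; differs on some architectures (assumption)


def _ior(type_char, nr, size):
    """Build an _IOR() request number using the asm-generic bit layout."""
    return (_IOC_READ << 30) | (size << 16) | (ord(type_char) << 8) | nr


WDIOC_GETSUPPORT = _ior("W", 0, WATCHDOG_INFO.size)

# Subset of the capability flags defined in <linux/watchdog.h>.
WDIOF_FLAGS = {
    0x0020: "WDIOF_CARDRESET",
    0x0080: "WDIOF_SETTIMEOUT",
    0x0100: "WDIOF_MAGICCLOSE",
    0x0200: "WDIOF_PRETIMEOUT",
    0x8000: "WDIOF_KEEPALIVEPING",
}


def decode_options(options):
    """Return the names of the known flags set in an options bitmask."""
    return [name for bit, name in WDIOF_FLAGS.items() if options & bit]


def query_support(fd):
    """Issue WDIOC_GETSUPPORT on an open watchdog fd (hypothetical helper)."""
    buf = bytearray(WATCHDOG_INFO.size)
    fcntl.ioctl(fd, WDIOC_GETSUPPORT, buf, True)  # kernel fills buf in place
    options, firmware, identity = WATCHDOG_INFO.unpack(buf)
    name = identity.rstrip(b"\0").decode("ascii", "replace")
    return decode_options(options), firmware, name


def magic_close(fd):
    """Write the magic 'V' and close the fd.

    On drivers that support magic close, the timer is only stopped if 'V'
    is written just before closing; with nowayout in effect, closing never
    stops it.
    """
    os.write(fd, b"V")
    os.close(fd)
```

For example, decode_options(0x8180) yields WDIOF_SETTIMEOUT, WDIOF_MAGICCLOSE and WDIOF_KEEPALIVEPING. None of the flags in this subset is a nowayout flag, so whether the options bitmask alone can tell the init script about nowayout is left open here.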

From my point of view, when the system administrator explicitly sets 
CONFIG_WATCHDOG_NOWAYOUT or passes "nowayout" to the kernel module, he/she 
wants the system to reboot if something happens, including an accidental or 
intentional stop of the watchdog daemon.

2) The way the watchdog package currently works, it will not always reboot an 
unresponsive system. This is related to my comment on bug #499796. For 
example, when the system enters rc6 and watchdog is terminated by the init 
script, wd_keepalive will seemingly keep the system from rebooting even if the 
kernel hangs.
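To illustrate why a running keepalive can mask a stalled shutdown, here is a minimal sketch of a bare keepalive loop. It is assumed to resemble wd_keepalive only in spirit (feed the device, run no health checks); it is not the package's source, and keepalive() is a hypothetical name. Because the loop depends only on the process being scheduled, it keeps pinging whether or not rc6 makes progress.

```python
import os
import time


def keepalive(device_path, interval, pings):
    """Feed a watchdog device `pings` times, `interval` seconds apart.

    No health checks are made: as long as this process gets scheduled,
    the device keeps being pinged. A real daemon would loop until signalled.
    """
    fd = os.open(device_path, os.O_WRONLY)
    try:
        for _ in range(pings):
            os.write(fd, b"\0")  # any write counts as a keepalive ping
            time.sleep(interval)
    finally:
        # Plain close without the magic 'V': on magic-close drivers the
        # timer stays armed after this.
        os.close(fd)
    return pings
```

Pointing it at /dev/watchdog arms the timer, so it is safer to try it against a scratch file first.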

Wouldn't it be better to run the init script (stop watchdog but do not start 
wd_keepalive) just before calling reboot or halt? That way, the watchdog 
daemon will be able to trigger a reboot until the last moment. Unfortunately, 
there are some issues when the monitored event (e.g. a process is killed in 
rc6 or a hard disk is unmounted) happens more than 60 s before the watchdog is 
terminated.
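A toy model of that timing problem, assuming (as the message suggests) that the daemon reacts to a monitored event once it has gone unresolved for 60 s. ShutdownEvent and events_acted_on are hypothetical names, and the event times are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ShutdownEvent:
    name: str
    time: float  # seconds since the start of rc6 (hypothetical timeline)


def events_acted_on(events, watchdog_stop, grace=60.0):
    """Return the events the daemon would still react to before it is stopped.

    An event counts if the daemon keeps running for more than `grace`
    seconds after it happened.
    """
    return [e.name for e in events if watchdog_stop - e.time > grace]


timeline = [
    ShutdownEvent("monitored process killed", 5.0),
    ShutdownEvent("disk unmounted", 40.0),
]
print(events_acted_on(timeline, watchdog_stop=70.0))  # ['monitored process killed']
```

The model only shows that the risk grows with the gap between a monitored event and the point where the init script stops the daemon; it says nothing about what the daemon does once it reacts.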

Regards,

Bastian
