
Re: watchdog



> 1) Is it really the desired behavior that wd_keepalive is started in 
> /etc/init.d/watchdog when the watchdog daemon is stopped? If the system shall 

Yes.

> be kept from rebooting due to terminating the watchdog process, does it not 
> suffice to close /dev/watchdog as it is documented in the manual page? It 
> makes sense if the kernel is compiled with CONFIG_WATCHDOG_NOWAYOUT but 
> otherwise it does not. (The capabilities could be queried with the 
> WDIOC_GETSUPPORT ioctl AFAIK.)

Why? Sorry, I'm not sure I actually understand what you're saying. wd_keepalive
is started so that basic watchdog functionality remains in place, just without
the additional checks performed by the watchdog daemon.
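The close-and-disarm behaviour discussed above can be sketched against the Linux watchdog API in C. This is a minimal illustration, not code from the watchdog package: the helper names wd_supports_magic_close, wd_query_support and wd_release are hypothetical. It assumes a driver that implements WDIOC_GETSUPPORT and the "magic close" convention, where writing 'V' just before close() disarms the timer. With CONFIG_WATCHDOG_NOWAYOUT or the nowayout module parameter, the kernel keeps the timer running regardless. The WDIOF_* flags returned by WDIOC_GETSUPPORT describe driver features such as WDIOF_MAGICCLOSE; they do not appear to include a nowayout flag, so that setting likely has to be known from the kernel or module configuration. Opening /dev/watchdog arms the timer, so only try this on a machine you can afford to reboot.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper: true if the driver advertises "magic close",
 * i.e. writing 'V' right before close() disarms the timer
 * (unless nowayout is in effect). */
bool wd_supports_magic_close(const struct watchdog_info *info)
{
    return (info->options & WDIOF_MAGICCLOSE) != 0;
}

/* Hypothetical helper: query driver identity and capability flags.
 * Returns 0 on success, -1 on failure (e.g. fd is not a watchdog). */
int wd_query_support(int fd, struct watchdog_info *info)
{
    return ioctl(fd, WDIOC_GETSUPPORT, info);
}

/* Hypothetical helper: release the device the "clean" way by sending
 * the magic character and then closing. Under nowayout the kernel
 * ignores the 'V' and the timer keeps running. */
int wd_release(int fd)
{
    if (write(fd, "V", 1) != 1) {
        perror("write magic close");
        /* still close() so the descriptor is not leaked */
    }
    return close(fd);
}
```

Typical use would be open("/dev/watchdog", O_WRONLY), then wd_query_support(), and finally wd_release() instead of a bare close().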

> From my point of view, when the system administrator explicitly sets 
> CONFIG_WATCHDOG_NOWAYOUT or provides "nowayout" to the kernel module, he/she 
> wants the system to reboot if something happens, including an accidental or 
> intentional stop of the watchdog daemon.

Right, and in that case wd_keepalive is not started, so that should work.
wd_keepalive is only started if watchdog is stopped by using the init script
which seems to be intentional to me.

> 2) The way the watchdog package currently works, it will not always reboot an 
> unresponsive system. This is related to my comment on bug #499796. For 
> example, when the system enters rc6 and watchdog is terminated by the init 
> script, wd_keepalive will seemingly keep the system from rebooting even if the 
> kernel hangs.

No, it only does so if the kernel does not actually hang. In the case you
describe, the kernel has not hung badly enough to stop running wd_keepalive,
so there is simply no way to tell that the system needs a reset. If the kernel
really hangs and stops working, having started wd_keepalive guarantees a
reboot, provided you have a hardware watchdog.
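What a keepalive-only process does can be sketched as a loop that pings the device with the WDIOC_KEEPALIVE ioctl (writing any data to /dev/watchdog also counts as a ping). This is an illustrative sketch, not the wd_keepalive source: wd_ping and keepalive_loop are hypothetical names, and the bounded count exists only so the example terminates. The property that matters is the one described above: the pings come from ordinary user-space code, so if the kernel hangs hard enough that this loop stops running, the pings stop and the hardware watchdog resets the machine.

```c
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper: send one keepalive ping to an open watchdog fd.
 * Returns 0 on success, -1 on failure. */
int wd_ping(int fd)
{
    if (ioctl(fd, WDIOC_KEEPALIVE, 0) != 0) {
        perror("WDIOC_KEEPALIVE");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: ping `count` times, sleeping `interval` seconds
 * between pings. A real keepalive daemon would loop forever; the bound
 * only lets the sketch terminate. Stops at the first failed ping and
 * returns the number of successful pings. */
int keepalive_loop(int fd, int count, unsigned interval)
{
    int ok = 0;
    for (int i = 0; i < count; i++) {
        if (wd_ping(fd) != 0)
            break;
        ok++;
        if (i + 1 < count)
            sleep(interval);
    }
    return ok;
}
```

The fd is expected to come from open("/dev/watchdog", O_WRONLY), which arms the timer, and the interval has to be shorter than the hardware timeout.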

> Wouldn't it be better to run the init script (stop watchdog but do not start 
> wd_keepalive) just before calling reboot or halt? That way, the watchdog 
> daemon will be able to trigger a reboot until the last moment. Unfortunately, 
> there are some issues when the monitored event happens (e.g. process is killed 
> in rc6 or hd is unmounted) more than 60s before the watchdog is terminated.

watchdog has to be stopped before the servers it monitors are stopped, or else
it would trigger some sort of action. wd_keepalive is then started to make sure
the system itself stays under supervision.
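Whether the handoff from the watchdog daemon to wd_keepalive is safe can be checked against the hardware timeout. A minimal sketch under stated assumptions: the driver supports WDIOC_GETTIMEOUT, and the timer keeps running across the handoff, for example with nowayout, or when a magic-close driver is closed without the 'V'. If the daemon disarms the timer cleanly, the device is simply re-armed when wd_keepalive opens it. wd_get_timeout and handoff_fits are hypothetical helpers, and gap_seconds is a hypothetical measurement of the time between the daemon's last ping and wd_keepalive's first one.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical helper: current hardware timeout in seconds, or -1 if
 * the driver does not report it (or fd is not a watchdog device). */
int wd_get_timeout(int fd)
{
    int timeout = 0;
    if (ioctl(fd, WDIOC_GETTIMEOUT, &timeout) != 0) {
        perror("WDIOC_GETTIMEOUT");
        return -1;
    }
    return timeout;
}

/* Hypothetical check: with the timer still running, wd_keepalive must
 * send its first ping before the timeout expires, so the gap has to be
 * strictly shorter than a known, positive timeout. */
bool handoff_fits(int gap_seconds, int timeout_seconds)
{
    return timeout_seconds > 0 && gap_seconds < timeout_seconds;
}
```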

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

