
Re: watchdog



Thanks for the reply.

> Why? Sorry, I'm not sure I actually understand what you're saying.
> wd_keepalive is started to still have basic watchdog functionality without
> the additional checks performed by the watchdog daemon.

Does it actually perform any checks? What I got from the documentation is
that it only writes to /dev/watchdog periodically, regardless of what
happens. Thus "basic watchdog functionality" would only mean checking that
the userspace process is still running.
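
To make clear what I understand by that, here is a minimal Python sketch of
such a keepalive loop. It is not the actual wd_keepalive code; the function
name keepalive() and its parameters (device_path, interval, beats) are my own
assumptions for illustration. The only thing taken from the documentation is
that the device is written to periodically.

```python
import time


def keepalive(device_path="/dev/watchdog", interval=1.0, beats=None):
    """Write one byte to the watchdog device every `interval` seconds.

    Hypothetical sketch: no checks of any kind are performed, so the
    timer is reset as long as this process keeps getting scheduled.
    beats=None loops forever; a number limits the loop (useful for tests).
    """
    written = 0
    # buffering=0 so that every write reaches the device immediately
    with open(device_path, "wb", buffering=0) as dev:
        while beats is None or written < beats:
            dev.write(b"\0")  # any write resets the watchdog timer
            written += 1
            if beats is None or written < beats:
                time.sleep(interval)
    # Note: depending on the driver, the timer may keep running after the
    # device is closed, so a real daemon has to handle shutdown explicitly.
    return written
```

With a loop like this the hardware only notices when the process stops being
scheduled at all, which is why I read it as a liveness check and nothing
more.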

> No, only if the kernel does not actually hang. In the case you talk about
> the kernel does not hang enough to not execute wd_keepalive anymore, so
> there is simply no way to figure out that the system needs a reset. If the
> kernel really hangs and stops working having started wd_keepalive
> guarantees a reboot if you have a hardware watchdog.

You are right. I did not actually mean that the kernel hangs, but that there
is a deadlock like the one in the other bug report: the kernel waits for the
NFS server to reply, but the watchdog does not trigger because by that time
the watchdog daemon has already been stopped and wd_keepalive started.
Therefore the event that was being monitored (the timestamp of a
periodically touched file) did not trigger a reboot.
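
For illustration, the kind of check I mean looks roughly like the following
Python sketch. It is only my assumption of how such a timestamp test could be
written, not the watchdog daemon's actual implementation; file_is_stale(),
max_age and now are hypothetical names.

```python
import os
import time


def file_is_stale(path, max_age, now=None):
    """Return True if `path` was last modified more than `max_age` seconds ago.

    Hypothetical sketch of a "periodically touched file" check. `now` can
    be passed in explicitly to make the function testable.
    """
    if now is None:
        now = time.time()
    # os.stat() itself can block if `path` lives on an unresponsive mount
    mtime = os.stat(path).st_mtime
    return now - mtime > max_age
```

Once the daemon that runs such a check has been stopped, nothing evaluates
the timestamp anymore, so a stale file no longer leads to a reboot.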

> watchdog has to be stopped before the server it monitors get stopped or else
> it would trigger some sort of action. wd_keepalive then is started to make
> sure the system itself stays under supervision.

That's what I assumed: prevent an accidental reboot in rc6 or rc0 (and of 
course when watchdog is stopped by some other means).


Regards,

Bastian

