Monitoring server sensors and triggering actions
At work, after some power-related problems,
including long periods without power, no-breaks (UPS units)
at their full capacity and air-conditioning not being
covered by the in-house power generators, we decided
to configure the servers to shut down automatically
under critical conditions.
After a few days and some research, we are
kind of puzzled. We are using Munin and Nagios, so
we know both tools can trigger alerts about
temperature and/or other sensors (like voltages).
Our problem is not about monitoring the
no-breaks; they are basically there to cover short
power outages, since the local power generator should
kick in in less than a minute.
How do you shut down a server if the CPU hits
a critical temperature? Do you use a program? Do
you hack your own shell script?
We couldn't find a tool or service that
would watch the sensors against some configured
limits and, once a limit is crossed, send a mail to
the sysadmins and shut the server down as a precaution.
One thing we are trying to avoid is relying on
network monitoring tools, since switches can be off.
So we are looking into something that runs inside each
host, but we could live with a network solution
(if that's the only - or best - option).
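To make the question concrete, here is a rough sketch of the host-local
behavior we have in mind; it is not something we are running. Everything
specific in it is an assumption for illustration: that temperatures can be
read from the Linux sysfs thermal zone files (in millidegrees Celsius), the
85 C limit, the root@localhost recipient, a local MTA listening on
localhost, and the --run flag that guards the loop.

```python
#!/usr/bin/env python3
"""Host-local temperature watchdog (illustrative sketch only)."""
import glob
import smtplib
import socket
import subprocess
import sys
import time
from email.message import EmailMessage

# All values below are assumptions for illustration, not recommendations.
SENSOR_GLOB = "/sys/class/thermal/thermal_zone*/temp"  # sysfs, millidegrees C
CRITICAL_C = 85.0                # hypothetical shutdown threshold
ADMIN_ADDR = "root@localhost"    # hypothetical recipient
POLL_SECONDS = 30                # hypothetical polling interval


def read_temps(pattern=SENSOR_GLOB):
    """Return {sensor_path: degrees_celsius} for every readable sensor file."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as fh:
                temps[path] = int(fh.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip sensors that cannot be read or parsed
    return temps


def over_limit(temps, limit=CRITICAL_C):
    """Return only the sensors whose reading is at or above the limit."""
    return {path: t for path, t in temps.items() if t >= limit}


def notify(hot):
    """Best-effort mail through the local MTA; a failure is ignored."""
    msg = EmailMessage()
    msg["Subject"] = "Critical temperature on %s" % socket.gethostname()
    msg["From"] = ADMIN_ADDR
    msg["To"] = ADMIN_ADDR
    msg.set_content("\n".join("%s: %.1f C" % item for item in hot.items()))
    try:
        with smtplib.SMTP("localhost", timeout=10) as smtp:
            smtp.send_message(msg)
    except OSError:
        pass  # mail may be unavailable; shutting down matters more


def main():
    """Poll the sensors; on the first critical reading, mail and halt."""
    while True:
        hot = over_limit(read_temps())
        if hot:
            notify(hot)
            subprocess.run(["shutdown", "-h", "now"], check=False)
            return
        time.sleep(POLL_SECONDS)


# Require an explicit flag so that importing or running the file by
# accident never powers off the machine.
if __name__ == "__main__" and sys.argv[1:] == ["--run"]:
    main()
```

Run as root with --run (from an init script, for example), this would not
depend on the network at all: the mail is only a best effort and the
shutdown happens whether or not it gets delivered.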
Any ideas, hints or comments?
Felipe Augusto van de Wiel (faw)
Debian. Freedom to code. Code to freedom!