Re: alioth is down (again)
On Mon, Jan 30, 2012 at 3:40 AM, Poison Bit wrote:
> approach one) Run a public nagios, monit, or similar, configured
> with templates to notify this list on defined events (e.g. down for
> more than 10 minutes? Is it the service, the DNS, the whole machine,
> or the whole network? Has the service recovered?).
I don't think it would be appropriate to notify d-d-a or d-i-a on
every service flap. Servers are already monitored:
http://dsa.debian.org/
https://nagios.debian.org/nagios3/
http://munin.debian.org/
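To illustrate why notifying on every flap would be noisy, here is a minimal sketch of alerting only on a sustained outage. It is not how DSA's nagios is configured; the function name, the (timestamp, is_up) sample format, and the 600-second threshold (taken from the "10 minutes" in the proposal) are all assumptions for illustration.

```python
def sustained_outage(samples, now, threshold_seconds=600):
    """Return True if the service has been continuously down for at least
    threshold_seconds as of `now`.

    samples: list of (timestamp_seconds, is_up) tuples, oldest first.
    This format is hypothetical, not a nagios data structure.
    """
    if not samples or samples[-1][1]:
        # No data, or the most recent check succeeded: nothing to report.
        return False
    outage_start = samples[-1][0]
    # Walk backwards to the first check of the current run of failures.
    for timestamp, is_up in reversed(samples):
        if is_up:
            break
        outage_start = timestamp
    return now - outage_start >= threshold_seconds
```

A short flap (down, up, down again) resets the clock, so only an outage that persists past the threshold would produce a notification.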
> approach two) Search the available open source monitoring systems
> for one that can run a "status.debian.org", so that instead of
> emails, users having an issue can look up such a dashboard and see
> present and past status or issues.
http://dsa.debian.org/
https://nagios.debian.org/nagios3/
http://munin.debian.org/
> approach three) Write a fast and furious bash/perl/python script
> (ideally using only packages of priority >= standard, or as few
> dependencies as possible) that takes a debian.org/infrastructure.yaml
> file (or .json, .txt, .xml, ...) defining Debian machines and
> services. The CLI client runs against that file (first checking that
> the network connection to d.o is OK) and prints a report of
> unreachable services. One run, one check, so not much extra load
> unless a lot of users synchronize a DoS, which can be done with or
> without this tool.
I guess DSA would welcome a patch adding machine-parsable output and
status information to this:
https://db.debian.org/machines.cgi
I guess the devscripts maintainers would also welcome a script to read
the resulting info and print it out.
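A minimal sketch of what such a script might look like, assuming a hypothetical JSON export (machines.cgi does not currently provide one) consisting of a list of objects with "host", "port", and "service" keys. The field names and function names are illustrative assumptions, not an existing DSA or devscripts interface.

```python
import json
import socket


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_services(inventory_json, probe=is_reachable):
    """Parse the (hypothetical) JSON inventory and return the entries
    whose host:port could not be reached. `probe` is injectable so the
    logic can be exercised without touching the network."""
    entries = json.loads(inventory_json)
    return [e for e in entries if not probe(e["host"], e["port"])]


def format_report(failed):
    """Render the list of failed entries as a plain-text report."""
    if not failed:
        return "all services reachable"
    lines = ["unreachable services:"]
    for entry in failed:
        lines.append("  {service} on {host}:{port}".format(**entry))
    return "\n".join(lines)
```

Usage would be along the lines of print(format_report(unreachable_services(open(path).read()))). Each run performs one TCP connect per listed service, which matches the "one run, one check" idea above.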
> approach four) Find or write a distributed monitoring service that
> provides approach "one" or "two", but from different geographic
> locations, so that after detecting that a service/machine is down
> "from here", it contacts monitoring systems on other continents and
> compares results before confirming the issue.
Sounds like something that would be doable with nagios. I suggest you
send a patch for DSA's puppet configuration when alioth returns:
git://anonscm.debian.org/mirror/dsa-puppet.git (currently down due to
alioth being down)
http://dsa.debian.org/howto/puppet-setup/
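The "compare results before confirming" step can be sketched roughly as below. This is only an illustration of the idea, not nagios or dsa-puppet configuration; the function name, the vantage-point labels, and the majority threshold are assumptions.

```python
def confirmed_down(reports, quorum=0.5):
    """Decide whether an outage is confirmed across vantage points.

    reports: dict mapping a vantage point name (hypothetical labels such
    as "eu" or "us") to True if the service was reachable from there.
    Returns True only if strictly more than `quorum` of the vantage
    points saw the service as down.
    """
    if not reports:
        # No observations at all: do not claim an outage.
        return False
    down = sum(1 for is_up in reports.values() if not is_up)
    return down / len(reports) > quorum
```

With this rule a failure seen from a single location is treated as a local network problem rather than a service outage.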
--
bye,
pabs
http://wiki.debian.org/PaulWise