
Re: default MTA



]] Russ Allbery 

> Basically, what we're looking for here is the equivalent of a check engine
> light (except, of course, with better user-visible diagnostics available).
> That's what the end user actually wants: something clear and visible
> indicating that something is wrong, which they can drill down and see the
> details and dismiss the error condition if they want, or have all the
> details available to consult someone who knows more about computers if
> they don't know what to do with it themselves.  Historically, root cron
> mail has been exactly that, and that's still a great way of handling it
> for servers, since that mail can be sent off somewhere centrally, analyzed
> and assigned to sysadmins, used to open internal trouble tickets, etc.

I don't think it's a good way at all, since far too often cron mails
aren't actionable.  I'll get a mail from some automated process that
tried to run apt-get update and failed (in the middle of the night).
Since that process runs every hour, it will have succeeded by the time
I read the mail, and there's nothing I can do about it.

I wish we had a better system, one where some, but not all, errors would
latch and need acknowledgment.  There would be correlation between hosts
and between messages, so if the router's down, you get one message about
data centre A not being able to complete $process, rather than a zillion
individual messages.  Identical messages would be merged, so I get a
single message about $process being broken for the last $time period (or
having a failure rate above $threshold), rather than a thousand mails
because of some error.
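
To make the merging and threshold part concrete, here is a rough sketch
in Python.  Everything in it is made up for illustration: the Event
record, the summarize() function and the 0.5 default threshold are
assumptions, not part of any existing tool, and it leaves out latching
and cross-host correlation entirely.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One run of a periodic job on one host (hypothetical record)."""
    host: str
    process: str
    ok: bool
    message: str = ""


def summarize(events, threshold=0.5):
    """Return one line per process whose failure rate exceeds threshold."""
    runs = defaultdict(list)
    for event in events:
        runs[event.process].append(event)

    lines = []
    for process, group in sorted(runs.items()):
        failures = [e for e in group if not e.ok]
        rate = len(failures) / len(group)
        if rate <= threshold:
            continue  # mostly succeeding: nothing actionable, stay quiet
        # Merge identical messages into a count instead of one mail each.
        merged = Counter(e.message for e in failures)
        hosts = sorted({e.host for e in failures})
        message, count = merged.most_common(1)[0]
        lines.append(
            f"{process}: {len(failures)}/{len(group)} runs failed on "
            f"{len(hosts)} host(s); most common error ({count}x): {message}"
        )
    return lines
```

With something like this, a single failed hourly apt-get update among
23 successful runs produces no mail at all, while a job failing on most
hosts produces one line instead of one mail per failure.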

Oh, and a pony.  Don't forget the pony.  Or an otter, I like otters.

-- 
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are

