
Re: Can we build a proper email cluster? (was: Re: Why is debian.org email so unreliable?)



On Sat, 16 Oct 2004 22:00, Marcin Owsiany <porridge@debian.org> wrote:
> > If one machine has a probability of failure of 0.1 over a particular time
> > period then the probability of at least one machine failing if there are
> > two servers in the cluster over that same time period is 1-0.9*0.9 ==
> > 0.19.
>
> But do we really care about whether a "machine" fails? I'd rather say
> that what we want to minimize is the _service_ downtime.

If someone has to take time out from other work to fix it then we care.  There 
are lots of things that we would like to have done but which are not being 
done due to lack of time.  Do we really want to take more time away from 
other important tasks just to have super-reliable @debian.org email?
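A minimal sketch of the arithmetic in the quoted text. It assumes each machine fails independently with the same probability p over the period. The 0.1 figure is the example value from the quote, not a measured failure rate, and `p_any_failure` is a hypothetical helper name:

```python
def p_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n machines fails (and so needs
    an admin's attention), assuming independent failures each with
    probability p."""
    return 1 - (1 - p) ** n

print(round(p_any_failure(0.1, 1), 2))  # 0.1
print(round(p_any_failure(0.1, 2), 2))  # 0.19
```

Every extra machine raises the chance that someone has to stop other work to fix something, even if the service itself stays up.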

> With one machine, the probability of the service being unavailable is
> 0.1. With two machines it's equal to the probability of both machines
> failing at the same time, so it's 0.1*0.1 == 0.01, as long as the
> failures are independent (not sure if that's the right translation
> of the term).

Correct, but only if the failures really are independent, and often they 
aren't.  Configuration errors and software bugs can put two machines offline 
just as easily as one.
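A rough sketch of why independence matters for service downtime. `p_service_down` and its `common_mode` parameter are hypothetical. `common_mode` stands for a shared cause, such as a bad configuration or a software bug deployed to every machine, that takes all of them down at once. The 0.05 value below is purely illustrative, not a measured rate:

```python
def p_service_down(p: float, n: int, common_mode: float = 0.0) -> float:
    """Probability that all n machines are down at the same time.

    p           -- probability that an individual machine fails on its own
    common_mode -- hypothetical probability of a shared failure that
                   takes every machine down together
    """
    independent_all_down = p ** n
    # The service is down if the shared failure happens, or if it does
    # not happen and every machine independently fails anyway.
    return common_mode + (1 - common_mode) * independent_all_down

print(round(p_service_down(0.1, 1), 4))                    # 0.1
print(round(p_service_down(0.1, 2), 4))                    # 0.01
print(round(p_service_down(0.1, 2, common_mode=0.05), 4))  # 0.0595
```

Adding a second machine only shrinks the independent part of the downtime. The shared part stays no matter how many machines there are.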

> Otherwise, I'd say that the increase of availability is worth the
> additional debugging effort :-)

Are you going to be involved in doing the work?

This entire thread started because the admin team doesn't seem to have enough 
time to do all the work that people would like them to do.  Your suggestion 
seems likely to make things worse not better.

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page


