
Re: Can we build a proper email cluster? (was: Re: Why is debian.org email so unreliable?)



On Sat, Oct 16, 2004 at 09:29:32PM +1000, Russell Coker wrote:
> On Fri, 15 Oct 2004 23:33, Arnt Karlsen <arnt@c2i.net> wrote:
> > > On Fri, 15 Oct 2004 03:19, Arnt Karlsen <arnt@c2i.net> wrote:
> > > > > Increasing the number of machines increases the probability of one
> > > > > machine failing for any given time period.  Also it makes it more
> > > > > difficult to debug problems as you can't always be certain of
> > > > > which machine was involved.
> > > >
> > > > ..very true, even for aero engines.  The reason the airlines like
> > > > 2, 3 or even 4 rather than one jet.
> > >
> > > You seem to have entirely misunderstood what I wrote.
> >
> > ..really?   Compare with your average automobile accident and
> > see who has the more adequate safety philosophy.
> 
> If one machine has a probability of failure of 0.1 over a particular time 
> period then the probability of at least one machine failing if there are two 
> servers in the cluster over that same time period is 1-0.9*0.9 == 0.19.

But do we really care about whether a "machine" fails? I'd rather say
that what we want to minimize is the _service_ downtime.

With one machine, the probability of the service being unavailable is
0.1. With two machines it's equal to the probability of both machines
failing during the same period, so it's 0.1*0.1 == 0.01, as long as the
failures are independent (not sure if that's the right translation
of the term).
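
To put the two figures from this thread side by side, here is a rough
sketch in Python. It assumes the machines fail independently and that
any single surviving machine can carry the whole service; the function
names are made up for illustration only.

```python
def p_any_machine_fails(p: float, n: int) -> float:
    """Probability that at least one of n independent machines fails."""
    return 1 - (1 - p) ** n


def p_service_down(p: float, n: int) -> float:
    """Probability that all n independent machines fail in the same period.

    Assumption: one surviving machine is enough to keep the service up.
    """
    return p ** n


if __name__ == "__main__":
    p = 0.1  # per-machine failure probability used in this thread
    for n in (1, 2, 3):
        print(n, round(p_any_machine_fails(p, n), 4), round(p_service_down(p, n), 4))
```

For n == 2 this gives Russell's 0.19 in the first column and my 0.01 in
the second, so both numbers are right; they just answer different
questions.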

Or am I wrong in the first sentence?

Otherwise, I'd say that the increase in availability is worth the
additional debugging effort :-)

Marcin
-- 
Marcin Owsiany <porridge@debian.org>             http://marcin.owsiany.pl/
GnuPG: 1024D/60F41216  FE67 DA2D 0ACA FC5E 3F75  D6F6 3A0D 8AA0 60F4 1216


