
Re: Rambling apt-get ideas



On Thu, Jan 04, 2001 at 08:37:26AM -0600, Vince Mulhollon wrote:

> Back on topic, I would have thought that package distribution was a one
> time shot.  Caches are for people who would otherwise download the
> slashdot.org header graphic fifty times a day.  Whereas each individual
> debian machine should only have to download the latest perl .deb once in
> its "life".  If I apt-get upgrade through my http cache, all I do is flood
> the cache with megs of data I'll never download again.

This would be true if you were only upgrading one system.  If you are upgrading
<n> systems, you need the same .deb <n> times.
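
For example (just a sketch; the proxy host and port are placeholders), pointing
every workstation at a shared caching proxy is a one-line apt setting:

    // /etc/apt/apt.conf on each workstation
    Acquire::http::Proxy "http://proxy.example.com:3128/";

The first machine to upgrade pulls each .deb across the slow link; the rest get
it from the cache over the LAN.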

> I'm not sure the overhead is minimal for less than a thousand
> clients.  I have 18 or so debian workstations at work.  If it takes 5
> minutes to transfer all the .debs to upgrade one machine, then I think it
> would take a unicast system slightly less than 18*5 minutes (about 1 1/2
> hours) to upgrade, vs 5 minutes for a multicast system to upgrade.  A
> unicast upgrade could be a "start it and go to lunch" process whereas a
> multicast upgrade would be a "get a cup of coffee" process.  If I had a
> hundred machines to upgrade, the comparison would be even greater.  Yeah,
> wasting 17*5 minutes is not the end of the world, but why not try harder to
> do better?

If all of the systems can keep up with the 5 minute rate, and no packets are
lost, then the multicast method would probably be faster.  What happens,
though, when one of the systems can't keep up?  Perhaps its load goes up due to
some other job, or there is network congestion and a packet is lost?  There
must be additional protocol overhead to handle retransmissions, perhaps even on
a host-by-host basis.  The server must keep track of who has received what, and
must tag and sequence every datagram to do so.  Confirmation must be received
from each host about which datagrams it has received correctly.
Before you know it, you are reimplementing TCP in a 1-to-N transmission model,
and that would be very tricky to get right.
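
To make that bookkeeping concrete, here is a rough sketch (Python, with a
made-up group address, port, and client list) of what even a naive reliable
1-to-N sender ends up doing: number every chunk, collect per-host
confirmations, and retransmit until everyone has everything:

    import socket
    import struct

    GROUP, PORT = "239.255.0.1", 4000           # assumed multicast group/port
    CLIENTS = {"192.168.0.11", "192.168.0.12"}  # hosts that must confirm
    CHUNK = 1400                                # payload bytes per datagram

    def send_file(path):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.settimeout(2.0)

        # Read the file into numbered chunks; the sequence number is what
        # lets receivers detect loss and reordering.
        chunks = []
        with open(path, "rb") as f:
            while True:
                data = f.read(CHUNK)
                if not data:
                    break
                chunks.append(data)

        # acked[seq] = set of hosts that have confirmed this chunk
        acked = {seq: set() for seq in range(len(chunks))}

        for seq, data in enumerate(chunks):
            sock.sendto(struct.pack("!I", seq) + data, (GROUP, PORT))

        # Collect confirmations and retransmit anything still missing; a
        # single slow or lossy host stalls the whole transfer right here.
        while any(acked[seq] != CLIENTS for seq in acked):
            try:
                msg, (host, _) = sock.recvfrom(8)
            except socket.timeout:
                for seq in acked:
                    if acked[seq] != CLIENTS:
                        sock.sendto(struct.pack("!I", seq) + chunks[seq],
                                    (GROUP, PORT))
                continue
            seq = struct.unpack("!I", msg[:4])[0]
            if host in CLIENTS and seq in acked:
                acked[seq].add(host)

And that still ignores congestion control, receivers joining late, and what to
do when a host never answers at all.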

Multicast is designed for "best effort" datagram services like UDP, for
situations where loss is not a big problem (e.g. real-time streaming media), or
where error correction and retransmission are handled by higher-level protocols.
Downloading Debian packages requires a reliable transmission stream.
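
For contrast, best-effort multicast in its natural form is a single
fire-and-forget send, with no sequence numbers and no acknowledgments (again,
the group address and port are made up):

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    # If this datagram is dropped anywhere along the path, nobody ever knows.
    sock.sendto(b"one frame of a stream", ("239.255.0.1", 4000))

That is fine for a video frame that will be superseded a moment later; it is
not fine for the middle of a .deb that has to arrive intact.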

> The concept of the system I'm discussing, is one "master" machine downloads
> the .deb via http.  Then it multicasts the .deb to all the other machines
> at once.  All of them are on the same subnet so some variety of layer 2
> multicast / broadcast would work, although it would be nice to go beyond
> the subnet if necessary.
>
> I agree that the discussion about new installs points out that sometimes,
> "pull" based systems have an advantage.  I'm pointing out that sometimes,
> "push" based systems have an advantage.  And I'm motiviated because I
> believe my situation at work is one of those situations where "push" is the
> better answer.

There are many who would argue that IP multicast is a broken idea.  I am not
one of those, but I do believe that its uses are limited to a relatively small
problem space.  I believe that multicast is not well-suited to this particular
problem, but you are welcome to try it out.

-- 
 - mdz


