Re: Rambling apt-get ideas



On Thu, Dec 28, 2000 at 08:22:52AM -0600, Vince Mulhollon wrote:

> Yes, that was kind of my point.
> 
> An analogy would be that we don't need dpkg because most of its
> functionality could be done by a mixture of tar, gzip, and perl (and maybe
> make to handle dependencies).

Not quite.  dpkg-deb actually does call out to tar and gzip, and lets those
programs do what they do best.  It doesn't try to be tar and gzip and dpkg all
at once.  The UNIX approach is to build tools that do one or a few jobs very
well, and build larger tools out of that code base.  That way, once a problem
is solved, it is solved for all programs that share the problem-solving code.
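
To make that concrete: a .deb is just an ar archive wrapping a couple of
gzipped tarballs, so the standard tools can pick one apart by hand (the
package name below is made up):

    $ ar t foo_1.0_i386.deb
    debian-binary
    control.tar.gz
    data.tar.gz
    $ ar p foo_1.0_i386.deb data.tar.gz | tar tzf -

which is exactly the kind of reuse dpkg-deb gets by calling out to those
tools instead of reimplementing them.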

> My point being: yes, I already use squid as a proxy server for a whole
> network of apt-getting debian boxes, and after only a little work it works
> OK, but something using IP multicast would be better due to lower network
> utilization.  True, doing multiple simultaneous upgrades means eventually
> an upgrade would kill all the machines simultaneously, and my high end
> pentiums are going to decompress the gzip parts much faster than my old
> 386s, although there are probably ways around that; just because all the
> .debs are distributed all at once in one multicast burst doesn't mean they
> have to be installed all at once.  Anyway, squid does not do IP multicast
> to multiple simultaneous clients, last time I checked.  Another cool
> ability of an integrated cache would be that the "fetching" machine could
> maintain a list of all the machines it pushed the new .deb to, and when all
> the "client" machines have a copy of the new .deb, clear it from the cache.
> With a squid solution, squid has to guess whether it's OK to clear the cached
> .deb based upon access time, size, etc.  Even worse, my squid only caches
> files less than 8 megs, thus each machine downloads its own copy of emacs,
> etc.  A cache for general web use "works", but a cache designed
> specifically for .deb packages would work better.

There is very little tuning you could do for a general-purpose web cache in
order to support .debs that would not be generally applicable to other
situations.  Rather than creating a new caching proxy for .debs, why not
improve squid to do what you want?  That way, other applications (which may or
may not exist yet) can also benefit.  Squid does in fact use multicast, but
only for ICP (and thus only for very small objects).  I think you will find
that IP multicast is not particularly suited to this task.  In order to avoid
overflowing socket buffers on the client, the server would have to multicast
its data only as fast as the slowest client.  Not only does this cause a
performance bottleneck, but it is tricky to detect how fast each client can
receive data, and adjust accordingly.  If the systems are not all on the same
LAN, the server must take into account network congestion, etc.  This is what
protocols like RTSP try to do.  Where real-time content delivery is not an
issue, TCP does a much better job of responding to changing network conditions.
Of course, if all of the systems are on the same LAN, you could use real link
layer broadcast instead of IP multicast.

The maximum object size is simply a configuration setting.  The ability to be
smarter about particular object types sounds like a good idea for a squid
enhancement.
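
For example, a couple of lines in squid.conf go most of the way; the numbers
here are only a guess at what would suit a package cache:

    # allow whole packages into the cache instead of the small default limit
    maximum_object_size 100 MB

    # a published .deb never changes, so let it stay fresh for a long time
    refresh_pattern \.deb$  10080 100% 43200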

> I suppose I could configure my two dozen workstations at work "all at once"
> "remotely" by doing some kind of weird hack with expect and ssh.  But it
> might be cooler to do that directly with debconf, again using IP
> multicast.

Again, this is not a multicast problem.  You could send a very few packets to
each system telling it to update its configuration from a central
meta-configuration.  In fact, cfengine already does exactly that, and I have
used it very successfully in practice.  Once debconf is deployed widely enough
to make it useful for this task, a shared debconf database and some extra
debconf smarts could replace a lot of cfengine rules.  Debconf would only need
a feature (or set of features) that lets the user say "check for any changes
to the shared config database since you last checked, and reconfigure the
affected programs".

> Or another example, a network wide shared apt-get cache.  I suppose you
> could just NFS mount all the machines onto one apt-get cache on one
> machine.  There might be file locking issues.  There would be security and
> authentication issues.  The one server would have to have all the disk
> space for the cache.  And it would be a manual PITA to configure for
> each machine involved.  It would be cooler, cleaner, and more efficient to
> have the system provide the same functionality as a core feature.

On the contrary, I think a network-wide cache (whether a simple shared
filesystem, or a proxy server) maps very well to this problem.  You want to
retrieve a lot of data once, and share it among multiple clients.  Automatic
proxy configuration is an easy task.
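
For what it's worth, pointing each box at a shared proxy is a one-line apt
setting (the host name is an example):

    // /etc/apt/apt.conf
    Acquire::http::Proxy "http://cache.example.org:3128/";

or the equivalent http_proxy environment variable, which apt's http method
also honors.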

> Another example is adding transport protocols to apt-get.  I suppose given
> a strange brew of named pipes, NFS mounts, loopback devices, and "file:"
> lines in /etc/apt/sources.list I could find a way for apt-get to pull .debs
> over freenet, or over FSP, or over DCC chat on IRC.
>
> The general idea of my post is that I do some unusual hacks involving
> apt-get already, and I can think of even stranger and more useful hacks.
> But why make and use a weird custom hack, when the idea could be cleanly
> built right into the infrastructure instead, for everyone to automatically
> and easily use? (although I don't know enough apt-get to do it myself)

For the same reason that you don't integrate a web browser with the OS kernel.
It is possible to have clean and elegant interoperation of components without
combining the components into one monolithic module.  Better to add a little
bit of functionality to each component and get them to meet in the middle
(using a standard interface or protocol).  Rather than teaching every OS about
every other OS's process space and system call interface, you define
standards like RPC and TCP/IP and teach each system to speak a common language.
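
Incidentally, apt already works this way for its transports: each scheme in
sources.list is handled by a separate little method program (they live in
/usr/lib/apt/methods and talk to apt over a simple text protocol on
stdin/stdout), so a freenet or FSP transport would be one more method rather
than a change to apt itself.  A sources.list mixing methods might look like
this (hosts and paths are examples):

    deb http://http.us.debian.org/debian stable main
    deb ftp://ftp.debian.org/debian stable main
    deb file:/var/local/mirror stable main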

It would be nice to have a standard, anonymous file transfer protocol that
doesn't have the individual problems of FTP or HTTP, neither of which was
designed for all of the uses that they are seeing today.  FSP has some good
ideas, but it doesn't seem to have caught on (UDP may be part of the reason).

-- 
 - mdz


