
Bug#111879: apt-get: wishlist: random download order for better HTTP cache hit rate



Package: apt
Version: 0.5.3
Severity: wishlist
File: /usr/bin/apt-get

I maintain a group of more than 50 Debian machines, a mixture of
stable, testing, and unstable systems spread across three countries.
Each site has a local caching HTTP proxy on its LAN, and every machine
is configured to use it.

At regular intervals 'apt-get update; apt-get -ufy dist-upgrade' is
run on these machines almost simultaneously (it is actually manually
started by local support personnel, who for various unfortunate reasons
do it during the same hour each day).

This has a nasty side effect: on days when many packages are upgraded,
many machines try to download the same packages in the same order
through the same proxy.  An HTTP cache cannot serve a package as a hit
until that package has been fully downloaded, so many machines end up
downloading the same package at the same time through the same HTTP
proxy and Internet feed.
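The effect can be illustrated with a deliberately simplified model,
sketched here in Python.  It assumes that all machines start together,
that every package takes exactly one time slot to download, and that
the proxy can serve a package from cache only once an earlier download
of it has finished.  The function name, package names, and counts are
hypothetical and are not taken from apt or from any real proxy.

```python
import random


def upstream_fetches(orders):
    """Count requests the proxy has to forward to the Internet feed.

    Simplified model (an assumption, not a measurement): all machines start
    at the same moment, every package takes exactly one time slot to
    download, and a package can be served from cache only if it was
    requested in an earlier slot, so its download has already completed.
    """
    cached = set()
    misses = 0
    for slot in zip(*orders):  # the package each machine requests at time t
        # Several requests for the same uncached package in one slot all miss.
        misses += sum(1 for pkg in slot if pkg not in cached)
        cached.update(slot)  # these downloads finish before the next slot
    return misses


packages = [f"pkg{i}" for i in range(20)]  # hypothetical upgrade set
machines = 50

# Every machine fetches the packages in the same order.
same_order = [list(packages) for _ in range(machines)]

# For contrast: every machine fetches them in its own random order.
rng = random.Random(1)
random_order = [rng.sample(packages, len(packages)) for _ in range(machines)]

print(upstream_fetches(same_order))    # 1000: every request misses
print(upstream_fetches(random_order))  # far fewer; exact count depends on the seed
```

Under these assumptions, identical ordering forces the proxy to forward
all 50 x 20 = 1000 requests upstream, while independent random orders
turn most requests into cache hits.  Real downloads differ in size and
speed, so the model only illustrates the trend, not actual numbers.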

Mirroring debian.org locally with rsync uses much more bandwidth than
50 machines simultaneously downloading new libc6 packages, so the HTTP
cache strategy still seems more efficient, even if it wastes 70% of its
bandwidth in the process.

It would be nice if apt-get could randomize the order in which it
fetches packages (and package list files, for that matter) from any
given source.  In the absence of a scheme to coordinate network traffic
among many machines running apt-get simultaneously, randomizing the
download order within apt-get would make it much more likely that any
two machines are fetching different packages at any given time.  This
in turn improves the cache hit rate when different machines fetch the
same package, because the first machine to request it will have had
time to finish downloading it before the second and subsequent machines
request it from the cache.
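A minimal sketch of the requested behaviour, written in Python rather
than apt's own C++, could group the pending downloads by source and
shuffle each group independently.  The helper name, the use of the URI
host as the "source", and the mirror URLs are all hypothetical; this
does not reflect how apt-get organizes its downloads internally.

```python
import random
from urllib.parse import urlsplit


def randomize_per_source(uris, rng=None):
    """Return {source host: URIs in a random order} for the given downloads.

    Hypothetical helper sketching this wishlist item; it is not apt code.
    A "source" is approximated by the URI's host (an assumption).  Each
    per-source queue is shuffled independently, so machines with different
    random states will usually fetch the same files in different orders.
    """
    if rng is None:
        rng = random.Random()  # unseeded: each machine gets its own random state
    queues = {}
    for uri in uris:
        queues.setdefault(urlsplit(uri).netloc, []).append(uri)
    for queue in queues.values():
        rng.shuffle(queue)  # in-place random permutation of this source's files
    return queues


# Hypothetical download list: 10 packages from each of two mirrors.
uris = [f"http://mirror-{m}.example/debian/pkg{i}.deb"
        for m in "ab" for i in range(10)]

machine1 = randomize_per_source(uris, random.Random(1))
machine2 = randomize_per_source(uris, random.Random(2))
print(machine1["mirror-a.example"][:3])
print(machine2["mirror-a.example"][:3])
```

Shuffling only within a source leaves the set of files fetched from each
source unchanged; only the order of requests differs from machine to
machine, which is what lets an earlier machine's download finish and
land in the cache before a later machine asks for the same file.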

-- System Information
Debian Release: testing/unstable
Architecture: i386
Kernel: Linux genki 2.2.19-zb-586-smp #1 SMP Wed Mar 28 00:02:08 EST 2001 i686
Locale: LANG=C, LC_CTYPE=C

Versions of packages apt depends on:
ii  libc6                  2.2.4-1           GNU C Library: Shared libraries an
ii  libstdc++2.10-glibc2.2 1:2.95.4-0.010810 The GNU stdc++ library
