Re: apt PARALLELISM
On Mon, 12 Dec 2005, Henning Makholm wrote:
> As far as I read the proposal, it is about downloading _different_
> files from different mirrors - if you have 25 packages to get for your
> 'apt-get update' operation, download 5 packages from each of 5
> different servers, with one connection to each server active at a
> While I cannot see any very common situation where such parallelism
> would be an advantage, it is not clear that it would increase the load
> of any or all servers.
It would be an advantage *to the receiving end* if TCP/IP is the limiting
factor: the same connections are compressed into a shorter time span, so
more of them are active at once without interfering with one another.
Depending on how the queues work, even when the ISP backbone is full, trying
for more connections might increase your overall transfer speed *at the
undeniable fact that you will be making it worse for everyone else in the
OTOH, this compresses in time the resources used by a single individual.
Whether this translates to diminished experience for a large group of
individuals (which will also have compressed their resource usage profile in
time) or not, is not such a simple question.
> from each of the 5 servers, but sequentially such that only a single
> connection to a single server is active. And the difference from
> _that_ to an actual parallel implementation is just to shift the
> connections each server experiences a bit in time - the number of KB
> served by each server stays constant.
The bandwidth is constant, yes. The amount of active connections and the
aggregate flow speed are not: they increase because you have compressed
time. I.e. you use more resources for a shorter time.
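
To put rough numbers on that, here is a minimal sketch for a single client.
The 25 packages and 5 servers come from the example quoted above; the file
size, the per-connection rate, and the function name are made up for
illustration, and it assumes each TCP connection is capped at a fixed rate
while the client's own link is not the bottleneck.

```python
def transfer_profile(files, file_kb, servers, conn_kb_per_s, parallel):
    """Return (duration_s, peak_connections, kb_per_server) for one client.

    Illustrative model only: every connection runs at conn_kb_per_s and
    the files are split evenly across the servers.
    """
    total_kb = files * file_kb
    # Parallel mode keeps one connection open to every server at once;
    # sequential mode only ever has a single connection open.
    active = servers if parallel else 1
    duration_s = total_kb / (active * conn_kb_per_s)
    kb_per_server = total_kb / servers
    return duration_s, active, kb_per_server


# Hypothetical figures: 25 files of 1000 KB each, 50 KB/s per connection.
sequential = transfer_profile(25, 1000, 5, 50, parallel=False)
parallel = transfer_profile(25, 1000, 5, 50, parallel=True)

print("sequential:", sequential)
print("parallel:  ", parallel)
```

In this toy model each server hands out the same number of KB either way;
only the duration and the number of simultaneously open connections change.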
This is not something that would bother anyone if it is a single user... but
if you have 10k users doing that, often close enough in time, well, things
should get MUCH worse as far as I can see. If they are doing this at random
times in the day, OTOH, it would not be that bad, I guess.
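
A rough way to check that intuition is to count the peak number of open
connections across all mirrors for 10k users. Everything here is a
hypothetical model, not a measurement: a sequential update is assumed to
take 500 s on 1 connection, a parallel one 100 s on 5 connections, start
times are evenly spaced rather than random, and the per-connection rate is
assumed not to degrade under load (which real servers will not honour).

```python
def peak_connections(start_times, duration_s, conns_per_user):
    """Sweep start/end events and return the highest number of
    simultaneously open connections."""
    events = []
    for t in start_times:
        events.append((t, conns_per_user))
        events.append((t + duration_s, -conns_per_user))
    # Tuples sort by time, then by delta, so at equal timestamps a
    # closing connection is processed before an opening one.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def evenly_spaced(users, window_s):
    """Start times for `users` clients spread evenly over window_s seconds."""
    return [i * window_s / users for i in range(users)]


USERS = 10_000
burst = evenly_spaced(USERS, 60)        # everyone starts within one minute
spread = evenly_spaced(USERS, 86_400)   # starts spread over a whole day

results = {
    "burst_sequential": peak_connections(burst, 500, 1),
    "burst_parallel": peak_connections(burst, 100, 5),
    "spread_sequential": peak_connections(spread, 500, 1),
    "spread_parallel": peak_connections(spread, 100, 5),
}
for name, peak in results.items():
    print(name, peak)
```

Under these assumptions the burst case peaks at 10,000 connections when
sequential and 50,000 when parallel, while the spread-out case stays at
roughly 60 either way. So in this model the trouble only shows up when many
users start within less than one transfer time of each other.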
> Is your point that a server prefers to push bytes through the
> connection at a constant rate, and starts wasting resources if the
Constant _total_ average flow rates are *always* the best to work with IMHO,
but that was not what I was talking about.
> servers? But when the bottleneck is closest to the client, enabling
> parallel downloads would not make much sense anyway.
They do. I have experienced them. I have a 4Mbit/s cable downlink at the
moment, and I can assure you that, unless the ISP is having trouble on the
last-mile feeder (i.e. extreme packet loss, where trying to pump more into
the wire just makes things worse), it improves my download speed to have
multiple connections (it doesn't matter whether I am transferring the same
data or not, I am talking about the aggregate flow here).
Whether MY [a single individual] increased download speed is worth the extra
load on the mirror network, and whether it WOULD increase the load on the
mirror network is what we are asking here.
(and for the people who can't read whole threads, my position is that we
should never decrease the experience of a group of people to increase the
experience of an individual).
> (Of course, Goswin has a valid point that some people have their
> sources.list deliberately written with a remote, undesirable, server
> at the end as a _fallback_ option. Therefore parallelism should at
> best be an _option_, not something that apt starts doing unbidden).
> From that experience, a last-mile link in the 1 Mb/s range would still
> seem to be the limiting factor - and therefore people at the end of
> such links would have little use for parallelism in the first place.
That's not how it works when you have shitty backbone connectivity, like in
Brazil. It doesn't matter if they deliver 4Mbit/s to your home, the
network in the middle is crap when compared to what I've seen in the USA and
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot