
Re: apt-get improvement idea



Hi,

2010/6/24 Mohammad Ebrahim Mohammadi Panah <ebrahim@mohammadi.ir>:
> I've got an idea for apt-get, which I discussed in #debian-devel. I want to
> know what you think about it. Also I need your guidance to implement that.
> This is the IRC discussion log:

Strange: I can be online at any time of day, but it seems I am never
around when something that is actually relevant to me is discussed…

Anyway: (reordered quotes and answers)

> [16:56] <ol> does it currently always select the first?

Yes, it does; or rather, it starts with the one listed first in sources.list.
If the download of a package fails, it uses the next one (if available).
If that doesn't work, it's a bug; it used to work in the past.
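
The behaviour described above (start with the first source listed, fall
back to the next one only when a download fails) can be sketched in Python
roughly like this. It is not APT's actual code (APT is written in C++);
`fetch_with_fallback` and the `download` callback are hypothetical names
used only for illustration.

```python
from typing import Callable, Optional, Sequence


def fetch_with_fallback(
    mirrors: Sequence[str],
    path: str,
    download: Callable[[str], bool],
) -> Optional[str]:
    """Try the mirrors in sources.list order and return the one that worked.

    `download` is a hypothetical callback that returns True on success.
    """
    for mirror in mirrors:
        # The first listed mirror is always tried first; the next one is
        # only used if the previous download failed.
        if download(f"{mirror}/{path}"):
            return mirror
    # Every mirror failed: give up and report it (None).
    return None
```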


> [16:55] <ebrahim> Feasibility: It is currently possible to add some
> similar repos to sources.list. Also APT knows how to download concurrently
> from different servers. I just need to tell APT not to always select the first
> source in case of more than one source for that version.

Is it really that simple? How do you know which mirror to try next if,
for example, the last one failed?
Your current implementation has the lovely effect that if Mr. Random
happens to choose the same mirror x times, APT will give up on x+1…
Also, rand() is not a round-robin implementation…
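
A rotation that really is round-robin, and that still knows which mirror
to try next after a failure, could look like the sketch below. It is a
hypothetical illustration (the class `RoundRobinMirrors` is made up, not
APT code): each package starts at the next mirror in the rotation, and on
failure the remaining mirrors are walked in order, so every mirror is tried
exactly once before giving up. Repeated rand() picks, by contrast, can hit
the same mirror again and again.

```python
from typing import Callable, List, Optional, Sequence


class RoundRobinMirrors:
    """Hypothetical helper: rotate the starting mirror per package."""

    def __init__(self, mirrors: Sequence[str]) -> None:
        if not mirrors:
            raise ValueError("need at least one mirror")
        self.mirrors: List[str] = list(mirrors)
        self._next = 0  # index of the mirror the next package starts with

    def order_for_next_package(self) -> List[str]:
        start = self._next
        self._next = (start + 1) % len(self.mirrors)
        # Rotated copy of the list: start, start+1, ..., wrapping around.
        # Each mirror appears exactly once, so the fallback order is known.
        return self.mirrors[start:] + self.mirrors[:start]

    def fetch(self, path: str, download: Callable[[str], bool]) -> Optional[str]:
        for mirror in self.order_for_next_package():
            if download(f"{mirror}/{path}"):
                return mirror
        return None  # every mirror failed once: give up
```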


> [17:03] <ebrahim> I chose to connect to more servers rather than having
> more connections to the same server, for the sake of Debian mirror servers! :)

APT only ever has one connection open to a given mirror server at a time.
It doesn't open a connection for each single package; it sends a request
for each package over that one connection (see pipelining).
So what you "save" here is "only" time, for now (see the next point).
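
To illustrate what "a request for each package over this connection"
means: with HTTP/1.1 pipelining, the client writes several requests
back-to-back on one persistent connection and reads the responses in the
same order. The hypothetical helper below only builds the bytes that would
be written to the socket; the host and paths are made-up examples, and a
real client (such as APT's http method) also has to parse the responses and
cope with servers that mishandle pipelining.

```python
from typing import Sequence


def build_pipelined_requests(host: str, paths: Sequence[str]) -> bytes:
    """Concatenate one HTTP/1.1 GET per path for a single connection.

    All requests are sent without waiting for the previous response;
    the server answers them in order over the same TCP connection.
    """
    parts = []
    for path in paths:
        parts.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "\r\n"  # empty line ends the request headers
        )
    return "".join(parts).encode("ascii")
```

For example, build_pipelined_requests("mirror.example.org", ["/pool/a.deb",
"/pool/b.deb"]) yields two GET requests in one buffer, to be sent over one
socket.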


> [17:02] <ebrahim> ol, download acceleration through more TCP
> connections is a well-known technique. It is not just me.

And most of the time a stupid one, as your downloader hammers the server
with multiple requests to get more chances of being served in the
round-robin process. If all users did that, nobody would gain anything
except a time penalty… and maybe we would end up with fewer mirrors, as
not every machine mirroring free software has that as its sole purpose;
many serve a different purpose in general and just happen to have some
spare resources…

Splitting downloads across multiple servers can have good effects (e.g.
BitTorrent), but it also increases the overall flow of data that needs to
be transferred. Querying 5 fast servers for different packages doesn't
help at all if your connection can only handle the data flowing in from
one… (BitTorrent is different, as its nodes generally don't have uplinks
as good as a "normal" server's.)
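
The bandwidth point is simple arithmetic: the aggregate rate is capped by
the client's own downlink, so adding servers only helps while the servers,
not the client, are the bottleneck. A minimal sketch under a deliberately
simplified model (the function name and the numbers are made up; rates are
in arbitrary units, and protocol overhead is ignored):

```python
from typing import Sequence


def effective_rate(client_downlink: float, server_rates: Sequence[float]) -> float:
    """Aggregate download rate when fetching from several servers at once.

    Simplified model: ignores protocol overhead and assumes each server
    delivers its full rate until the client's link is saturated.
    """
    return min(client_downlink, sum(server_rates))


# A client whose link takes 1 unit/s gains nothing from 5 fast servers:
one = effective_rate(1.0, [10.0])         # 1.0
five = effective_rate(1.0, [10.0] * 5)    # 1.0
```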

I guess the overall speed could be improved more by automatically
choosing a (maybe local and maybe lesser-known) mirror for the user
based on some intelligent heuristic, rather than executing an
"apt-get ddos" command against a few well-known…


> [17:31] <mrvn> ebrahim: If you patch it then please add a priority option
> (as in deb [pri=<N>] url suite component). Make it round-robin only
> between sources with equal priority and default to the line number (or
> something) so the old behaviour remains.

Use case? Why should someone want to prefer downloading the SAME
version from one trusted mirror over another trusted mirror?
If the versions differed, that would obviously be another matter, but
then the option to choose between the two for downloading wouldn't
exist in the first place…
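
For reference, mrvn's proposal (a [pri=<N>] option, round-robin only
between sources of equal priority, defaulting to the line number so that
the old behaviour remains) could be sketched as follows. The option does
not exist in APT, and `build_groups` / `PriorityRoundRobin` are
hypothetical names; the convention that a lower number means "preferred"
is an assumption, chosen so that the line-number default reproduces the
current first-listed-wins order.

```python
from itertools import groupby
from typing import List, Optional, Sequence, Tuple


def build_groups(entries: Sequence[Tuple[str, Optional[int]]]) -> List[List[str]]:
    """Group source URLs by priority, most preferred group first.

    `entries` are (url, pri) pairs in sources.list order; pri=None means
    no [pri=<N>] was given, so the 1-based line number is used instead.
    """
    keyed = [
        (pri if pri is not None else lineno, url)
        for lineno, (url, pri) in enumerate(entries, start=1)
    ]
    # list.sort is stable: equal priorities keep their sources.list order.
    keyed.sort(key=lambda item: item[0])
    return [[url for _, url in grp] for _, grp in groupby(keyed, key=lambda item: item[0])]


class PriorityRoundRobin:
    """Round-robin inside each priority group; groups are tried in order."""

    def __init__(self, entries: Sequence[Tuple[str, Optional[int]]]) -> None:
        self.groups = build_groups(entries)
        self._starts = [0] * len(self.groups)  # rotation offset per group

    def order_for_next_package(self) -> List[str]:
        order: List[str] = []
        for i, urls in enumerate(self.groups):
            start = self._starts[i]
            self._starts[i] = (start + 1) % len(urls)
            # Rotate only within this group, then fall back to the next group.
            order.extend(urls[start:] + urls[:start])
        return order
```

With [("a", 1), ("b", 1), ("c", 5)], packages alternate between a and b,
and c is only used as a fallback. With no priorities given, every group has
a single member, so the order is always the plain sources.list order.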


It's not your fault, as it is currently a bit confusing, but have a look
at the experimental repository, since the current experimental releases
are based on it. It also includes a draft implementation of the mirror
protocol, which you might find interesting…
http://bzr.debian.org/apt/apt/debian-experimental-ma/
And maybe have a look at "bzr send".


Oh, and btw: I don't want to sound like a Babbitt, but I don't see
anything in the log showing that you asked the participants for permission
to publish it. One could argue that an IRC channel like #d-d is already
public enough, but in general the content of an IRC channel is volatile
and limited to the audience in the channel at that time, while a mailing
list archive is open for everyone to read, even twenty years from now.

And a second btw: a more precise subject line would be fabulous next time…


Best regards,

David Kalnischkies

