
Re: AMD64 archive move

Stephan Seitz wrote:
> On Wed, May 04, 2005 at 03:31:58PM +0200, Goswin von Brederlow wrote:
> >>ftp://amd64.debian.net/debian work too. Is correct to use it?
> >If you must. Http is preferred by almost everyone as it doesn't require
> >making a new connection for every file and is thus faster and
> >firewall friendlier.
> Now I'm confused. I thought http was the protocol making new
> connections for every file, and ftp the protocol using one connection
> for every action until you log out.

HTTP makes a new request (not necessarily a new connection) for each
file.  Whether that request reuses an existing connection depends on
the HTTP protocol version in use for that session: HTTP/1.1 keeps the
connection alive and reuses it by default, while HTTP/1.0 closes it
after each response.  I don't know if apt reuses connections.
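As a rough illustration (not apt's actual code), here is a
standard-library Python sketch of HTTP/1.1 keep-alive: a throwaway
local server, and a client that issues two GETs over the same TCP
connection.  The host, port, and paths are made up for the example.

```python
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # 1.1 keeps the connection open by default

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # needed for reuse
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # keep the demo quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
for path in ("/a.deb", "/b.deb"):       # two requests, one TCP connection
    conn.request("GET", path)
    resp = conn.getresponse()
    data = resp.read()                  # must drain the body before reuse
conn.close()
server.shutdown()
```

Against an HTTP/1.0 server the same client would have to open a fresh
connection for the second request.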

However, even with new connections being opened, the Apache server
preforks worker processes so that one is already waiting when a
connection arrives.  This tends to level out the load on the machine
and generally helps overall throughput.  In this situation a
preforking server like Apache may have better throughput and a lower
overall server load.
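A minimal Unix-only sketch of the prefork idea (nothing like Apache's
actual code): the parent opens the listening socket once, then forks
workers that each sit in accept().  For simplicity each worker here
handles one connection and exits, whereas a real prefork server
manages a pool of long-lived workers.

```python
import os
import socket

def serve_prefork(workers=2, handled_per_child=1):
    # Parent binds and listens once; forked children inherit the socket.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))
    srv.listen(8)
    port = srv.getsockname()[1]
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:                      # child: block in accept()
            for _ in range(handled_per_child):
                conn, _addr = srv.accept()
                conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhi")
                conn.close()
            os._exit(0)
        pids.append(pid)                  # parent keeps going
    return port, pids
```

The point is that the fork cost is paid before any client shows up, so
accepting a new connection is cheap.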

Importantly, with HTTP the protocol can tell whether the file is up to
date (via a conditional request) and avoid downloading it in that
case.  This is a big win.  (I don't think apt's ftp method does this,
IIRC, and I think it downloads the Packages files each time.  I have
to admit I am not sure and did not check.)  And because there are
well-behaved HTTP proxies available (unfortunately along with less
well-behaved ones), HTTP tends to cause fewer access problems for
users.  One of those "just works" things.  Also, FTP is troublesome
for firewall implementors because of its separate data connections,
and so it is often blocked.  So HTTP is generally useful and
preferred.
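A hedged sketch of that up-to-date check, again with a throwaway local
server: the client sends If-Modified-Since with the timestamp it got
the first time, and receives a bodyless 304 back instead of the file.
The /Packages path and the date are illustrative only.

```python
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        if self.headers.get("If-Modified-Since"):
            self.send_response(304)   # client's copy is current: no body sent
            self.end_headers()
        else:
            body = b"Packages data"
            self.send_response(200)
            self.send_header("Last-Modified", "Wed, 04 May 2005 12:00:00 GMT")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass

srv = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=srv.serve_forever, daemon=True).start()
conn = http.client.HTTPConnection("127.0.0.1", srv.server_address[1])

conn.request("GET", "/Packages")                 # first fetch: full download
first = conn.getresponse()
first_body = first.read()
stamp = first.getheader("Last-Modified")

conn.request("GET", "/Packages",                 # refresh: conditional request
             headers={"If-Modified-Since": stamp})
second = conn.getresponse()
second_body = second.read()
conn.close()
srv.shutdown()
```

The second response carries headers only, which for a large Packages
file is exactly the saving described above.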

People who run mirrors often like rsync.  The rsync protocol has two
personalities.  In one personality it is transferring a brand-new
file.  In that case it just sends the whole file and is similar in
performance to either ftp or http.  In the other it has an existing
file to work with and works hard to transfer only the differences.
With fast servers over slow links this can really speed things up.
But notice I said "works hard"?  If many clients pull files from one
server, the clients' work is distributed but the server's work is all
in one place.  The server can be browned out by the extra effort of
computing the deltas to transfer.  So, used indiscriminately, rsync
has gotten a reputation as something that is hard on servers.  But it
depends on how it is used.
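A toy Python sketch of that second personality, the delta transfer.
This is hugely simplified compared to real rsync, which uses an
Adler-32-style rolling weak checksum plus a strong checksum and a
sliding window; the block size and weak_sum here are crude stand-ins.

```python
BLOCK = 4

def weak_sum(data):
    # Stand-in for rsync's rolling checksum (real rsync can update this
    # in O(1) as the window slides by one byte).
    return sum(data) & 0xFFFF

def make_delta(old, new):
    # Receiver-side work: hash every block of the old copy.
    index = {weak_sum(old[i:i + BLOCK]): i
             for i in range(0, len(old) - BLOCK + 1, BLOCK)}
    delta, i = [], 0
    while i < len(new):
        window = new[i:i + BLOCK]
        j = index.get(weak_sum(window))
        if j is not None and old[j:j + BLOCK] == window:
            delta.append(("copy", j, BLOCK))   # reference into the old file
            i += BLOCK
        else:
            delta.append(("lit", new[i:i + 1]))  # send the byte itself
            i += 1
    return delta

def apply_delta(old, delta):
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, j, n = op
            out += old[j:j + n]
        else:
            out += op[1]
    return bytes(out)

old = b"Debian GNU/Linux amd64 archive"
new = b"Debian GNU/Linux amd64 mirror!"
d = make_delta(old, new)
assert apply_delta(old, d) == new
```

The per-file checksumming in make_delta is the "lumped" server-side
work the paragraph above is worried about.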

More important when mirroring with rsync is that the main archive has
continuous activity.  New debs are being added to the pool and old
ones removed.  Using rsync poorly to mirror, such as naively
rsync'ing the entire archive in one pass, will almost certainly catch
the archive in an inconsistent state.  If you cross-check the
resulting Packages files against the files you have on disk you will
find problems.  There is a multipass method: pull new packages first,
Packages files second, then delete old packages third.  If people do
that then fine, but I fear (and hear) that people do not take the
trouble to do it right.  There are several programs, such as
debmirror, mirrorer and debian-multimirror, which handle these
problems for you and do the right thing.
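A sketch of that three-pass order, modeled on plain filesystem copies
rather than rsync.  The pool/dists layout is flattened and the
sync_mirror helper is illustrative only, not how debmirror actually
works; the point is purely the ordering.

```python
import shutil
from pathlib import Path

def sync_mirror(source: Path, mirror: Path):
    # Pass 1: bring over any .deb the mirror is missing.  A client reading
    # the mirror's old Packages file mid-sync still finds every file it
    # references, because nothing has been deleted yet.
    for deb in source.glob("pool/**/*.deb"):
        dest = mirror / deb.relative_to(source)
        if not dest.exists():
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(deb, dest)

    # Pass 2: only now publish the index that references the new files.
    (mirror / "dists").mkdir(parents=True, exist_ok=True)
    shutil.copy2(source / "dists/Packages", mirror / "dists/Packages")

    # Pass 3: finally remove pool files the source no longer carries.
    for deb in (mirror / "pool").glob("**/*.deb"):
        if not (source / deb.relative_to(mirror)).exists():
            deb.unlink()
```

A single-pass copy with deletion enabled can delete a still-referenced
deb before the new index lands, which is exactly the inconsistent
state described above.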

