Re: apt repository usage



"Brian Eaton" <eaton.lists@gmail.com> writes:

> Hello -
>
> I'm interested in doing some log file analysis on apt repositories. 
> Anybody willing to share logs?
>
> More specifically, I'd like to try to figure out what percentage of
> apt repository bandwidth goes to upgrading packages versus installing
> brand new packages.  If a lot of bandwidth is being used for upgrading
> packages, maybe there is some way to make the process more efficient. 
> I've stumbled across some threads discussing using rsync as an apt
> backend, for example:
>
> http://lists.debian.org/debian-devel/2003/07/msg00462.html
>
> The main objection to using rsync seems to be that it increases the
> CPU usage on the file server.  However, full-fledged rsync shouldn't
> be necessary.  If you know the two files to be synchronized in
> advance, then you could do the CPU intensive work up front.  You could
> prepare a patch-like binary format that apt clients could then use to
> update an older copy of the package.

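No, you can't with rsync. rsync generates the block checksums on the
client side, and the server then slides a block-sized window over its
file, looking for a matching block at every byte offset. Caching that
would require 20 times the file size, because the server would need a
checksum for the window at every offset, not just one per block.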

You have to reverse the roles to get the block checksums precached,
and that is what zsync does.
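
To illustrate the reversed roles, here is a minimal sketch in Python.
It is not zsync's actual file format or code: the tiny block size, the
function names and the use of SHA-256 as the strong checksum are all
assumptions made for the example. The server computes one checksum per
block of the new file, once; the client does the rolling-window scan
over its own old copy and then fetches only the blocks it is missing.

```python
import hashlib

BLOCK = 4  # unrealistically small block size, for illustration only


def weak_sum(block):
    """rsync-style weak checksum (a, b) of one block, modulo 2**16."""
    n = len(block)
    a = sum(block) & 0xFFFF
    b = sum((n - i) * x for i, x in enumerate(block)) & 0xFFFF
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide the window one byte: drop out_byte, take in in_byte."""
    a = (a - out_byte + in_byte) & 0xFFFF
    b = (b - n * out_byte + a) & 0xFFFF
    return a, b


def make_control(new):
    """Server side, run once per file: checksums of each full block."""
    table = {}
    for idx in range(0, len(new) - BLOCK + 1, BLOCK):
        block = new[idx:idx + BLOCK]
        strong = hashlib.sha256(block).digest()
        table.setdefault(weak_sum(block), []).append((strong, idx))
    return table


def plan_download(old, table):
    """Client side: find blocks of the new file already present in old.

    Returns {offset in new file: offset in old file}.
    """
    have = {}
    if len(old) < BLOCK:
        return have
    a, b = weak_sum(old[:BLOCK])
    pos = 0
    while True:
        for strong, idx in table.get((a, b), ()):
            window = old[pos:pos + BLOCK]
            # The weak sum can collide, so confirm with the strong one.
            if idx not in have and hashlib.sha256(window).digest() == strong:
                have[idx] = pos
        if pos + BLOCK >= len(old):
            break
        a, b = roll(a, b, old[pos], old[pos + BLOCK], BLOCK)
        pos += 1
    return have


def rebuild(old, new_on_server, have):
    """Assemble the new file; slicing new_on_server stands in for a
    ranged download. Returns (rebuilt file, bytes fetched)."""
    out = bytearray()
    fetched = 0
    for idx in range(0, len(new_on_server), BLOCK):
        if idx in have:
            out += old[have[idx]:have[idx] + BLOCK]
        else:
            chunk = new_on_server[idx:idx + BLOCK]
            out += chunk
            fetched += len(chunk)
    return bytes(out), fetched
```

The server's work here is limited to make_control(), which depends only
on the new file and can therefore be cached next to it. All of the
per-client scanning happens in plan_download() on the client.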


Another thing that has been suggested is to provide patch packages
that only contain the differences between two versions of a
package. You would generate them once, on ftp-master or somewhere
similar, and have apt-get pick either the full package or the patch
package depending on the version already installed. But it looks like
that would need quite a few changes in apt, aptitude and dpkg, and
someone would have to figure out exactly how to build those patch debs.
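
The selection step could look roughly like the following Python sketch.
Nothing like this exists in apt; the index layout, the file names and
the function choose_download() are hypothetical, invented only to show
the decision between a full package and a patch package.

```python
def choose_download(installed_version, candidate):
    """Pick the file to fetch for an upgrade.

    candidate is a hypothetical index entry:
      {"full": (filename, size),
       "patches": {from_version: (filename, size)}}
    """
    full_name, full_size = candidate["full"]
    patch = candidate["patches"].get(installed_version)
    # Use the patch only if one exists for exactly the installed
    # version and it is actually smaller than the full package.
    if patch is not None and patch[1] < full_size:
        return patch[0]
    return full_name


# Hypothetical index entry for upgrading package "foo" to 1.0-2.
candidate = {
    "full": ("foo_1.0-2_i386.deb", 500000),
    "patches": {"1.0-1": ("foo_1.0-1_to_1.0-2_i386.patch", 40000)},
}

upgrade = choose_download("1.0-1", candidate)  # patch is available
fresh = choose_download(None, candidate)       # not installed: full deb
stale = choose_download("0.9-3", candidate)    # no patch from 0.9-3
```

Falling back to the full package whenever no patch matches keeps new
installs and upgrades from old versions working unchanged.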

> I'm not certain whether this is worth doing, which is why I'm
> interested in having a look at apt repository usage.  I realize there
> may be some privacy concerns about sharing the log files.  I hope we
> can work something out to remove personally identifiable information
> from the logs while still leaving the important data intact.

> Regards,
> Brian

MfG
        Goswin


