
Bug#372712: as in debdelta



A Mennucc <debdev@mennucci.sns.it> writes:

> hi
>
> I have the same problem with debdelta, so I want to share some
> ideas
>
> introduction: the command "debdelta-upgrade" downloads deltas, and
> then applies them to create all the .debs necessary for an
> 'apt-get upgrade'; so it is doing more or less the same
>
> my advice is:
>
> 1) "debdelta-upgrade"  has two threads; one thread 
> downloads the patches and queues them for a second thread, that
> applies them; you may do the same with pdiffs
> (actually, I see some people saying that patchin is
> done in parallel; in case, ignore the above)

pdiff files are currently fetched in a very inefficient way:

1. fetch the index
2. fetch the first diff
3. gunzip it
4. rred the diff, then start downloading the next diff (goto 2)

This means the diff downloads are not properly pipelined and take up
valuable time. rred is also awfully slow and is run once for every
diff.
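The two-thread arrangement from (1), with one thread downloading diffs and
feeding a second thread that applies them, could be sketched roughly like
this (a minimal sketch; fetch_diff and apply_diff are hypothetical
placeholders for the real HTTP fetch and rred steps):

```python
# Sketch of a download/apply pipeline. fetch_diff() and apply_diff()
# are placeholders, not real apt or debdelta functions.
import queue
import threading

def fetch_diff(name):
    # placeholder: would download and gunzip one pdiff
    return b"diff-data-for-" + name.encode()

def apply_diff(data):
    # placeholder: would run the rred patch step
    return len(data)

def downloader(names, q):
    for name in names:
        q.put(fetch_diff(name))   # hand each diff to the patch thread
    q.put(None)                   # sentinel: no more diffs

def patcher(q, results):
    while True:
        data = q.get()
        if data is None:
            break
        results.append(apply_diff(data))

diffs = ["2025-01-01", "2025-01-02", "2025-01-03"]
q = queue.Queue(maxsize=2)        # bounded queue keeps memory in check
results = []
t1 = threading.Thread(target=downloader, args=(diffs, q))
t2 = threading.Thread(target=patcher, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(len(results))
```

The point is only that download N+1 overlaps with patching N; the bounded
queue stops the downloader from racing far ahead of the patcher.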

> 2) anyway, it may happen that, in some unfortunate cases, the time
> spent downloading/patching is so high that it would be more efficient
> to just download the new .deb
>
> my solution (which I am implementing in "debdelta-upgrade") is
> to keep statistics of the download speed (per server)
> and the patching speed, and then do the math and decide
>
> you may do the same; each time apt-get update is invoked, look at how
> many patches are needed, and do a simple computation to decide whether
> it is faster to download all the patches or the new Packages.bz2
>
> 3) the current structure of pdiffs is AFAIK not optimal;
>  there is a "binary" structure of diffs that works better, that is:
>
> time complexity for the end user:
>  if N (< 128) is the number of changes that occurred to the Packages
>  file since the last update, the user needs to download and apply
>   ~ 2 (1 + log_2(N))   diffs
>
> disk complexity on the server:
>  it will use more space on the servers; in this proposed version,
>  the servers need to store ~ twice as many diffs
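The cost comparison in (2) can be sketched as a simple time estimate.
All the speeds and sizes below are made-up illustrative numbers, not
measurements from any real mirror:

```python
# Decide whether to fetch pdiffs or the full Packages.bz2, based on
# per-server statistics (illustrative numbers only).
def faster_to_patch(patch_sizes, full_size, download_bps, patch_bps):
    """Compare estimated total times for the two strategies."""
    patch_time = sum(patch_sizes) / download_bps \
               + sum(patch_sizes) / patch_bps      # download + rred time
    full_time = full_size / download_bps           # just download it all
    return patch_time < full_time

# Example: 10 diffs of 20 kB each vs. a 2 MB Packages.bz2,
# on a 50 kB/s link where patching runs at 100 kB/s.
print(faster_to_patch([20_000] * 10, 2_000_000,
                      download_bps=50_000, patch_bps=100_000))
```

With these numbers patching wins (6 s versus 40 s); with many more diffs
or a slow rred the inequality flips, which is exactly the case (2) warns
about.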

Think about this scheme:

You start off with Packages.updates being the Packages file. New
version info for packages is prepended to the Packages.updates file
and the old version info is not (completely) removed. Fields that
remain the same can be omitted in the new info, and fields that
disappeared are included with empty contents. Fields in the old info
that are replaced by the new info can be omitted, except for the
MD5sum. Old info that has become too short (e.g. only 3-4 fields left)
can be merged into one of the newer infos. An entry of "Package: -foo"
signals the removal of the package completely.

Updating the Packages file then means fetching the new
Packages.updates file up to the first MD5sum entry that is already
known locally. At that point you have all the changes needed to update
the Packages file (or the whole file). If downloading, unpacking and
parsing are pipelined, you can stop the download and unpacking at
that point. There is some wasted download, especially given bzip2's
chunkiness, but overall that should be minimal.

Note that this needs only one update file per Packages file for any
number of updates, and the file size can be a fixed amount of overhead
compared to the plain file. The Packages file could also be replaced
completely by this format over time, leaving only a configurable fixed
amount of extra space.
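The client-side step, reading prepended stanzas until a known MD5sum is
reached, could look roughly like this. The stanza layout and field names
here are assumptions based on the description above, not an existing
format:

```python
# Sketch: take stanzas from the top of Packages.updates until we hit an
# MD5sum we already know; everything above it is new. Stanza format is
# an assumption based on the scheme described in this mail.

def parse_stanzas(text):
    """Split an RFC822-style file into lists of 'Field: value' lines."""
    return [block.splitlines() for block in text.strip().split("\n\n")]

def collect_updates(updates_text, known_md5s):
    """Collect stanzas newer than the newest one we already have."""
    changes = []
    for stanza in parse_stanzas(updates_text):
        md5 = next((line.split(": ", 1)[1] for line in stanza
                    if line.startswith("MD5sum: ")), None)
        if md5 in known_md5s:
            break                 # everything below is already applied
        changes.append(stanza)
    return changes

updates = """\
Package: foo
Version: 2.0
MD5sum: aaa111

Package: -bar
MD5sum: bbb222

Package: foo
Version: 1.0
MD5sum: ccc333
"""

# The client already knows MD5sum ccc333, so only the two newer
# stanzas (foo 2.0 and the removal of bar) count as changes.
changes = collect_updates(updates, known_md5s={"ccc333"})
print(len(changes))
```

With a streaming HTTP fetch, collect_updates would simply stop reading
the connection once the break is reached, which is the "stop the
download at that point" behaviour described above.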

MfG
        Goswin


