
Bug#372712: apt: periodically roll up pdiffs



Matt Taggart <taggart@debian.org> writes:

> I had an idea similar to the one Andrea Mennucc mentions in #372712 for the
> problem of so many pdiffs. The idea is similar to a scheme you might use for
> nightly incremental backups. You might run a "zero" backup once a month, a
> "one" backup every 15 days, a "two" every 7, a "three" every 3 and a "four"
> every day. For example:
>
>  July 2006           Aug 2006
>             0        0 4 4 3 2
> 4 4 3 4 4 3 2    4 3 4 4 3 4 2
> 4 4 3 4 4 3 2    3 4 4 1 4 4 2
> 1 3 4 4 3 4 2    4 4 3 4 4 3 2
> 3 4 4 3 4 4 2    4 3 4 4 1
> 4 1
>
>
> On any given day you'd need at most 5 patches, and on many days far fewer
> than that. The reason for doing this is not just to reduce the number of
> files but also the overall data, as a lot of the data in the diffs is
> redundant. Consider the case of a package that is updated every day for a
> month. Under the current scheme a client not updating for that month would
> need to download the differences for that package 30 times, right? Under an
> incremental scheme the worst case is 5 diffs for that package. It's an even
> bigger win for longer periods of time; the current scheme will start really
> falling down once we get a few more months of pdiffs.
>
> Thanks,

But then again why have incremental diffs at all?

Two patches can be merged by taking a file with enough unique lines,
applying both patches to it, and diffing the result against the
original. There is no need to work from the actual Packages files, so
old versions don't have to be stored for this.
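
A minimal sketch of that merge, assuming the patches are ed-style
scripts that use only the a, c and d commands (the subset rred
handles). The names apply_ed, make_ed and merge_patches are made up
for this mail, and difflib stands in for diff(1). The placeholder file
must have at least as many lines as the real file, and no real line
may be equal to a placeholder line or consist of a lone ".".

```python
import difflib
import re

CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")

def apply_ed(lines, script):
    """Apply an ed-style script (a/c/d commands only) to a list of lines."""
    out = list(lines)
    it = iter(script)
    for cmd in it:
        m = CMD.match(cmd)
        if not m:
            raise ValueError(f"unsupported command: {cmd!r}")
        start = int(m.group(1))
        end = int(m.group(2) or start)
        new = []
        if m.group(3) in "ac":
            # Text lines follow the command, terminated by a lone ".".
            for text in it:
                if text == ".":
                    break
                new.append(text)
        if m.group(3) == "a":
            out[start:start] = new        # insert after line `start`
        else:
            out[start - 1:end] = new      # "d" deletes, "c" replaces
    return out

def make_ed(old, new):
    """Produce an ed-style script turning `old` into `new`, bottom-up."""
    sm = difflib.SequenceMatcher(None, old, new, autojunk=False)
    script = []
    for tag, i1, i2, j1, j2 in reversed(sm.get_opcodes()):
        if tag == "equal":
            continue
        if tag == "insert":
            script.append(f"{i1}a")
        else:
            rng = str(i1 + 1) if i2 - i1 == 1 else f"{i1 + 1},{i2}"
            script.append(rng + ("d" if tag == "delete" else "c"))
        if tag != "delete":
            script.extend(new[j1:j2])
            script.append(".")
    return script

def merge_patches(patch1, patch2, size):
    """Merge two consecutive patches using `size` unique placeholder lines."""
    dummy = [f"@@placeholder {i}@@" for i in range(1, size + 1)]
    result = apply_ed(apply_ed(dummy, patch1), patch2)
    return make_ed(dummy, result)
```

Because every placeholder line is unique, the final diff can only
match a placeholder against itself, so the merged script touches
exactly the lines that the two input patches touched.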

It is true that the combined patch files will all grow with every
additional day (minus the packages with multiple updates in that
time), but they aren't that big, and compression gets better for
larger files.


Given the crawling speed of the rred method, downloading more than a
few days' worth of patches (~300k) is slower than downloading the full
file (3MB), even on a slow DSL line. A combined patch would need only
one download, one gunzip and one rred run. I think that would be worth
the space increase for the patch files.

I would recommend naming the combined patch files after the md5sum
(or sha1) of the Packages/Sources file they patch. That way the client
can compute the checksum of its local file and request the matching
patch directly, and no index needs to be downloaded.
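
As a rough illustration of the naming idea, the client side could look
something like this. The "comb.<digest>.gz" pattern and the function
name combined_patch_name are only assumptions for this sketch, not an
existing apt or archive convention.

```python
import hashlib

def combined_patch_name(packages_bytes, algo="sha1"):
    """Return the (hypothetical) name of the combined patch for a local file."""
    # Hash the uncompressed Packages/Sources contents the client already has.
    digest = hashlib.new(algo, packages_bytes).hexdigest()
    return f"comb.{digest}.gz"
```

The client would hash its current Packages file, ask the mirror for
that name, and fall back to the full file if it does not exist.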

MfG
        Goswin

-----------------------------------------------------------------------
Sizes for combined patches:

-rw-r--r--  1 reprepro nogroup 26K Jul 27 13:55 comb.2006-07-26-1318.02.gz
-rw-r--r--  1 reprepro nogroup 54K Jul 27 13:55 comb.2006-07-25-1313.19.gz
-rw-r--r--  1 reprepro nogroup 90K Jul 27 13:55 comb.2006-07-24-1338.19.gz
-rw-r--r--  1 reprepro nogroup 132K Jul 27 13:55 comb.2006-07-24-0235.54.gz
-rw-r--r--  1 reprepro nogroup 170K Jul 27 13:55 comb.2006-07-22-1308.51.gz
-rw-r--r--  1 reprepro nogroup 186K Jul 27 13:55 comb.2006-07-21-1255.40.gz
-rw-r--r--  1 reprepro nogroup 206K Jul 27 13:55 comb.2006-07-20-1302.38.gz
-rw-r--r--  1 reprepro nogroup 226K Jul 27 13:56 comb.2006-07-19-1301.33.gz
-rw-r--r--  1 reprepro nogroup 246K Jul 27 13:56 comb.2006-07-18-1311.49.gz
-rw-r--r--  1 reprepro nogroup 289K Jul 27 13:56 comb.2006-07-17-1328.22.gz
-rw-r--r--  1 reprepro nogroup 332K Jul 27 13:56 comb.2006-07-16-2314.28.gz
-rw-r--r--  1 reprepro nogroup 351K Jul 27 13:57 comb.2006-07-15-1308.02.gz
-rw-r--r--  1 reprepro nogroup 370K Jul 27 13:57 comb.2006-07-14-1250.45.gz
-rw-r--r--  1 reprepro nogroup 392K Jul 27 13:57 comb.2006-07-13-1257.25.gz
-rw-r--r--  1 reprepro nogroup 424K Jul 27 13:57 comb.2006-07-12-1242.39.gz
-rw-r--r--  1 reprepro nogroup 443K Jul 27 13:58 comb.2006-07-11-1246.14.gz
-rw-r--r--  1 reprepro nogroup 462K Jul 27 13:58 comb.2006-07-10-1321.18.gz
-rw-r--r--  1 reprepro nogroup 495K Jul 27 13:58 comb.2006-07-10-0029.06.gz
-rw-r--r--  1 reprepro nogroup 538K Jul 27 13:59 comb.2006-07-08-1242.03.gz
-rw-r--r--  1 reprepro nogroup 547K Jul 27 13:59 comb.2006-07-07-1233.30.gz


