Re: proposal for a more efficient download process
hi
by quite a coincidence, while you people were discussing this idea, I was
already implementing it, in a package called 'debdelta' : see
http://lists.debian.org/debian-devel/2006/05/msg03120.html
Moreover, by some telepathy :-) , I already included features you were
proposing, and addressed problems you were discussing
(and other problems you were not discussing since you did not
try implementing it :-)
Here are the replies:
To curt manucredo : while the implementation is not exactly what you
were suggesting in your original email, it still achieves all desired
goals; moreover, it is alive and kicking.
'debdelta' differs from your implementation in this respect:
- it does not use dpkg-repack (for many good reasons, see below)
- it recreates the new .deb , and guarantees that it is equal to the
one in archives, so archive signatures can be verified;
currently it does not patch into the filesystem
(although this would be an easy adaptation, if anybody wishes for it)
'debdelta' conforms to your requests, in that
- it can recreate the new .deb, either using the installed version of
the old .deb, or the old .deb file.
On the bright side, everything is already working, there is already
a repository of patches available, and a method of downloading them.
To Tyler MacDonald :
- 'debdelta' uses 'bsdiff' , or 'xdelta' as a fallback, see below
- regarding this:
> Some work will have to go into the math to determine when it's
> actually more efficient to download the latest archive, etc.... just a
> fleeting mental note, the threshold should not be 100% of the full archives
> size, it should be 90 or 80% due to the CPU/RAM overhead of patching and the
> bandwidth/latency overhead of requesting multiple patch files vs. one
> stream of data.
This math must go on the client side, and it is in my TODO list
(see at the end of the README); it is a nice exercise in Dynamic Programming.
Anyway, currently the archive discards deltas that exceed ~50% of the
new .deb, just as a heuristic, and to keep disk usage low.
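
The server-side cut-off described above can be sketched in a few lines of Python. This is only an illustration: the function name `keep_delta` and the `max_ratio` parameter are hypothetical and not taken from 'debdelta'; the default of 0.5 simply mirrors the ~50% figure.

```python
def keep_delta(delta_size: int, new_deb_size: int, max_ratio: float = 0.5) -> bool:
    """Return True if the delta is small enough, relative to the new .deb, to keep.

    Both sizes must be in the same unit (bytes, kB, ...).
    max_ratio is a hypothetical knob; 0.5 mirrors the ~50% heuristic.
    """
    if new_deb_size <= 0:
        raise ValueError("new_deb_size must be positive")
    return delta_size <= max_ratio * new_deb_size


# Made-up sizes: a 300 kB delta for a 40000 kB package is kept,
# a 30000 kB delta for the same package is discarded.
print(keep_delta(300, 40000))
print(keep_delta(30000, 40000))
```

A client-side rule could use the same shape with a different ratio (the quoted text suggests 80-90%), but that part is still on the TODO list.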
To Goswin von Brederlow :
>| bsdiff is quite memory-hungry. It requires max(17*n,9*n+m)+O(1)
Ah so this is the correct formula! The man page just says '17*n'.
But in my stats that is not the case; my stats
estimate that the memory is '12*n', so that is what I use.
>| bytes of memory, where n is the size of the old file and m is the
>| size of the new file. bspatch requires n+m+O(1) bytes.
> That is quite unacceptable. We have debs in debian up to 160Mb
'debdelta' has an option '-M ' to choose between 'xdelta' and 'bsdiff' ;
by default, it uses 'xdelta' when memory usage would exceed 50Mb ;
but in the server, I set '-M 200' since I have 1GB RAM there.
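
As a rough sketch of this selection rule (not the actual 'debdelta' code), the following Python compares the empirical '12*n' estimate against a memory limit given in megabytes. The function names are hypothetical, and it is an assumption here that n is the size in bytes of the old data being diffed.

```python
MB = 1024 * 1024


def bsdiff_memory_documented(n: int, m: int) -> int:
    """Quoted formula max(17*n, 9*n + m), ignoring the O(1) term."""
    return max(17 * n, 9 * n + m)


def bsdiff_memory_estimate(n: int) -> int:
    """Empirical estimate 12*n mentioned in this message."""
    return 12 * n


def choose_backend(old_size: int, max_mem_mb: int = 50) -> str:
    """Pick 'bsdiff' if its estimated memory fits in max_mem_mb, else 'xdelta'."""
    if bsdiff_memory_estimate(old_size) <= max_mem_mb * MB:
        return "bsdiff"
    return "xdelta"


# A 10 MB old file needs an estimated 120 MB: too much for the default 50,
# but acceptable with a limit of 200.
print(choose_backend(10 * MB))
print(choose_backend(10 * MB, max_mem_mb=200))
```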
> Seems to be quite useless for patching full debs. One would have to
> limit it to a file-by-file approach.
This is in my TODO list. Actually, I have in mind a scheme to
break TARs at suitable points, I have to check if it is
worthwhile ; I can discuss details.
To: Tyler MacDonald again:
> True.. It'd probably only be efficient if the deltas were based on
> the contents of the .deb's before they're packed.
.. and this is the reason why I do not use dpkg-repack... why repack the
data when I need it unpacked ? :-)
Absolutely true. Look at this
$ ls -s tetex-doc_3.0-17_all.deb tetex-doc_3.0-18_all.deb
42388 tetex-doc_3.0-18_all.deb 42340 tetex-doc_3.0-17_all.deb
$ bsdiff tetex-doc_3.0-17_all.deb tetex-doc_3.0-18_all.deb brutal.bsdiff
$ ls -s brutal.bsdiff
10092 brutal.bsdiff
Hat tip to 'bsdiff', but we can do better...
$ ar p tetex-doc_3.0-17_all.deb data.tar.gz | zcat > /tmp/17.tar
$ ar p tetex-doc_3.0-18_all.deb data.tar.gz | zcat > /tmp/18.tar
$ ls -s /tmp/17.tar /tmp/18.tar
53532 /tmp/17.tar 53580 /tmp/18.tar
$ time bsdiff /tmp/17.tar /tmp/18.tar /tmp/tar.bsdiff
times:
real 2m4.994s user 2m3.947s
memory:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9784 debdev 25 0 471m 470m 1384 T 0.0 46.5 1:18.82 bsdiff
size:
92 /tmp/tar.bsdiff
so as you see, the reduction in size is impressive,
but it uses too much memory and takes too much time.
$ time xdelta delta -m 50M -9 /tmp/17.tar /tmp/18.tar /tmp/tar.xdelta
times:
real 0m1.728s user 0m1.660s
memory: it finished too fast to measure
size:
236 /tmp/tar.xdelta
still good enough for our goal
----
Comparing to the above
$ ls -s pool/main/t/tetex-base/tetex-doc_3.0-17_3.0-18_all.debdelta
288 pool/main/t/tetex-base/tetex-doc_3.0-17_3.0-18_all.debdelta
(the extra 35kB are the script that 'debpatch' uses :-(
actually, I told 'debdelta' to use 'bzip' instead of gzip
in these cases, but it did not... just found another bug :-) )
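
To put the sizes quoted above side by side, here is a small Python calculation expressing each delta as a percentage of the new .deb. The numbers are copied from the `ls -s` outputs in this message; the labels and the helper name `percent_of_new_deb` are only for illustration.

```python
NEW_DEB = 42388  # tetex-doc_3.0-18_all.deb, as reported by `ls -s`

# Delta sizes as reported by `ls -s` in the transcripts above.
sizes = {
    "bsdiff on the two .deb files": 10092,
    "bsdiff on the unpacked tars": 92,
    "xdelta on the unpacked tars": 236,
    "debdelta file in the pool": 288,
}


def percent_of_new_deb(size: int, new_deb: int = NEW_DEB) -> float:
    """Size of a delta as a percentage of the new .deb."""
    return 100.0 * size / new_deb


for label, size in sizes.items():
    print(f"{label:30s} {size:6d}  {percent_of_new_deb(size):6.2f}%")
```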
To: Marc 'HE' Brockschmidt <he@ftwca.de>:
> Now the interesting questions: How many diffs do you keep?
very few, currently, due to space constraints. Moreover, suppose that
you have a_1.deb installed, that a_1_2.debdelta and a_2_3.debdelta are in
the pool of deltas, and that you want to upgrade to a_3.deb.
This would work if done by hand, just doing
$ debpatch a_1_2.debdelta / /tmp/a_2.deb
$ debpatch a_2_3.debdelta /tmp/a_2.deb /tmp/a_3.deb
but 'debdelta-upgrade' is currently unable to exploit this situation,
so I keep only one delta for each deb.
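
For illustration only, here is how a client could look for the cheapest chain of deltas from the installed version to the target one. This is not something 'debdelta-upgrade' does today; it is a sketch using a plain shortest-path search (Dijkstra) in Python, and the function name, the data layout and the sizes are all hypothetical.

```python
import heapq


def cheapest_chain(deltas, installed, target):
    """Find the chain of deltas with the smallest total download size.

    deltas maps (old_version, new_version) -> delta size.
    Returns (total_size, [(old, new), ...]) or None if no chain exists.
    """
    graph = {}
    for (old, new), size in deltas.items():
        graph.setdefault(old, []).append((new, size))

    queue = [(0, installed, [])]
    settled = set()
    while queue:
        cost, version, path = heapq.heappop(queue)
        if version == target:
            return cost, path
        if version in settled:
            continue
        settled.add(version)
        for nxt, size in graph.get(version, []):
            heapq.heappush(queue, (cost + size, nxt, path + [(version, nxt)]))
    return None


# Made-up sizes: chaining 1->2 and 2->3 is cheaper than a direct 1->3 delta.
deltas = {("1", "2"): 100, ("2", "3"): 150, ("1", "3"): 400}
print(cheapest_chain(deltas, "1", "3"))
```

The total returned by such a search could then be compared with the size of the full .deb to decide whether patching is worthwhile at all.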
> How do you
> integrate this approach with the minimal security Release files give us
> today?
The recreated debs are identical to the originals in the archive, so the
usual archive checks apply to them unchanged.
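
Since the rebuilt file is meant to be byte-for-byte identical, it can be checked against the checksum published for the original. A minimal Python sketch follows; the function names are hypothetical, and which hash the archive index lists for a given package is an assumption (SHA-256 is used here only as an example).

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large .debs need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_archive(path, expected_hexdigest, algorithm="sha256"):
    """True if the rebuilt .deb has the checksum published for the original."""
    return file_digest(path, algorithm) == expected_hexdigest.lower()
```

In practice `expected_hexdigest` would come from the signed package index, so a mismatch means the reconstruction must be thrown away and the full .deb downloaded instead.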
Currently the best way to use my package is:
$ apt-get update
$ su nobody -c debdelta-upgrade
$ mv /tmp/archives/*deb /var/cache/apt/archives
$ apt-get upgrade
(By default , debdelta-upgrade puts the resulting .deb in /tmp/archives;
use --dir to your taste, though )
As you see, I propose running debdelta-upgrade as a non-root user, since it is
still in development.
> What about the kind of signatures dpkg-sig provides?
Those are supported.
'debdelta' reproduces everything it sees into the .deb file,
considering it as an 'ar' archive (although it is not exactly an 'ar'
archive, since 'ar' adds a '/' in the header, 'dpkg' does not);
it just treats control.tar.gz and data.tar.gz in a smarter way.
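
To make the 'ar' remark concrete, here is a small Python sketch that lists the members of an 'ar' container using the common 60-byte member header layout. It is not the code 'debdelta' uses; the function name is hypothetical, and stripping a trailing '/' is just one way to accept both naming variants mentioned above.

```python
import io

AR_MAGIC = b"!<arch>\n"


def list_ar_members(stream):
    """Return [(name, size), ...] for each member of an ar archive.

    Header layout (60 bytes): name[16] mtime[12] uid[6] gid[6] mode[8]
    size[10] followed by the two-byte terminator "`\\n".
    """
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    members = []
    while True:
        header = stream.read(60)
        if not header:
            break
        if len(header) != 60 or header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        # 'ar' may append '/' to the name, 'dpkg' does not: accept both.
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, size))
        # Member data is padded to an even length.
        stream.seek(size + (size % 2), io.SEEK_CUR)
    return members
```

Called on a .deb opened in binary mode, this would be expected to list members such as control.tar.gz and data.tar.gz, which are the two that get the smarter treatment.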
----- other FAQ I made up for you
Q: What about .debs where the data part is compressed with bzip ?
A: currently it is unsupported (I never found one :-)
but I did write some code to support it.
Q: can 'debpatch' recreate the new .deb using the installed old .deb, even when
- there are dpkg-diversions ?
- conf files were modified ?
A: yes, yes.
Q: can 'debpatch' recreate the new .deb using the installed old .deb,
when 'prelink' is used in the host?
A: currently, no.
a.
--
Andrea Mennucc