
Fwd: debdelta : (no) indexes, (no) incremental



---------- Forwarded message ----------
From: A Mennucc <mennucc1@debian.org>
Date: Tue, 29 Mar 2011 10:46:47 +0200
Subject: debdelta  : (no) indexes, (no) incremental
To: udeshike@gmail.com, mvo@debian.org
Cc: Paul Wise <pabs@debian.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dear Ishan Jayawardena and Michael Vogt (and Paul Wise in CC),

Paul just notified me of
 http://wiki.debian.org/SummerOfCode2011/AptDebdeltaIntegration
First and foremost, I hope you will actively involve me in this project
(I knew nothing about it until today).

Let's come to the project: overall it is a nice idea, but with one
mistake: the index of deltas. I have some (IMHO convincing) arguments
against an index of deltas.

- ------ indexes

- ---- indexes of debs in APT

Let's start by examining the situation for debs and APT.
Using indexes for debs is a no-brainer: the client
(i.e. the end user) does not know the list of debs available on the
server and, even knowing the current list, cannot foresee future
changes.
So indexes provide needed information: the packages' descriptions,
versions, dependencies, and so on; this information is used by APT and
the other frontends.

- ---- indexes of deltas in debdelta

If you then think of deltas, you realize that none of the requirements
above apply. Firstly, deltas have no description and no dependencies.
(Deltas do have an "info" section, but let's skip it for simplicity; the
argument is complex enough already, and I will discuss it in other emails.)

Of course the client needs some information to determine whether a delta
exists and to download it; but this information is already
 available on the client:
   the name of the package P
   the old version  O
   the new version  N
   the architecture A

Once these are known, the URL of the file F can be algorithmically
determined as
  URI/POOL/P_O_N_A.debdelta
where URI is determined from
  /etc/debdelta/sources.conf
and POOL is the directory in the pool of the package P .
This algorithm is also implemented (quite verbosely) in
 contrib/findurl.py  in the sources of debdelta.
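The URL rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code of contrib/findurl.py: it assumes the URI has already been read from /etc/debdelta/sources.conf (here it is simply a parameter), assumes the source package name and the archive component are known, follows the usual Debian pool layout (first letter of the source name, or the first four letters for names starting with "lib"), and ignores how epochs and other special characters in versions are encoded in file names. The helper names pool_prefix and delta_url are hypothetical.

```python
def pool_prefix(source: str) -> str:
    """Pool sub-directory for a source package, following the usual Debian
    layout: 'foobar' -> 'f', 'libfoo' -> 'libf'."""
    if source.startswith("lib") and len(source) > 3:
        return source[:4]
    return source[0]


def delta_url(uri: str, component: str, source: str,
              package: str, old: str, new: str, arch: str) -> str:
    """Build URI/POOL/P_O_N_A.debdelta from data the client already has.

    Hypothetical helper: epochs and other special characters in versions
    are NOT handled here (see contrib/findurl.py for the real rules).
    """
    pool = f"pool/{component}/{pool_prefix(source)}/{source}"
    return f"{uri.rstrip('/')}/{pool}/{package}_{old}_{new}_{arch}.debdelta"


if __name__ == "__main__":
    # Values taken from the foobar example later in this email.
    print(delta_url("http://debdeltas.debian.net/debian-deltas",
                    "main", "foobar", "foobar", "1", "3", "all"))
```

Note that nothing in the function needs to be downloaded first: every input is either configuration or data APT already holds, which is the point of the argument.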

(You should (really!) also read
 http://debdelta.debian.net/README_upgrade.txt
for more details and further info.)

This is why there is currently no "index of deltas", and yet
'debdelta-upgrade' works fine (and so does "cupt").

Adding an index of files would only increase downloads (in time and
size) and disk usage, with negligible benefit, if any.


- ------ no incremental deltas

Let me add another point that may be unclear. There are no incremental
deltas (and IMHO there never will be). Here is an example.

- ---- delta server behavior

Suppose that the unstable archive,
on 1st Mar, contains foobar_1_all.deb
(in pool/main/f/foobar/);
then on 2nd Mar, foobar_2_all.deb is uploaded;
but it has a flaw (e.g. FTBFS), and so
on 3rd Mar foobar_3_all.deb is uploaded.

On 2nd Mar, the delta server generates (in its archive pool inside
http://debdeltas.debian.net/debian-deltas )
 pool/main/f/foobar/foobar_1_2_all.debdelta
On 3rd Mar, the server generates both
 pool/main/f/foobar/foobar_1_3_all.debdelta
 pool/main/f/foobar/foobar_2_3_all.debdelta
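The server behaviour in this example can be modelled in a few lines. This sketch assumes the server builds one delta from every older version it has seen to each new upload; the real server may keep only some old versions, and the function names deltas_for_upload and simulate are hypothetical.

```python
def deltas_for_upload(package: str, arch: str,
                      seen: list[str], new: str) -> list[str]:
    """Names of the deltas generated when version `new` enters the archive,
    assuming one delta from every previously seen version."""
    return [f"{package}_{old}_{new}_{arch}.debdelta" for old in seen]


def simulate(package: str, arch: str,
             uploads: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Replay (day, version) uploads in order; return {day: generated deltas}."""
    seen: list[str] = []
    generated: dict[str, list[str]] = {}
    for day, version in uploads:
        generated[day] = deltas_for_upload(package, arch, seen, version)
        seen.append(version)
    return generated


if __name__ == "__main__":
    uploads = [("1 Mar", "1"), ("2 Mar", "2"), ("3 Mar", "3")]
    for day, names in simulate("foobar", "all", uploads).items():
        print(day, names)
```

Replaying the three uploads gives no delta on 1st Mar, foobar_1_2_all.debdelta on 2nd Mar, and both foobar_1_3_all.debdelta and foobar_2_3_all.debdelta on 3rd Mar, matching the list above.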

So, if the end user Ann upgrades her system on both 2nd and 3rd Mar,
then she uses foobar_1_2_all.debdelta (on 2nd Mar) and
foobar_2_3_all.debdelta (on 3rd Mar). If the end user Boe did not
upgrade his system on 2nd Mar and upgrades on 3rd Mar, then on
3rd Mar he uses foobar_1_3_all.debdelta.
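On the client side, choosing the delta needs nothing but the installed version and the candidate version, so no index is consulted. A minimal sketch with a hypothetical helper name, using the simplified file naming from the example (no epochs):

```python
def delta_name(package: str, installed: str, candidate: str, arch: str) -> str:
    """The single delta a client needs: installed -> candidate, directly."""
    return f"{package}_{installed}_{candidate}_{arch}.debdelta"


if __name__ == "__main__":
    # Ann upgrades on 2nd Mar, so on 3rd Mar she has version 2 installed.
    print("Ann, 2 Mar:", delta_name("foobar", "1", "2", "all"))
    print("Ann, 3 Mar:", delta_name("foobar", "2", "3", "all"))
    # Boe skips 2nd Mar, so on 3rd Mar he still has version 1 installed.
    print("Boe, 3 Mar:", delta_name("foobar", "1", "3", "all"))
```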

- ---- size limit

Note that the server currently rejects deltas larger than 70% of the deb
size: the size gain would be too small, and time would be wasted once
you add the time to download the delta to the time to apply it (OK,
these run in parallel as much as possible, yet ...).
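The rule can be written as a one-line check. This is a sketch of the stated policy only: the function name is hypothetical, comparing against the size of the new deb is an assumption, and whether a delta of exactly 70% is accepted is not specified here (the sketch accepts it).

```python
def accept_delta(delta_size: int, deb_size: int, limit_percent: int = 70) -> bool:
    """True if the delta is at most `limit_percent`% of the deb's size.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return delta_size * 100 <= deb_size * limit_percent


if __name__ == "__main__":
    print(accept_delta(400_000, 1_000_000))  # a typical ~40% delta
    print(accept_delta(750_000, 1_000_000))  # too large to be worth it
```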

- ---- What "incremental" would be, and why it is not

What does not currently happen is the following:
on 3rd Mar, Boe decides to upgrade and invokes 'debdelta-upgrade';
then 'debdelta-upgrade' finds foobar_1_2_all.debdelta and
foobar_2_3_all.debdelta, uses the former to generate
foobar_2_all.deb, and in turn uses that and the second delta to
generate foobar_3_all.deb.

This is not implemented, and it will not be, for the following reasons.
(1) The delta size is, on average, 40% of the size of the deb (and this
is getting worse, for various reasons): so two deltas are 80% of the
target deb, and this is too much.
(2) It takes time to apply a delta; applying two deltas to produce one
deb takes too much time.
(3) The server does generate the direct delta foobar_1_3_all.debdelta
 :-) so why make things complex when they are easy?  :-)
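The arithmetic behind point (1) can be checked directly. This sketch takes the 40% average at face value for every delta and assumes, for illustration only, that the intermediate and target debs are about the same size; comparing the total against the server's 70% threshold shows that a two-step chain already downloads more than the server would accept for a single delta. The names are hypothetical.

```python
from fractions import Fraction

AVERAGE_DELTA_RATIO = Fraction(40, 100)  # ~40% of the deb, per point (1)
SERVER_LIMIT = Fraction(70, 100)         # rejection threshold for one delta


def chained_download_ratio(steps: int,
                           ratio: Fraction = AVERAGE_DELTA_RATIO) -> Fraction:
    """Fraction of the target deb downloaded when chaining `steps` deltas."""
    return steps * ratio


if __name__ == "__main__":
    for steps in (1, 2):
        total = chained_download_ratio(steps)
        verdict = "OK" if total <= SERVER_LIMIT else "above the 70% limit"
        print(f"{steps} delta(s): {float(total):.0%} of the deb, {verdict}")
```

Exact fractions are used so the comparison with the threshold is not affected by floating-point rounding.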

(Why did I explain 'incremental deltas'? Because incremental deltas might
need some index system to be implemented... indeed, Boe would have no
way to know on 3rd Mar that the intermediate version of foobar between
"1" and "3" is "2"; but since incremental deltas do not exist, there is
no need for indexes.)
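The difference can be made concrete: the direct delta's name depends only on versions the client already knows, while a chain needs the ordered list of intermediate versions, which is exactly the kind of index argued against here. The `index` argument below is a hypothetical stand-in for such an index (debdelta has none), and both function names are illustrative.

```python
def direct_delta(package: str, installed: str, candidate: str,
                 arch: str) -> list[str]:
    """Needs only data the client already has."""
    return [f"{package}_{installed}_{candidate}_{arch}.debdelta"]


def incremental_chain(package: str, installed: str, candidate: str,
                      arch: str, index: list[str]) -> list[str]:
    """Needs `index`: every version published between installed and candidate,
    in upload order (hypothetical; no such index exists in debdelta)."""
    start, end = index.index(installed), index.index(candidate)
    steps = index[start:end + 1]
    return [f"{package}_{old}_{new}_{arch}.debdelta"
            for old, new in zip(steps, steps[1:])]


if __name__ == "__main__":
    print(direct_delta("foobar", "1", "3", "all"))
    print(incremental_chain("foobar", "1", "3", "all", index=["1", "2", "3"]))
```

Without the `index` list, Boe's client has no way to discover version "2", so the chained variant cannot even be attempted.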

- ------

When I find more time, I will write more on the info section in
deltas and on the 10k download strategy in debdelta-upgrade.


a.

PS: there is nothing private in this email; if you like, you may
forward it to an appropriate mailing list. I didn't do so myself, since
I was not sure which mailing list is appropriate.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2RnHYACgkQ9B/tjjP8QKTXWQCgiixSEisEo/suYmx6JX3BssUN
JqUAnRHgnkgwujinjbBDb5Cg9LzTNdWe
=ofrV
-----END PGP SIGNATURE-----



-- 
Regards,
Ishan Jayawardena.

