
Re: your blog post: FTPMaster meeting, II - NO baklava



[Adding debian-dak@lists.d.o into the loop. We should continue there]

> currently I have a complex script that creates deltas for amd64 and
> i386, and for lenny squeeze sid experimental ; and a simpler one for
> lenny-security ; you find both in attachment

> let me start by outlining how the scripts generate deltas. It is a
> 3-step process.

> let's say that $secdebmir is the directory containing the mirror of the
> repository security.debian.org

> --- 1st step
>  #make copy of current stable-security lists of packages
>  olddists=${TMPDIR:-/tmp}/oldsecdists-`date +'%F_%H-%M-%S'`
>  mkdir $olddists
>  cp -a $secdebmir/dists $olddists
> --- 2nd step
> call 'debmirror' to update the mirror ; note that I apply a patch to
> debmirror so that old debs are not deleted , but moved to a /old_deb
> directory
> --- 3rd step
> call 'debdeltas' to generate deltas , from the state of packages in
> $olddists to the current state in $secdebmir , and also wrt what is in
> stable.
> Note that,  for any package that was deleted from the archive, then
> 'debdeltas' will go fishing for it inside /old_deb .

> The more complex script also keeps 4 days of memory of the package
> indexes, and then creates deltas from 4-days-ago to today , just in case.
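
A minimal sketch of the first step as a reusable shell function, assuming
only what is quoted above. The function name snapshot_dists is made up for
illustration; steps 2 and 3 (debmirror and debdeltas) are left out because
their exact invocations are not given in this mail.

```shell
# snapshot_dists MIRROR_DIR
# Copy MIRROR_DIR/dists into a fresh timestamped directory under
# ${TMPDIR:-/tmp} and print the path of that directory.
# (Hypothetical helper; the body runs in a subshell so variables stay local.)
snapshot_dists() (
    mirror=$1
    olddists="${TMPDIR:-/tmp}/oldsecdists-$(date +'%F_%H-%M-%S')"
    # mkdir fails if the directory already exists, so an earlier
    # snapshot taken in the same second is never overwritten.
    mkdir "$olddists" || exit 1
    cp -a "$mirror/dists" "$olddists" || exit 1
    printf '%s\n' "$olddists"
)
```

It would be called as olddists=$(snapshot_dists "$secdebmir") before the
mirror is updated.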

That seems fine for an external resource, but if we integrate it, it's
not the way to go.

>> I don't want to end up with an unmaintainable sh*t as pdiffs are, which
>> actually make situation worse in many cases for the users, so we need to
>> find some sensible definitions even for their side.
> debdelta is not as problematic as pdiff : if there are broken deltas in
> the central repository, then the debdelta-upgrade program simply
> complains and goes on , and all is solved by downloading the deb eventually

> moreover, the repository of deltas does not contain indexes of any sort

>> Lets start with some points:
>>  - I assume you are willing to do the work it takes to get it properly
>>    integrated into dak, should we go down this road?
> yes of course, but I need some help, I don't know precisely how dak
> works

That's fine, we will point you in the right direction.

> first question: is there a host that contains in its filesystem both the
> current Debian repository and the snapshots archive ? this would be the
> best place where to generate deltas

There probably is, but that's not how I imagine this thing running
currently. :)

> in such a place, the 3 steps may be reformulated as two steps:
> one step before dak updates the repository, is just
> 'cp -a $secdebmir/dists $olddists'

> second step after dak, is to invoke 'debdeltas' to create deltas .
> In this case, 'debdeltas' would fish for debs inside the 'snapshot'
> archive

You think too much in terms of a mirror structure with a separate script
and not enough with the archive in mind. On the archive side, I think the
right place to generate the delta would be the moment we see the new
version and install the new upload into the archive. So the delta falls
out right in the middle of the normal archive handling. After all, we DO
have the old files around for a while before removing them, so we can do
that right there. Or maybe as an asynchronous process triggered by the
upload processing, so as not to block that too much.

This needs a bit of thinking about exactly how, as we then want to
generate not only the delta against the last version in the suite we just
uploaded to, but maybe also one for another suite. Let's say an upload to
unstable triggers one delta against the old unstable version and one
delta against the testing version (if different). The mirror would only
see the unstable one until the version moves over to testing; then the
testing delta gets copied in.
Or the testing import scripts have to generate them.
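
To make the "one delta per distinct old version" idea concrete, here is a
rough sketch. It is not dak code; the function name delta_bases and its
argument order are invented for illustration. Given the new version and the
versions currently in the suites of interest, it prints each old version a
delta would be generated from, skipping duplicates and the new version itself.

```shell
# delta_bases NEW_VERSION OLD_VERSION...
# Print, one per line, each distinct OLD_VERSION that differs from
# NEW_VERSION. Empty arguments (package absent from a suite) are skipped.
# (Hypothetical helper; the body runs in a subshell so variables stay local.)
delta_bases() (
    new=$1
    shift
    seen=""
    for old in "$@"; do
        if [ -z "$old" ] || [ "$old" = "$new" ]; then
            continue
        fi
        # Debian version strings contain no spaces, so a space-separated
        # list is enough to remember what was already printed.
        case " $seen " in
            *" $old "*) continue ;;
        esac
        seen="$seen $old"
        printf '%s\n' "$old"
    done
)
```

For an upload of 2.2.7 to unstable with 2.2.6 in unstable and 2.2.5 in
testing, delta_bases 2.2.7 2.2.6 2.2.5 lists both old versions; if testing
also had 2.2.6, only one delta would be listed.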


>>    If we take it I don't want it to be something outside, but integrated
>>    well with the rest, so the delta output "just" falls out "accidentally
>>    somewhere during the normal package accept". :)
> (not sure I understand)

I don't want it done the way you described above; I want it inside the
archive handling directly.

>>  - How much disk space do we talk about?
> currently I am creating deltas for amd64 and i386, and for
> lenny squeeze sid experimental ;
> there are ~14000 deltas  , the total size of deltas is ~ 9Gbytes .

> I also create deltas for lenny-security, those are ~ 5GByte .

That doesn't sound bad. Well, for all architectures it will be quite a
bit more, but it should still be acceptable. Between 40 and 50GB, it seems.

>>  - How many deltas do we keep around? Especially for unstable this might
>>    make a big difference.
> I keep deltas for 50 days at most ; also I delete the oldest  when I
> have less than 2GB of space free
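
The age-based half of that policy could look roughly like the sketch below.
It is not taken from the attached scripts: the function name expire_deltas
and the '*.debdelta' file name pattern are assumptions, and the "less than
2GB free" rule is not covered here.

```shell
# expire_deltas DIR [DAYS]
# Remove delta files under DIR that are at least DAYS days old (default 50).
# (Hypothetical helper; assumes deltas are stored as *.debdelta files.)
expire_deltas() (
    dir=$1
    days=${2:-50}
    # find's "-mtime +N" matches files whose age in whole days is greater
    # than N, so N = DAYS - 1 selects files that are DAYS days old or older.
    find "$dir" -type f -name '*.debdelta' \
        -mtime +"$((days - 1))" -exec rm -f {} +
)
```

Running expire_deltas /path/to/deltas from the same cron job that creates
the deltas would keep the directory within the 50-day window.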

And with those 50 days you have only 9GB of deltas for 2 architectures
and 4 suites?

>>  - Also, do users have to follow a path from their version over all the
>>    ones uploaded in the meantime until they get the final one, or do you
>>    store one from each version up to the newest available package
>>    version, limited by some time factor?
> there is no path implemented, 'debdelta-upgrade' only looks for a
> one-shot delta, from the installed version in the user PC , to the new
> version in the archive

Ok. Now that makes it a bit harder to put into the normal archive
workflow, as we do not keep the old versions around that long.

>>  - How bad is it if it fails somewhere? With pdiffs and an error, almost
>>    my only hope is to remove the whole pdiff crap and restart at 0.
> nothing like that happens with deltas . Anyway  the script
> 'debmirror-deltas' has a --recover option

>>  - Do you limit debdelta creation to .debs where it actually makes
>>    sense, ie something >=500kb?
> yes, but 500kb is too much

> first of all, note that any delta that is larger than 70% of the
> corresponding deb is deleted

> regarding size, there is a lower limit of 10KB ; according to the
> statistics, in many cases even small packages can be effectively
> delta-ed ; e.g. today I read

>  OLD: Package: emdebian-grip-server Version: 2.2.6 Architecture: all
> Installed-Size: 396
>  NEW: Package: emdebian-grip-server Version: 2.2.7 Architecture: all
> Installed-Size: 400
>  delta is 9.4% of deb; that is, 92kB are saved, on a total of 102kB.
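
The two size rules quoted above (skip debs below 10KB, discard deltas larger
than 70% of the deb) can be written down as a small predicate. This is only a
sketch: the function name keep_delta is invented, sizes are taken in bytes,
and 10KB is assumed to mean 10240 bytes.

```shell
# keep_delta DEB_BYTES DELTA_BYTES
# Succeed (exit status 0) if a delta of DELTA_BYTES for a deb of DEB_BYTES
# passes both size rules, fail otherwise.
# (Hypothetical helper; the body runs in a subshell so variables stay local.)
keep_delta() (
    deb_bytes=$1
    delta_bytes=$2
    min_deb_bytes=10240   # assumed meaning of the "10KB" lower limit
    max_percent=70        # deltas above 70% of the deb are discarded
    [ "$deb_bytes" -ge "$min_deb_bytes" ] || exit 1
    # Integer arithmetic: delta/deb <= 70%  <=>  delta*100 <= deb*70
    [ $((delta_bytes * 100)) -le $((deb_bytes * max_percent)) ]
)
```

With the emdebian-grip-server numbers, a deb of about 102kB and a delta of
about 9.4% of that, keep_delta 102000 9588 succeeds, since the deb is above
the lower limit and the delta is well under 70%.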

Interesting.

>>> I wrote and maintain the 'debdelta' service , see
>>> http://debdelta.debian.net/README
>> We know this exist, yes. :)
> (curiously, I just went to DUCCIT , a Debian/Ubuntu mini conference in
> Perugia, and nobody there knew about it)

We know things our archive could want. :)

> #!/bin/bash -e

Just a side note: this is bad. Use a separate "set -e" line, or -e won't
be in effect should someone run it like "/bin/bash /scriptfile", for example.
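
A small demonstration of the difference, as a sketch: it writes two throwaway
scripts (the file and variable names are made up) and runs both through an
explicit "bash FILE", which is the case where options on the #! line are
ignored because bash treats that line as a plain comment.

```shell
demo_dir=$(mktemp -d)

# Variant 1: -e only on the #! line.
cat > "$demo_dir/shebang-only.sh" <<'EOF'
#!/bin/bash -e
false
echo "still running"
EOF

# Variant 2: a separate "set -e" line.
cat > "$demo_dir/set-line.sh" <<'EOF'
#!/bin/bash
set -e
false
echo "still running"
EOF

# Run both via an explicit interpreter and capture what they print.
shebang_out=$(bash "$demo_dir/shebang-only.sh" || true)
set_line_out=$(bash "$demo_dir/set-line.sh" || true)

echo "shebang -e : '$shebang_out'"
echo "set -e line: '$set_line_out'"
```

The first variant carries on past the failing command and prints its
message; the second stops at "false" and prints nothing.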


-- 
bye, Joerg
Lisa, if you don't like your job you don't strike. You just go in every
day and do it really half-assed. That's the American way.

