Re: IDEA to SERIOUSLY reduce download times!
On Wed, Jul 07, 1999 at 11:49:52AM -0400, Daniel Burrows wrote:
> On Wed, Jul 07, 1999 at 01:39:38AM -0400, Steve Dunham was heard to say:
> Here's how my script works (pretty much the same as yours I think):
> -> unpack the data.tar.gz member of both .debs
> -> unpack the control section of the new .deb
> -> create two temporary staging areas called 'delta' and 'shipping'
> -> move all conffiles for the new .deb into 'shipping'
> -> for all regular files in the new version that also exist as regular files
> in the old version, perform an xdelta and save the result as a file with
> the same name relative to 'delta'. That is, a delta for usr/doc/README
> will be stored in 'delta'/usr/doc/README . If the delta is larger than the
> new file, delete it.
> Files which do not have a delta generated (either because they are links,
> because the delta was larger than the file itself, or because there's no
> corresponding file in the original package) are moved to 'shipping'.
> -> data.tar.gz is overwritten with the contents of 'shipping'
> -> delta.tar.gz is created with the contents of 'delta'
> -> An ar archive is created which contains debian-binary, control.tar.gz,
> data.tar.gz, and delta.tar.gz. [ perhaps I should modify debian-binary in
> the new archive]
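A minimal Python sketch of the 'delta' versus 'shipping' split described above. The delta function, toy_delta, is a hypothetical stand-in for xdelta: it only encodes a common prefix plus the new tail, so the size comparison is illustrative. plan_delta and is_regular are names made up for this sketch. Conffiles are assumed to be listed with a leading '/', as in a package's conffiles file.

```python
import os
import stat
import struct
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple


def toy_delta(old: bytes, new: bytes) -> bytes:
    """Hypothetical stand-in for xdelta: 4-byte common-prefix length + new tail."""
    n = 0
    limit = min(len(old), len(new))
    while n < limit and old[n] == new[n]:
        n += 1
    return struct.pack(">I", n) + new[n:]


def is_regular(path: Path) -> bool:
    """True only for a regular file; symlinks are not followed."""
    try:
        return stat.S_ISREG(os.lstat(path).st_mode)
    except FileNotFoundError:
        return False


def plan_delta(old_root: Path, new_root: Path, conffiles: Iterable[str],
               make_delta: Callable[[bytes, bytes], bytes] = toy_delta
               ) -> Tuple[Dict[str, bytes], List[str]]:
    """Split the new tree into deltas (path -> delta bytes) and shipped paths."""
    conf = {c.lstrip("/") for c in conffiles}  # conffiles are listed as /etc/...
    deltas: Dict[str, bytes] = {}
    shipping: List[str] = []
    for dirpath, dirnames, filenames in os.walk(new_root):
        # os.walk lists symlinks to directories under dirnames without entering them
        links = [d for d in dirnames if os.path.islink(os.path.join(dirpath, d))]
        for name in sorted(filenames + links):
            new_path = Path(dirpath) / name
            rel = new_path.relative_to(new_root).as_posix()
            old_path = Path(old_root) / rel
            if rel in conf or not (is_regular(new_path) and is_regular(old_path)):
                shipping.append(rel)  # conffile, link, or no regular counterpart
                continue
            new_data = new_path.read_bytes()
            delta = make_delta(old_path.read_bytes(), new_data)
            if len(delta) < len(new_data):
                deltas[rel] = delta  # keep the delta only when it is smaller
            else:
                shipping.append(rel)  # the delta would not save anything
    return deltas, shipping
```

Whatever lands in `shipping` would go into the new data.tar.gz, and the `deltas` mapping into delta.tar.gz.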
How do you handle configuration files? You should put them directly in
shipping. Do you also check for changes in permissions, etc.? That's an
important part of security updates. [Sorry, I don't have much time to
check your scripts. You have done great work just by designing this.
I'm pretty sure my suggestions will be easy to implement, if they're not
already there :) ]
> This is the easy bit :-) All that's still needed is clean handling of
> I still cannot find a clean way to actually apply the patches. Ideally, it
> would be quite simple: you would execute dpkg --install on the new file. In
> the 'unpack' phase of dpkg, dpkg would unpack data.tar.gz as usual, but then do
> an 'xdelta patch' for all contents of delta.tar.gz, creating backups of
> originals as with data.tar.gz [I don't actually know what mechanism is used
> normally for this; are the old files renamed or do the new ones get .dpkg-new
> appended, or is something else done?]. This way, if something goes horribly
> wrong in the patching you can complain about an error and restore things to the
> way they were. There would have to be a way to indicate patching information
> elsewhere, of course. Perhaps a Patches: control item could be added; I don't
> know what would be done about Packages.gz and apt, or whether distributing
> patches on the FTP mirrors is a good use of space.
> Of course, that's a pipe dream :-) I've also considered hackery using
> preinst scripts to do the patch [and therefore having to include delta.tar.gz
> somewhere inside data.tar.gz] but this would get nasty -- in order to have
> dpkg's file list come out correctly, data.tar.gz would have to contain entries
> for all files that were really in the package. Another option (probably the
> best for now) is to use dpkg-repack: create a temporary 'old' package, extract
> it and the new data.tar.gz to a temporary directory, apply the patches and
> copy the patched files to where the new data.tar.gz was extracted, recreate
> data.tar.gz with the patched files, and create the new deb from debian-binary,
> control.tar.gz, and the rebuilt data.tar.gz .
That's the easier and safer way, I think. Being able to apply the patches
directly would certainly be better, but it would make the system harder to repair.
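The dpkg-repack route quoted above boils down to rebuilding the new file tree and re-tarring it. Here is a rough Python sketch, assuming the old package, 'delta' and 'shipping' have already been extracted to directories. toy_apply is a hypothetical inverse of a prefix-plus-tail toy delta standing in for 'xdelta patch', and rebuild_tree and write_data_tar are illustrative names. A real tool would more likely hand the rebuilt tree to dpkg-deb --build than write the archive itself.

```python
import struct
import tarfile
from pathlib import Path
from typing import Callable


def toy_apply(old: bytes, delta: bytes) -> bytes:
    """Inverse of a hypothetical toy delta: 4-byte prefix length, then the new tail."""
    (prefix_len,) = struct.unpack(">I", delta[:4])
    return old[:prefix_len] + delta[4:]


def rebuild_tree(old_root: Path, delta_root: Path, shipping_root: Path,
                 apply_delta: Callable[[bytes, bytes], bytes] = toy_apply) -> None:
    """Patch old files into shipping_root, which then holds the complete new tree."""
    old_root, delta_root, shipping_root = Path(old_root), Path(delta_root), Path(shipping_root)
    for delta_file in sorted(delta_root.rglob("*")):
        if not delta_file.is_file():
            continue  # directories are recreated below as needed
        rel = delta_file.relative_to(delta_root)
        # A missing old file raises FileNotFoundError here, before anything is written.
        patched = apply_delta((old_root / rel).read_bytes(), delta_file.read_bytes())
        target = shipping_root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(patched)


def write_data_tar(root: Path, out_path: Path) -> None:
    """Recreate data.tar.gz with './'-prefixed member names."""
    with tarfile.open(str(out_path), "w:gz") as tar:
        tar.add(str(root), arcname=".")
```

If an old file is missing or a delta is malformed, an exception is raised before data.tar.gz is written. That is the point where the tool could complain and leave the installed package untouched.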
We should also decide which versioning policy to apply so that we don't bloat
the archives unnecessarily [as if they weren't already]. I think this is the
major reason why binary diffs never made it into Debian. Whatever solution you
come up with for diffing binaries, you still have to find a way to make binary
diffs practical. Maybe we should think about a way to get back to the original
package. The procedure would then be similar to yours, except for some new
middle steps:
1- dpkg-repack the package.
2- remove the old diff so we can have an 'orig.deb' package.
3- apply the new patch.
This has some disadvantages, though. We would have to keep the install-diffs
for all packages installed this way so that we can remove them later.
Also, the maintainer should be able to start over with new diffs if major
modifications are made. This can be done simply by providing an 'orig.deb'
package with no diffs. I also suggest adding a control field indicating
which version a patch is provided against (e.g. Version: 1.2.3-5,
Patches: 1.2.3-1). This field would be preserved in both the available
list and the status list. So, when apt decides to download a package,
it should check:
1) Does the package provide a patch?
2) If yes, do we have the orig.deb in status? (It is indicated in the new
   control field of the available package.)
3a) Yes, good! Download the patch and apply it to orig.deb.
3b) No? Then do we have a patch against this orig.deb installed?
4) Yes, good! Remove the old patch and apply the new one we just downloaded.
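The check list above, written as a Python sketch. Control stanzas are plain dicts. 'Patches' is the field proposed in this mail, not an existing dpkg or apt field, and the return values ('patch-orig', 'replace-patch', 'full') are made-up labels.

```python
from typing import Dict, Optional

Stanza = Dict[str, str]


def parse_stanza(text: str) -> Stanza:
    """Parse a simple 'Field: value' stanza (continuation lines are not handled)."""
    fields: Stanza = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


def choose_download(available: Stanza, installed: Optional[Stanza]) -> str:
    """Return 'patch-orig', 'replace-patch' or 'full' following steps 1-4."""
    base = available.get("Patches")              # proposed field, not a real dpkg one
    if base is None or installed is None:
        return "full"                            # 1) no patch offered, or nothing to patch
    if "Patches" not in installed and installed.get("Version") == base:
        return "patch-orig"                      # 2) + 3a) the orig.deb itself is installed
    if installed.get("Patches") == base:
        return "replace-patch"                   # 3b) + 4) older patch on the same orig.deb
    return "full"                                # anything else: plain full download
```

For example, with Version: 1.2.3-5 and Patches: 1.2.3-1 available, an installed unpatched 1.2.3-1 gives 'patch-orig'. An installed 1.2.3-3 that itself carries Patches: 1.2.3-1 gives 'replace-patch'.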
If we keep a simple rule that patches should always be made against original
packages (the ones with no patch), we can even ask dinstall to make the
patches itself when the Patches field is present. We shouldn't stick too
rigidly to rules, however. Providing diffs against old-stable can also be
a good thing.
Even if the patches break for whatever reason, we can always fall back to
the old method: a full package download. Remember that this step would be
done in the download phase of dselect/apt, not when upgrading packages,
so the full download is still available.
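A Python sketch of that fallback. The patch path is tried first, and any error or checksum mismatch sends us back to a full download. The per-file MD5 list is an assumption (for instance, an md5sums-style list shipped in the new control.tar.gz). fetch, verified and the two callables are hypothetical.

```python
import hashlib
from typing import Callable, Dict, Tuple

Files = Dict[str, bytes]  # path -> contents


def verified(files: Files, expected_md5: Dict[str, str]) -> bool:
    """Every path with a listed checksum must be present and match it."""
    return all(name in files and hashlib.md5(files[name]).hexdigest() == digest
               for name, digest in expected_md5.items())


def fetch(try_patch: Callable[[], Files], full_download: Callable[[], Files],
          expected_md5: Dict[str, str]) -> Tuple[str, Files]:
    """Prefer the patch path; on failure or checksum mismatch, download in full."""
    try:
        files = try_patch()
        if verified(files, expected_md5):
            return "patch", files
    except (OSError, ValueError, KeyError):
        pass  # a missing orig.deb, a bad delta, or a network error ends up here
    return "full", full_download()
```

Because this runs at download time, a failed patch costs only the extra transfer. Nothing on the installed system has been touched yet.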
> Anyway, no more time to think about this at the moment :-)
Not much on my side either, but I think this is the most realizable idea I've
seen on this subject in a while. Even as an optional feature for maintainers,
it would still be good. The only drawback we should check now is how this
will affect the size of the FTP archives (CD-ROMs will always ship with full
packages). If the diffs are 50% of the size of the full packages, this would
make the archives 50% bigger than now. In practice it won't be that bad,
because a lot of packages will stick to orig.deb. Also, we can let dinstall
decide whether to keep the diffs or remove them (updating the control file
at the same time). The diffs could also live on another server, with their
own Packages.gz (containing only the Package:, Version:, and Patches:
fields). Apt would merge them on update and then make the best choice to
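The archive-size estimate in the paragraph above can be written out. This reads the "50% reduction" as "each diff is half the size of its full package" and assumes the archive keeps at most one diff per package. The symbols are introduced here for illustration only: S is the current archive size, r the diff-to-package size ratio, and p the fraction of the archive (by size) whose packages also carry a diff.

```latex
S' = S\,(1 + r\,p)
\qquad\text{with } r = 0.5,\ p = 1:\quad S' = 1.5\,S
```

With p < 1, because many packages stick to orig.deb, the growth shrinks proportionally. For instance, a hypothetical p = 0.2 would give S' = 1.1 S.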
> "I've struggled with reality for thirty-five years, but I'm glad to say that
> I finally won."
> -- _Harvey_
[ snip the script -- hope to have time soon to take a look ]
Hoping to hear more about it...
Fabien Ninoles GULUS founder
aka Corbeau aka le Veneur Gris Debian GNU/Linux maintainer
RSA PGP KEY [E3723845]: 1C C1 4F A6 EE E5 4D 99 4F 80 2D 2D 1F 85 C1 70