[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: A radically different proposal for differential updates



On Tue, Aug 15, 2017 at 08:51:23PM +0200, Julian Andres Klode wrote:
> On Tue, Aug 15, 2017 at 09:26:24AM +0200, Christian Seiler wrote:
> > Hi there,
> > 
> > I've come to believe that binary diff packages are not the best way of
> > solving this issue. Intead I'd like to propse a radically different
> > solution to this issue.
> > 
> > The gist of it: instead of adding a format for how deltas work, I
> > propose to introduce a new format for storing Debian packages that will
> > be used for both the initial installation _and_ incremental updates.
> > 
> > This idea was inspired by the following talk given by Lennart
> > Poettering about a new tool he's written (which is already packaged for
> > Debian by the way):
> > https://www.youtube.com/watch?v=JnNkBJ6pr9s
> > 
> > 
> > 
> > Now to my proposal:
> > 
> > A Debian package currently consists of two files: control.tar.gz and
> > data.tar.xz (or .gz). What I want to propose is a new format that does
> > not contain a data.tar.xz at all. Instead I'd like to split the
> > data.tar.xz into chunks and have the new format only contain an index
> > that references these chunks. Let me call this new format "cdeb" for
> > "chunked deb".
> > 
> [...]
> > Anyway: thoughts? Regards, Christian
> 
> It's of course an awesome idea. But:
> 
> I generally agree with the idea of chunk stores. They'd improve
> things a lot. Now, instead of chunking the tarfiles, chunk both
> the individual files, and the tarfiles. Then, with an index for
> the individual files in control.tar listing the chunks, you can
> easily reconstruct just the files that changed on your system
> and avoid any rebuilding of debs for upgrades :D

As I figured out yesterday, I actually do not agree with the
idea. Due to the way ELF files are structured, chunking them
produces no useful deduplication in a lot of cases. For example,
firefox 55.0-1 and 55.0-2 basically did not deduplicate at all
with librsync, casync, or borg. The reason is of course that
ELF files contains lot of offsets in the assembly code, if you
add stuff, all following machine code changes.

-- 
Debian Developer - deb.li/jak | jak-linux.org - free software dev
                  |  Ubuntu Core Developer |
When replying, only quote what is necessary, and write each reply
directly below the part(s) it pertains to ('inline').  Thank you.


Reply to: