
Re: A radically different proposal for differential updates



Isn't there a more general issue here, namely that tar and compression
happen in the wrong order?  Wouldn't it make more sense to produce
.gz.tar files (i.e. compress the files individually, then use tar as
an index into them)?  Then tar works perfectly well for extracting
individual files without having to download the entire thing.  People
who want the entire file can still fetch it as before.  People who
want to minimize client-side storage can fetch byte ranges of the
archive to read the tar headers (the table of contents) and then only
the members they need.
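
To make the idea concrete, here is a rough sketch in Python (standard
library only).  It is not an existing dpkg or apt format: the ".gz"
member suffix, the helper names and the in-memory fetch_range() that
stands in for an HTTP Range request are all assumptions made for
illustration.  For brevity build_index() reads the whole archive; a
real client would walk the tar headers with range requests or fetch a
separately published index.

```python
import gzip
import io
import tarfile


def build_gz_tar(files):
    """Return an uncompressed tar whose members are individually gzipped."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            payload = gzip.compress(data)
            info = tarfile.TarInfo(name + ".gz")  # hypothetical naming scheme
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def build_index(archive):
    """Map member name -> (offset, length) of its compressed payload."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        return {m.name: (m.offset_data, m.size) for m in tar.getmembers()}


def fetch_range(archive, offset, length):
    """Stand-in for an HTTP Range request against a mirror (assumption)."""
    return archive[offset:offset + length]


def extract_one(archive, index, name):
    """Decompress a single member using only its byte range."""
    offset, length = index[name]
    return gzip.decompress(fetch_range(archive, offset, length))
```

Because the outer tar is not compressed, the offsets in the index stay
valid for random access, which is exactly what is lost with .tar.gz.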

The delta packages are then just the .gz.tar of the new files.  tar
already has --append, which might be used to add the deltas to the
existing archive, so it becomes a single package that contains
multiple versions.
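
A minimal sketch of the append idea, again in Python: tarfile's "a"
mode plays the role of tar --append, and like tar itself it only works
on an uncompressed archive.  The per-version directory prefix and the
function names are made up for this example; how versions would really
be laid out inside the archive is an open question.

```python
import gzip
import io
import tarfile


def add_gz_member(tar, name, data):
    """Store one individually gzipped file in an open tar archive."""
    payload = gzip.compress(data)
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def append_delta(path, version, changed_files):
    """Append changed files under a per-version prefix (hypothetical layout).

    Mode "a" creates the archive if it does not exist yet and otherwise
    appends after the last existing member.
    """
    with tarfile.open(path, mode="a") as tar:
        for name, data in changed_files.items():
            add_gz_member(tar, f"{version}/{name}.gz", data)


def list_versions(path):
    """Return the version prefixes present in the archive."""
    with tarfile.open(path, mode="r:") as tar:
        return sorted({m.name.split("/", 1)[0] for m in tar.getmembers()})
```

Calling append_delta() once per release leaves the older members
untouched, so existing byte offsets into the archive remain valid.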

Once you can seek (and use byte ranges remotely) there are a whole lot
of options, and it is compressing after the index is built that takes
all those options away.



On Tue, Aug 15, 2017 at 2:51 PM, Julian Andres Klode <jak@debian.org> wrote:
> On Tue, Aug 15, 2017 at 09:26:24AM +0200, Christian Seiler wrote:
>> Hi there,
>>
>> I've come to believe that binary diff packages are not the best way of
>> solving this issue. Instead I'd like to propose a radically different
>> solution to this issue.
>>
>> The gist of it: instead of adding a format for how deltas work, I
>> propose to introduce a new format for storing Debian packages that will
>> be used for both the initial installation _and_ incremental updates.
>>
>> This idea was inspired by the following talk given by Lennart
>> Poettering about a new tool he's written (which is already packaged for
>> Debian by the way):
>> https://www.youtube.com/watch?v=JnNkBJ6pr9s
>>
>>
>>
>> Now to my proposal:
>>
>> A Debian package currently consists of two files: control.tar.gz and
>> data.tar.xz (or .gz). What I want to propose is a new format that does
>> not contain a data.tar.xz at all. Instead I'd like to split the
>> data.tar.xz into chunks and have the new format only contain an index
>> that references these chunks. Let me call this new format "cdeb" for
>> "chunked deb".
>>
> [...]
>> Anyway: thoughts? Regards, Christian
>
> It's of course an awesome idea. But:
>
> I generally agree with the idea of chunk stores. They'd improve
> things a lot. Now, instead of chunking the tarfiles, chunk both
> the individual files, and the tarfiles. Then, with an index for
> the individual files in control.tar listing the chunks, you can
> easily reconstruct just the files that changed on your system
> and avoid any rebuilding of debs for upgrades :D
>
> That said, I believe that this change won't sell. Replacing the
> basic format the repository is made of won't fare well. Too many
> tools (most of which probably are not known) rely on the presence
> of deb files in archives.
>
> So as sad as it might be, I think we probably have to settle for
> delta files.
>
> --
> Debian Developer - deb.li/jak | jak-linux.org - free software dev
>                   |  Ubuntu Core Developer |
> When replying, only quote what is necessary, and write each reply
> directly below the part(s) it pertains to ('inline').  Thank you.
>

