
format 0.939000 for breaking the 9.3GB barrier



Hi!
The deb-2.0 format cannot carry a member larger than about 9.3GiB
(10¹⁰-1 bytes: the ar header stores the member size as 10 decimal digits --
a weird radix of 10, unusual in computing, which might be related to twice
the number of fingers some apes have on one appendage, or half of the number
of fingers that species of ape has in total).
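
For anyone wondering where the 10¹⁰ comes from, here is a minimal sketch of
the common 60-byte ar member header, which is what wraps data.tar in a 2.0
package.  The helper name build_ar_header and the dummy mtime/uid/gid/mode
values are mine, purely for illustration; this is not dpkg's code.

```python
import struct

# Common ar member header: name[16] mtime[12] uid[6] gid[6] mode[8]
# size[10] magic[2], all ASCII, space-padded.  60 bytes in total.
AR_HEADER = struct.Struct("16s12s6s6s8s10s2s")

# The size field holds at most 10 decimal digits.
MAX_MEMBER_SIZE = 10**10 - 1


def build_ar_header(name: str, size: int) -> bytes:
    """Build an ar member header (hypothetical helper, dummy metadata)."""
    size_field = str(size).encode("ascii")
    if len(size_field) > 10:
        raise ValueError(f"{size} bytes does not fit in a 10-digit size field")
    return AR_HEADER.pack(
        name.ljust(16).encode("ascii"),
        b"0".ljust(12),        # mtime (dummy)
        b"0".ljust(6),         # uid (dummy)
        b"0".ljust(6),         # gid (dummy)
        b"100644".ljust(8),    # mode (dummy)
        size_field.ljust(10),  # the field that imposes the limit
        b"`\n",                # end-of-header magic
    )


# Today's largest data.tar still fits...
header = build_ar_header("data.tar", 5839452160)
print(len(header), "byte header,", f"{MAX_MEMBER_SIZE / 2**30:.2f} GiB max")

# ...but anything needing an 11th digit cannot be described at all.
try:
    build_ar_header("data.tar", 10**10)
except ValueError as err:
    print("rejected:", err)
```

The limit is a property of the container, not of tar or of the compressor.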

And we're getting closer and closer to that limit: the last time we spoke,
the max package size was 1.7GB; it's 5.5GB today.  In fact, judging by
Installed-Size alone, some other packages would already breach this limit,
had they shipped the data instead of fetching it from the Interwebs.

The current max is kicad-packages3d_7.0.7-1_all.deb, whose data.tar is
5839452160 bytes in size.


There were suggestions of other package formats, but to my knowledge none
have been implemented or even researched.  Which leaves the deb-old format
(version 0.939000; I'll round it to "1" hereafter).
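
For reference, a rough sketch of how the old format is laid out, going by my
reading of deb-old(5): a version line, a line with the length of the control
member as a plain decimal string, then the control member, then the data
member running to end of file.  The function name split_deb_old and the fake
archive are illustrative assumptions, not anything dpkg ships.

```python
def split_deb_old(blob: bytes):
    """Split an old-format .deb image into (control_member, data_member).

    Sketch only: assumes the layout described in deb-old(5) and works on an
    in-memory blob; a real tool would stream instead.
    """
    first_nl = blob.index(b"\n")
    version = blob[:first_nl]
    if version != b"0.939000":
        raise ValueError(f"not a deb-old archive: {version!r}")

    # Second line: length of the control member, no fixed field width.
    second_nl = blob.index(b"\n", first_nl + 1)
    control_len = int(blob[first_nl + 1:second_nl].decode("ascii"))

    start = second_nl + 1
    control = blob[start:start + control_len]
    # The data member has no length field at all: it is whatever follows.
    data = blob[start + control_len:]
    return control, data


# Fake archive with placeholder members instead of real gzipped tarballs.
fake = b"0.939000\n" + b"7\n" + b"CONTROL" + b"DATA-MEMBER"
control, data = split_deb_old(fake)
print(control, data)
```

Since the data member carries no size field, nothing in this layout caps it
at 10 digits.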

As far as I know, format 2.0 was devised with some undescribed extensions
in mind; none of those extensions has appeared in the 28 years since we
made the switch -- any new stuff has gone into control.tar instead.

Thus, I propose we revert to the old format.

Benefits:
 * no 10¹⁰ data.tar limit
 * it unpacks 1% faster than 2.0
Concerns:
 * no support for compressors other than cat/ncompress/gzip yet
 * external tools may not know it

The speed-up benefit is a bit puzzling, but consistently shows up in my
benchmarks, using any underlying compressor.  It's even present, to a
smaller degree, in zunpack (my reimplementation using libarchive, part of
stalled zdebootstrap) -- which should have no overhead for ar pieces.

As for other compressors:
 * format 2.0 explicitly names .deb components, currently supporting .tar
   .tar.{xz,gz,bz2,zst}
 * format 0.939 uses /bin/gzip to do format sniffing, currently supporting
   .tar .tar.Z .tar.gz (there's also an #ifdefed internal implementation,
   which afaik does the same but doesn't support .Z)

It would be easy to extend that sniffing to newer compressors.  Existing
tools that already do so transparently include libarchive or my zst, but
it would be no rocket surgery to do that in dpkg itself.
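
To give an idea of how little is involved, here is a minimal sketch of
sniffing by magic bytes.  It is not what dpkg, libarchive or zst actually do
internally; the table and the function name sniff are my own, and only the
well-known magic numbers are taken as given.

```python
import bz2
import gzip
import lzma

# Well-known leading magic bytes of each compressed stream format.
MAGIC = [
    (b"\x1f\x8b", "gzip"),
    (b"\x1f\x9d", "compress"),      # .Z, as written by ncompress
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
]


def sniff(head: bytes) -> str:
    """Guess the format of a member from its first bytes (sketch)."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    # An uncompressed ustar/GNU tar has "ustar" at offset 257 of block 0.
    if head[257:262] == b"ustar":
        return "tar"
    return "unknown"


# Try it on streams produced by the stdlib compressors.
payload = b"not really a tarball"
samples = {
    "gzip": gzip.compress(payload),
    "bzip2": bz2.compress(payload),
    "xz": lzma.compress(payload),
}
for expected, blob in samples.items():
    print(expected, "->", sniff(blob[:512]))
```

Reading the first 512 bytes is enough for all of the above, including the
plain tar case.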

(zst is my project to sanitize command-line tools like /bin/gzip
/usr/bin/xz or /usr/bin/zstd to have consistent behaviour and supported
options.  It's a bit stalled, lacking e.g. parallelization or --rsyncable,
but available in a working state in Bookworm.)

Packages using the old format but new compressors would obviously fail
to install using Bookworm's dpkg, but that's not a problem:
 * the bulk of packages would remain on 2.0, at least for X releases
 * bulky packages are few, and they can Pre-Depend a version of dpkg
   that adds support for new compressors

As for external tools, those that properly call dpkg will work out of the
box; fortunately, that is most of them.  The rest would need to grow
such support.  I haven't researched which ones those are yet.


So, before any of us commits more effort, please say if this is the way
to go.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀                                 -- Genghis Ht'rok'din
⠈⠳⣄⠀⠀⠀⠀

