
Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation



On Fri, Apr 26, 2013 at 11:46:44PM +0100, Ben Hutchings wrote:
> On Fri, Apr 26, 2013 at 03:21:43PM +0200, Laszlo Kajan wrote:
> > On 23/04/13 15:13, Benjamin Drung wrote:
> > [...]
> > > You can use xz for the source and binary package to reduce the size. The
> > > default compression level for xz reduces the size of the source tarball
> > > from 415 MB to 272 MB:
> > 
> > Following Benjamin's suggestion and the data.debian.org document [1], we have prepared a 'metastudent-data' arch:all package that is ~130MB (xz
> > compressed).
> > The package builds the required architecture-dependent databases in the postinst script. The purpose of this is to save the archive space that
> > each architecture-dependent version would take up.
> [...]
> 
> Does this mean that installing the package results in having two
> uncompressed copies of the data on disk?  If so, wouldn't it be
> better to do:
> 
> 1. Compress the database (with xz).
> 2. Build the package without compression (contents are already
>    compressed so re-compressing would be a waste of time).
> 3. In postinst, decompress and convert the database to native.
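A minimal sketch of step 3, assuming (hypothetically) that the database is a headerless run of unsigned 64-bit little-endian integers shipped as an .xz file. The function name `install_native_db` and this file layout are illustrative only, not metastudent's real format, and a real postinst would more likely be a shell script; Python's standard `lzma` and `array` modules are used here just to show the logic.

```python
import array
import lzma
import sys


def install_native_db(xz_path: str, out_path: str) -> int:
    """Decompress an xz file of little-endian uint64 records and write
    them in the host's native byte order.

    Hypothetical format: a headerless run of 8-byte little-endian values.
    Returns the number of values written.
    """
    values = array.array("Q")  # native unsigned 64-bit integers
    if values.itemsize != 8:
        raise RuntimeError("unexpected size for array type 'Q'")

    with lzma.open(xz_path, "rb") as src:
        raw = src.read()  # a real script would stream this in chunks
    if len(raw) % 8:
        raise ValueError(f"{xz_path}: trailing partial record")

    values.frombytes(raw)  # interprets the bytes in native order
    if sys.byteorder == "big":
        values.byteswap()  # data was stored little-endian

    with open(out_path, "wb") as dst:
        values.tofile(dst)
    return len(values)
```

On a little-endian host the byteswap is skipped, so the output is byte-for-byte the decompressed data; that is the case where keeping a second converted copy could be avoided entirely.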

If it's never going to be recompressed, you really want to compress it up
the wazoo:
xz  | size (bytes) | compress amd64 | compress armhf | decompress armhf
-0  |    407076744 |        1:49.77 |        6:14.47 |          1:23.31
-6  |    271088012 |       14:56.38 |       47:40.23 |          1:02.37
-9e |    195223672 |       19:38.15 |        1:06:50 |            48.01
(times in [h:]m:ss)

Far less space taken, _and_ it decompresses faster.
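In Python's standard `lzma` module, `xz -9e` corresponds to `preset=9 | lzma.PRESET_EXTREME`. The sketch below compresses a synthetic, hypothetical payload at the three levels from the table and reports the resulting sizes; it does not reproduce the timings above, and ratios on real data will differ. Note that higher presets also raise the memory needed to decompress (roughly 65 MiB at -9, per the xz(1) preset table), which matters on small armhf boards.

```python
import lzma
import random

# xz command-line level -> equivalent liblzma preset
LEVELS = {
    "-0": 0,
    "-6": 6,  # xz's default preset
    "-9e": 9 | lzma.PRESET_EXTREME,
}


def compressed_sizes(data: bytes) -> dict:
    """Return the .xz-compressed size of `data` for each preset in LEVELS."""
    return {name: len(lzma.compress(data, preset=p)) for name, p in LEVELS.items()}


if __name__ == "__main__":
    # Hypothetical stand-in for a database: repetitive tokens with some noise.
    rng = random.Random(0)
    words = [b"ALPHA", b"BETA", b"GAMMA", b"DELTA", b"0042", b"9137"]
    sample = b" ".join(rng.choice(words) for _ in range(50_000))
    for name, size in compressed_sizes(sample).items():
        print(f"xz {name:>4}: {size:>9} bytes (of {len(sample)})")
```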

> However, I would expect the vast majority of installations to be on
> amd64, so if you always generate a 64-bit little-endian database
> and avoid duplicating when installing on such a machine then it
> would be better for most users (not so nice for others).

Looks like we're getting armhf machines with 539375329859372 cores on one
blade, but you have a point.  I find it quite strange that the on-disk format
would care about word width at all, though: if the data fits in 32 bits,
storing it in 64-bit fields wastes lots of space for no gain, mmap or not.
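The word-width point can be made concrete: a format that states its field sizes and byte order explicitly is identical on 32- and 64-bit, little- and big-endian hosts. The sketch below uses Python's `struct` with a hypothetical record of two unsigned 32-bit fields (the layout and helper names are illustrative), and shows that the same values stored in 64-bit fields take twice the space.

```python
import struct

# Hypothetical record layout: two unsigned 32-bit little-endian fields.
# "<" fixes both the byte order and the standard field sizes, so the
# layout never depends on the host's word width or endianness.
REC32 = struct.Struct("<II")
REC64 = struct.Struct("<QQ")  # same fields widened to 64 bits


def pack_records(fmt: struct.Struct, records) -> bytes:
    """Serialize a list of (a, b) tuples with the given record layout."""
    return b"".join(fmt.pack(*r) for r in records)


def unpack_records(fmt: struct.Struct, blob: bytes):
    """Parse a blob back into a list of tuples."""
    return [tuple(r) for r in fmt.iter_unpack(blob)]


records = [(1, 70000), (4_000_000_000, 12)]  # all values fit in 32 bits
small = pack_records(REC32, records)
large = pack_records(REC64, records)
print(len(small), len(large))  # 16 32
```

With fixed little-endian fields, a loader on a little-endian host could use the bytes in place (for example through mmap) with no conversion pass; only big-endian hosts would need a byteswap.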

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ

