Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

To: debian-devel@lists.debian.org
Subject: Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation
From: Adam Borowski <kilobyte@angband.pl>
Date: Sat, 27 Apr 2013 16:09:16 +0200
Message-id: <20130427140916.GA24530@angband.pl>
In-reply-to: <20130426224644.GF2619@decadent.org.uk>
References: <517658D5.9040909@debian.org> <20130423102323.GE31301@an3as.eu> <517675D3.3030605@rostlab.org> <1366722785.3022.4.camel@deep-thought> <517A7F67.3070509@rostlab.org> <20130426224644.GF2619@decadent.org.uk>

On Fri, Apr 26, 2013 at 11:46:44PM +0100, Ben Hutchings wrote:
> On Fri, Apr 26, 2013 at 03:21:43PM +0200, Laszlo Kajan wrote:
> > On 23/04/13 15:13, Benjamin Drung wrote:
> > [...]
> > > You can use xz for the source and binary package to reduce the size. The
> > > default compression level for xz reduces the size of the source tarball
> > > from 415 MB to 272 MB:
> > 
> > Following Benjamin's suggestion and the data.debian.org document [1], we have prepared a 'metastudent-data' arch:all package that is ~130MB (xz
> > compressed).
> > The package builds required architecture dependent databases in the postinst script. The purpose of this is to save space in the archive that
> > each architecture dependent version would take up.
> [...]
> 
> Does this mean that installing the package results in having two
> uncompressed copies of the data on disk?  If so, wouldn't it be
> better to do:
> 
> 1. Compress the database (with xz).
> 2. Build the package without compression (contents are already
>    compressed so re-compressing would be a waste of time).
> 3. In postinst, decompress and convert the database to native.

If it's never going to be recompressed, you really want to compress it up
the wazoo:
xz  |      size |   compression time    | decompression time
    |   (bytes) |    amd64 |      armhf |              armhf
-0  | 407076744 |  1:49.77 |    6:14.47 |            1:23.31
-6  | 271088012 | 14:56.38 |   47:40.23 |            1:02.37
-9e | 195223672 | 19:38.15 | 1:06:50    |              48.01

Far less space taken, _and_ it decompresses faster.
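
For anyone who wants to repeat the comparison on their own data, here is a rough sketch using Python's standard lzma module, which exposes the same presets as the xz command line. The helper names and the sample data are made up for illustration; they are not part of the metastudent packaging, and the sizes it prints will not match the table above.

```python
import lzma

# The three xz settings from the table; "-9e" is preset 9 with the
# "extreme" flag, the equivalent of "xz -9e" on the command line.
PRESETS = {
    "-0": 0,
    "-6": 6,
    "-9e": 9 | lzma.PRESET_EXTREME,
}


def xz_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes of `data` for each preset."""
    return {
        name: len(lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset))
        for name, preset in PRESETS.items()
    }


def roundtrip_ok(data: bytes, preset: int) -> bool:
    """Check that compressing and then decompressing returns the input."""
    packed = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
    return lzma.decompress(packed) == data


if __name__ == "__main__":
    # Hypothetical stand-in for the real database: repetitive bytes.
    sample = bytes(i * i % 251 for i in range(200_000))
    for name, size in xz_sizes(sample).items():
        print(name, size)
```

Timing each call with time.perf_counter() would give the compression and decompression columns as well; on real data the gap between presets depends heavily on the content.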

> However, I would expect the vast majority of installations to be on
> amd64, so if you always generate a 64-bit little-endian database
> and avoid duplicating when installing on such a machine then it
> would be better for most users (not so nice for others).

Looks like we're getting 539375329859372 cores on one blade armhf machines,
but you have a point.  I find it quite strange that the on-disk format would
ever care about word width, though: if the data fits in 32 bits, there's
lots of waste for no gain -- mmap or not.
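
To illustrate the word-width point, here is a small sketch with Python's struct module. It assumes nothing about the actual metastudent database layout, which is not described in this thread; the function names are hypothetical. It only shows that a format with explicit field widths and byte order comes out identical on every architecture, and that 32-bit fields take half the space of 64-bit ones for the same values.

```python
import struct


def pack_u32_le(values):
    """Serialise unsigned integers as fixed 32-bit little-endian fields."""
    return struct.pack("<%dI" % len(values), *values)


def pack_u64_le(values):
    """Serialise the same integers as fixed 64-bit little-endian fields."""
    return struct.pack("<%dQ" % len(values), *values)


def unpack_u32_le(blob):
    """Read back a blob written by pack_u32_le, on any host architecture."""
    return list(struct.unpack("<%dI" % (len(blob) // 4), blob))


if __name__ == "__main__":
    values = [1, 2, 3]
    print(len(pack_u32_le(values)), len(pack_u64_le(values)))
```

Because the "<" prefix fixes both the byte order and the field sizes, the file does not depend on whether the machine that wrote it was amd64 or armhf.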

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ
