Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation
On Fri, Apr 26, 2013 at 11:46:44PM +0100, Ben Hutchings wrote:
> On Fri, Apr 26, 2013 at 03:21:43PM +0200, Laszlo Kajan wrote:
> > On 23/04/13 15:13, Benjamin Drung wrote:
> > [...]
> > > You can use xz for the source and binary package to reduce the size. The
> > > default compression level for xz reduces the size of the source tarball
> > > from 415 MB to 272 MB:
> >
> > Following Benjamin's suggestion and the data.debian.org document [1], we
> > have prepared a 'metastudent-data' arch:all package that is ~130 MB
> > (xz-compressed). The package builds the required architecture-dependent
> > databases in the postinst script. The purpose of this is to save the space
> > in the archive that each architecture-dependent version would take up.
> [...]
>
> Does this mean that installing the package results in having two
> uncompressed copies of the data on disk? If so, wouldn't it be
> better to do:
>
> 1. Compress the database (with xz).
> 2. Build the package without compression (contents are already
> compressed so re-compressing would be a waste of time).
> 3. In postinst, decompress and convert the database to native.
If it's never going to be recompressed, you really want to compress it up
the wazoo:
xz  | size (bytes) | compress amd64 | compress armhf | decompress armhf
-0  |    407076744 |        1:49.77 |        6:14.47 |          1:23.31
-6  |    271088012 |       14:56.38 |       47:40.23 |          1:02.37
-9e |    195223672 |       19:38.15 |        1:06:50 |            48.01
Far less space taken, _and_ it decompresses faster.
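
If you want to try the same comparison on your own data, here is a rough Python sketch using the standard lzma module. The function names and the presets dictionary are made up for illustration, and the sample generator only produces some repetitive text, so the numbers will not resemble the table above. It is just one way to measure the trade-off, not what the package does.

```python
import lzma
import random

# Labels mirror the xz command-line levels used in the table above.
DEFAULT_PRESETS = {
    "-0": 0,
    "-6": 6,
    "-9e": 9 | lzma.PRESET_EXTREME,  # slow and memory-hungry to compress
}


def xz_sizes(data, presets=None):
    """Compress data at each preset and return {label: compressed size}."""
    if presets is None:
        presets = DEFAULT_PRESETS
    sizes = {}
    for label, preset in presets.items():
        blob = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
        # Decompression takes no preset: the .xz stream describes itself.
        if lzma.decompress(blob) != data:
            raise ValueError("round trip failed for preset " + label)
        sizes[label] = len(blob)
    return sizes


def make_sample(n_lines=20000, seed=0):
    """Build repetitive, compressible text (a hypothetical stand-in)."""
    rng = random.Random(seed)
    words = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon"]
    lines = (b" ".join(rng.choice(words) for _ in range(5))
             for _ in range(n_lines))
    return b"\n".join(lines)
```

Calling xz_sizes(make_sample()) returns one compressed size per level; time the lzma.compress and lzma.decompress calls separately if you also want the speed columns.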
> However, I would expect the vast majority of installations to be on
> amd64, so if you always generate a 64-bit little-endian database
> and avoid duplicating when installing on such a machine then it
> would be better for most users (not so nice for others).
Looks like we're getting 539375329859372 cores on one blade armhf machines,
but you have a point. I find it quite strange that the on-disk format would
ever care about word width, though: if the data fits in 32 bits, there's
lots of waste for no gain -- mmap or not.
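
To illustrate what I mean, here is a minimal Python sketch of an on-disk layout with an explicit width and byte order, using the standard struct module. I don't know what the actual database format looks like, so the record layout and the function names are purely hypothetical. The point is only that "<I" is always 4 little-endian bytes and "<Q" always 8, whatever machine writes or reads them.

```python
import struct


def pack_records(values, fmt="<I"):
    """Serialize unsigned integers as fixed-width little-endian fields.

    "<I" is 4 bytes per value and "<Q" is 8, regardless of the host's
    word size or endianness.
    """
    record = struct.Struct(fmt)
    return b"".join(record.pack(v) for v in values)


def unpack_records(blob, fmt="<I"):
    """Read back the values written by pack_records with the same fmt."""
    record = struct.Struct(fmt)
    return [v for (v,) in record.iter_unpack(blob)]


values = [0, 1, 2**32 - 1]           # everything fits in 32 bits
narrow = pack_records(values, "<I")  # 4 bytes per value
wide = pack_records(values, "<Q")    # 8 bytes per value, half of it zeros
```

Both blobs decode to the same list, but the 64-bit one is twice the size, and a file written this way could be shipped once for all architectures.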
--
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ