
Re: xz support in dpkg (was Re: dpkg plans for the squeeze cycle)



Hi!

On Wed, 2009-09-30 at 19:19:01 -0500, Jonathan Nieder wrote:
> Guillem Jover wrote:
> > I guess a better question is, how much benefit a bigger dictionary size
> > would give us?

> Good question.  Fedora people have been recently considering a similar
> question (they’re focused on speed rather than memory usage, but still
> it comes down to dictionary size versus compression ratio).
> 
> From
> <http://thread.gmane.org/gmane.linux.redhat.fedora.devel/121067/focus=121116>
> we can conclude that once the dictionary is larger than the payload it
> doesn’t win us much. ;)
> 
> From
> <http://www.advogato.org/person/badger/diary/80.html> we can conclude
> that with a reasonably sized and somewhat formulaic text file (an SQL
> database dump), preset -3 is good enough.  That’s a dictionary size
> of 1 MiB.
> 
> For deciding on limits, it would probably be good to experiment with actual
> “worst case” Debian packages (maybe openoffice.org).

Yeah.
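A rough way to see the dictionary-size effect without a real package, using Python's stdlib lzma bindings (the payload and sizes here are illustrative, not an actual Debian package):

```python
import lzma
import os

# Payload with long-range redundancy: two copies of the same random
# block, too far apart for a small dictionary to match across.
# (Sizes are illustrative, not a real package payload.)
block = os.urandom(200_000)
payload = block + b"." * 64 + block

sizes = {}
for dict_size in (1 << 16, 1 << 20):  # 64 KiB vs 1 MiB
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": dict_size}]
    sizes[dict_size] = len(
        lzma.compress(payload, format=lzma.FORMAT_XZ, filters=filters))

# With a 64 KiB dictionary the second copy of the block cannot
# reference the first, so it costs roughly its full size again; with
# a 1 MiB dictionary (what preset -3 uses) it compresses down to a
# back-reference.
print(sizes)
```

The same loop run over an unpacked openoffice.org data tarball would give the "worst case" numbers mentioned above.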

I was checking a bit and found lzip, and its companion library lzlib,
which seems to have a pretty straightforward API (both are packaged in
Debian):

  <http://www.nongnu.org/lzip/lzip.html>
  <http://www.nongnu.org/lzip/lzlib.html>

And this thread has some comparisons (although against an old lzip
version) and a link to a blog post with some interesting points in
favor of it over xz:

  <http://lists.gnu.org/archive/html/lzip-bug/2009-10/msg00000.html>

I also found this about the xz endianness problem, but it seems to
have already been fixed upstream:

  <http://www.mail-archive.com/fedora-devel-list@redhat.com/msg08013.html>

So it would be nice to consider it as well.

> > We can try to specify it, and codify it in the tools, but there's people
> > out there building packages with ar and tar...
> 
> Yes, dpkg should not break this way of working.

Well, creation of packages that way should not be encouraged either.
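For context on how little that way of working involves: a .deb is just an ar archive containing debian-binary, control.tar.gz and data.tar.gz, in that order. A minimal sketch of the same structure in Python (the package name and contents are made up, and it skips everything dpkg-deb would normally validate):

```python
import io
import tarfile

def ar_member(name: str, data: bytes) -> bytes:
    # One member of a classic 'ar' archive: a 60-byte header (name,
    # mtime, uid, gid, mode, size, magic), then data padded to an
    # even length with a newline.
    header = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n".format(
        name, 0, 0, 0, "100644", len(data)).encode("ascii")
    return header + data + (b"\n" if len(data) % 2 else b"")

def targz(files: dict) -> bytes:
    # Build a gzipped tarball in memory from {path: contents}.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, contents in files.items():
            info = tarfile.TarInfo(path)
            info.size = len(contents)
            tar.addfile(info, io.BytesIO(contents))
    return buf.getvalue()

control = targz({"./control": b"Package: hello\nVersion: 1.0\n"
                              b"Architecture: all\nMaintainer: nobody\n"
                              b"Description: example\n"})
data = targz({"./usr/share/doc/hello/README": b"hello\n"})

# The three members, in the order dpkg expects them.
deb = (b"!<arch>\n"
       + ar_member("debian-binary", b"2.0\n")
       + ar_member("control.tar.gz", control)
       + ar_member("data.tar.gz", data))
```

Which is exactly why the member names and compression schemes need to stay specified: anything this easy to produce by hand will be produced by hand.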

> > >     Related question: If an LZMA-based file format might ever be used
> > >     for udebs, what are the memory constraints for unpacking those?
> > 
> > Well this is outside the scope of dpkg itself, and more a project wide
> > decision, but I'm not sure we'd want any package in the base system
> > built with anything but gzip, as that's shared by derivatives,
> > embedded distros, etc. xz should probably be used for big packages that
> > are guaranteed to be used on desktops or huge boxes (think games,
> > openoffice.org, etc).
> 
> Makes sense.  xz was developed for an embedded distro, and its memory
> usage can be kept under control by using a small dictionary size, but we
> probably don’t want to slow down the install too much just for the sake
> of smaller packages.

Oh right, I realized that after having sent the mail and checking
around a bit. I guess the problem is that "embedded" is too wide a
target: some systems with low disk space and reasonable memory might
truly benefit from it, while ones with the inverse constraints might
not.

> One can indeed read the amount of memory from the file headers.
> Unfortunately, the maximum dictionary size is 4 GiB, and I would think
> using 4 GiB of memory to unpack a package even if that’s available would
> be bad behavior for dpkg.  It is not obvious that examining the contents
> of an untrusted package should be considered an unsafe operation (on a
> server where this could lead to denial of service, for example).

If the package is untrusted then you'd better not be installing it
anyway.

> > OTOH if the package is out of spec we can do whatever we want, but I'd
> > rather make dpkg cope with such packages gracefully.
> 
> Agreed.

Just to be clear, what I meant was that if it's not going to be
possible to extract the package at all, dpkg should abort up-front in
a controlled way, and not just get an ENOMEM in the middle of the
unpacking.
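A sketch of what such an up-front check could look like, using the legacy .lzma ("alone") header, where the dictionary size is stored directly (the limit is a made-up example, and the real xz container needs more header parsing than this):

```python
import lzma
import struct

DICT_LIMIT = 64 << 20  # hypothetical per-process budget: 64 MiB

def lzma_alone_dict_size(header: bytes) -> int:
    # Legacy .lzma header: one properties byte, then the dictionary
    # size as a 32-bit little-endian integer. Decoder memory is
    # roughly the dictionary size plus a small fixed overhead.
    return struct.unpack("<I", header[1:5])[0]

blob = lzma.compress(
    b"member payload", format=lzma.FORMAT_ALONE,
    filters=[{"id": lzma.FILTER_LZMA1, "dict_size": 1 << 24}],  # 16 MiB
)
needed = lzma_alone_dict_size(blob[:5])
if needed > DICT_LIMIT:
    # Refuse in a controlled way instead of hitting ENOMEM mid-unpack.
    raise SystemExit("refusing to unpack: decoder needs %d bytes" % needed)
```

The point being that the refusal costs one read of the member header, before any of the data is touched.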

regards,
guillem

