
Re: xz support in dpkg (was Re: dpkg plans for the squeeze cycle)



Hi!

On Wed, 2009-09-30 at 19:19:01 -0500, Jonathan Nieder wrote:
> Guillem Jover wrote:
> > I guess a better question is, how much benefit a bigger dictionary size
> > would give us?

> Good question.  Fedora people have been recently considering a similar
> question (they’re focused on speed rather than memory usage, but still
> it comes down to dictionary size versus compression ratio).
> 
> From
> <http://thread.gmane.org/gmane.linux.redhat.fedora.devel/121067/focus=121116>
> we can conclude that once the dictionary is larger than the payload it
> doesn’t win us much. ;)
> 
> From
> <http://www.advogato.org/person/badger/diary/80.html> we can conclude
> that with a reasonably sized and somewhat formulaic text file (an SQL
> database dump), preset -3 is good enough.  That’s a dictionary size
> of 1 MiB.
> 
> For deciding on limits, it would probably be good to experiment with actual
> “worst case” Debian packages (maybe openoffice.org).

Yeah.
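A rough way to see the dictionary-size effect without a real package, using Python's stdlib lzma bindings (the payload and sizes here are illustrative, not an actual Debian package):

```python
import lzma
import os

# Payload with long-range redundancy: two copies of the same random
# block, too far apart for a small dictionary to match across.
# (Sizes are illustrative, not a real package payload.)
block = os.urandom(200_000)
payload = block + b"." * 64 + block

sizes = {}
for dict_size in (1 << 16, 1 << 20):  # 64 KiB vs 1 MiB
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": dict_size}]
    sizes[dict_size] = len(
        lzma.compress(payload, format=lzma.FORMAT_XZ, filters=filters))

# With a 64 KiB dictionary the second copy of the block cannot
# reference the first, so it costs roughly its full size again; with
# a 1 MiB dictionary (what preset -3 uses) it compresses down to a
# back-reference.
print(sizes)
```

The same loop run over an unpacked openoffice.org data tarball would give the "worst case" numbers mentioned above.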

I was checking a bit and found lzip, and its companion library lzlib,
which seems to have a pretty straightforward API (both are packaged in
Debian):

  <http://www.nongnu.org/lzip/lzip.html>
  <http://www.nongnu.org/lzip/lzlib.html>

And this thread has some comparisons (although against an old lzip
version) and a link to a blog post with some interesting points in
favor of it over xz:

  <http://lists.gnu.org/archive/html/lzip-bug/2009-10/msg00000.html>

I also found this about the xz endianness problem, but it seems to
have already been fixed upstream:

  <http://www.mail-archive.com/fedora-devel-list@redhat.com/msg08013.html>

So it would be nice to consider it as well.

> > We can try to specify it, and codify it in the tools, but there's people
> > out there building packages with ar and tar...
> 
> Yes, dpkg should not break this way of working.

Well, creation of packages that way should not be encouraged either.
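For context on how little that way of working involves: a .deb is just an ar archive containing debian-binary, control.tar.gz and data.tar.gz, in that order. A minimal sketch of the same structure in Python (the package name and contents are made up, and it skips everything dpkg-deb would normally validate):

```python
import io
import tarfile

def ar_member(name: str, data: bytes) -> bytes:
    # One member of a classic 'ar' archive: a 60-byte header (name,
    # mtime, uid, gid, mode, size, magic), then data padded to an
    # even length with a newline.
    header = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n".format(
        name, 0, 0, 0, "100644", len(data)).encode("ascii")
    return header + data + (b"\n" if len(data) % 2 else b"")

def targz(files: dict) -> bytes:
    # Build a gzipped tarball in memory from {path: contents}.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, contents in files.items():
            info = tarfile.TarInfo(path)
            info.size = len(contents)
            tar.addfile(info, io.BytesIO(contents))
    return buf.getvalue()

control = targz({"./control": b"Package: hello\nVersion: 1.0\n"
                              b"Architecture: all\nMaintainer: nobody\n"
                              b"Description: example\n"})
data = targz({"./usr/share/doc/hello/README": b"hello\n"})

# The three members, in the order dpkg expects them.
deb = (b"!<arch>\n"
       + ar_member("debian-binary", b"2.0\n")
       + ar_member("control.tar.gz", control)
       + ar_member("data.tar.gz", data))
```

Which is exactly why the member names and compression schemes need to stay specified: anything this easy to produce by hand will be produced by hand.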

> > >     Related question: If an LZMA-based file format might ever be used
> > >     for udebs, what are the memory constraints for unpacking those?
> > 
> > Well this is outside the scope of dpkg itself, and more a project wide
> > decision, but I'm not sure we'd want any package in the base system
> > built with anything but gzip, as that's shared by derivatives,
> > embedded distros, etc. xz should probably be used for big packages that
> > are guaranteed to be used on desktops or huge boxes (think games,
> > openoffice.org, etc).
> 
> Makes sense.  xz was developed for an embedded distro, and its memory
> usage can be kept under control by using a small dictionary size, but we
> probably don’t want to slow down the install too much just for the sake
> of smaller packages.

Oh right, I realized that after having sent the mail and checking
around a bit. I guess the problem is that "embedded" is too wide a
target: some systems with low disk space and reasonable memory might
truly benefit from it, while ones with the inverse constraints might
not.

> One can indeed read the amount of memory from the file headers.
> Unfortunately, the maximum dictionary size is 4 GiB, and I would think
> using 4 GiB of memory to unpack a package even if that’s available would
> be bad behavior for dpkg.  It is not obvious that examining the contents
> of an untrusted package should be considered an unsafe operation (on a
> server where this could lead to denial of service, for example).

If the package is untrusted then you'd better not be installing it
anyway.

> > OTOH if the package is out of spec we can do whatever we want, but I'd
> > rather make dpkg cope with such packages gracefully.
> 
> Agreed.

Just to be clear, what I meant was that if it's not going to be
possible to extract the package at all, dpkg should abort up-front in
a controlled way, and not just get an ENOMEM in the middle of the
unpacking.
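A sketch of what such an up-front check could look like, using the legacy .lzma ("alone") header, where the dictionary size is stored directly (the limit is a made-up example, and the real xz container needs more header parsing than this):

```python
import lzma
import struct

DICT_LIMIT = 64 << 20  # hypothetical per-process budget: 64 MiB

def lzma_alone_dict_size(header: bytes) -> int:
    # Legacy .lzma header: one properties byte, then the dictionary
    # size as a 32-bit little-endian integer. Decoder memory is
    # roughly the dictionary size plus a small fixed overhead.
    return struct.unpack("<I", header[1:5])[0]

blob = lzma.compress(
    b"member payload", format=lzma.FORMAT_ALONE,
    filters=[{"id": lzma.FILTER_LZMA1, "dict_size": 1 << 24}],  # 16 MiB
)
needed = lzma_alone_dict_size(blob[:5])
if needed > DICT_LIMIT:
    # Refuse in a controlled way instead of hitting ENOMEM mid-unpack.
    raise SystemExit("refusing to unpack: decoder needs %d bytes" % needed)
```

The point being that the refusal costs one read of the member header, before any of the data is touched.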

regards,
guillem

