Re: xz support in dpkg (was Re: dpkg plans for the squeeze cycle)
On Wed, 2009-09-30 at 19:19:01 -0500, Jonathan Nieder wrote:
> Guillem Jover wrote:
> > I guess a better question is, how much benefit a bigger dictionary size
> > would give us?
> Good question. Fedora people have been recently considering a similar
> question (they’re focused on speed rather than memory usage, but still
> it comes down to dictionary size versus compression ratio).
> we can conclude that once the dictionary is larger than the payload it
> doesn’t win us much. ;)
> From <http://www.advogato.org/person/badger/diary/80.html> we can conclude
> that with a reasonably sized and somewhat formulaic text file (an SQL
> database dump), preset -3 is good enough. That’s a dictionary size
> of 1 MiB.
> For deciding on limits, it would probably be good to experiment with actual
> “worst case” Debian packages (maybe openoffice.org).
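One quick way to get a feel for this trade-off is to compress the same payload at several dictionary sizes and compare the results. What follows is only a rough sketch using Python's standard lzma module, not anything dpkg itself does. The helper names compressed_sizes and synthetic_payload and the synthetic data are made up for illustration; meaningful numbers would need real packages.

```python
import lzma
import random


def compressed_sizes(data, dict_sizes, preset=6):
    """Return {dict_size: compressed length} for an .xz stream of `data`."""
    sizes = {}
    for dict_size in dict_sizes:
        # Take all other LZMA2 options from the preset, override only dict_size.
        filters = [{"id": lzma.FILTER_LZMA2, "preset": preset, "dict_size": dict_size}]
        sizes[dict_size] = len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))
    return sizes


def synthetic_payload(block_len=256 * 1024, seed=0):
    """Hypothetical test payload: one incompressible block, stored twice."""
    rng = random.Random(seed)
    block = rng.getrandbits(8 * block_len).to_bytes(block_len, "little")
    return block + block


if __name__ == "__main__":
    payload = synthetic_payload()
    results = compressed_sizes(payload, [64 << 10, 1 << 20, 4 << 20])
    for dict_size, size in results.items():
        print(f"dict {dict_size >> 10:5d} KiB -> {size} bytes")
```

In this payload the second copy sits 256 KiB behind the first, so a 64 KiB dictionary cannot reach it, while the 1 MiB and 4 MiB dictionaries (both larger than the payload) should produce nearly the same output size.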
I was checking a bit, and found lzip and its companion lzlib, which
seems to have a pretty straightforward API (both packaged in Debian):
And this thread with some comparisons (although against an old lzip
version) and a link to a blog post with interesting points in favor
of it instead of xz:
I also found this about the xz endianness problem, but it seems to have
been fixed already upstream:
So it would be nice to consider it as well.
> > We can try to specify it, and codify it in the tools, but there's people
> > out there building packages with ar and tar...
> Yes, dpkg should not break this way of working.
Well, creation of packages that way should not be encouraged either.
> > > Related question: If an LZMA-based file format might ever be used
> > > for udebs, what are the memory constraints for unpacking those?
> > Well this is outside the scope of dpkg itself, and more a project wide
> > decision, but I'm not sure we'd want any package in the base system
> > built with anything but gzip, as that's shared by derivatives,
> > embedded distros, etc. xz should probably be used for big packages that
> > are guaranteed to be used on desktops or huge boxes (think games,
> > openoffice.org, etc).
> Makes sense. xz was developed for an embedded distro, and its memory
> usage can be kept under control by using a small dictionary size, but we
> probably don’t want to slow down the install too much just for the sake
> of smaller packages.
Oh right, realized that after having sent the mail and checking around
a bit. I guess the problem is that "embedded" is too wide a target. So
some systems with low disk space and reasonable memory might truly
benefit from it, but ones with the inverse might not.
> One can indeed read the amount of memory from the file headers.
> Unfortunately, the maximum dictionary size is 4 GiB, and I would think
> using 4 GiB of memory to unpack a package even if that’s available would
> be bad behavior for dpkg. It is not obvious that examining the contents
> of an untrusted package should be considered an unsafe operation (on a
> server where this could lead to denial of service, for example).
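To illustrate the idea of capping decoder memory, here is a minimal sketch using Python's standard lzma module. It is not dpkg code; the function names and the 8 MiB / 1 MiB figures are arbitrary choices for the example. The decompressor's memlimit argument makes decoding fail with LZMAError when the stream would need more memory than allowed, instead of allocating whatever the headers ask for.

```python
import lzma


def make_xz(data, dict_size):
    """Build an .xz stream whose header declares the given dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 0, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


def unpack_with_limit(blob, memlimit):
    """Decompress `blob`, refusing to use more than `memlimit` bytes.

    Raises lzma.LZMAError if the limit is too small for this stream.
    Note that LZMAError is also raised for corrupt input.
    """
    decomp = lzma.LZMADecompressor(format=lzma.FORMAT_XZ, memlimit=memlimit)
    return decomp.decompress(blob)


if __name__ == "__main__":
    blob = make_xz(b"hello" * 1000, dict_size=8 << 20)
    print(len(unpack_with_limit(blob, 64 << 20)), "bytes unpacked")
    try:
        unpack_with_limit(blob, 1 << 20)
    except lzma.LZMAError as exc:
        print("refused:", exc)
```

The payload here is tiny, yet the 1 MiB limit is still rejected, because the decoder's memory need follows from the dictionary size declared in the stream and not from the amount of data.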
If the package is untrusted then you had better not be installing it
> > OTOH if the package is out of spec we can do whatever we want, but I'd
> > rather make dpkg cope with such packages gracefully.
Just to be clear, what I meant was that if it's not going to be
possible at all to extract it anyway, it should abort up-front in a
controlled way, and not just getting an ENOMEM in the middle of the