Re: xz support in dpkg (was Re: dpkg plans for the squeeze cycle)
[I already sent this message last night, but I fear it may not have been
delivered. Resending without patches attached. Apologies for the noise.]
Guillem Jover wrote:
> On Tue, 2009-09-22 at 01:40:20 -0500, Jonathan Nieder wrote:
> > > 1. Find out how much RAM it is OK to use for decompressing packages.
> > > This may vary between architectures. I hope this would be at
> > > least 10 MiB, but 20 MiB would be very nice.
> I guess a better question is, how much benefit a bigger dictionary size
> would give us?
Good question. Fedora people have recently been considering a similar
question (they’re focused on speed rather than memory usage, but it
still comes down to dictionary size versus compression ratio). One
conclusion is that once the dictionary is larger than the payload, it
doesn’t win us much. ;) And from
<http://www.advogato.org/person/badger/diary/80.html> we can conclude
that with a reasonably sized and somewhat formulaic text file (an SQL
database dump), preset -3 is good enough. That’s a dictionary size
of 1 MiB.
For deciding on limits, it would probably be good to experiment with actual
“worst case” Debian packages (maybe openoffice.org).
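To illustrate why a dictionary larger than the payload stops paying off,
here is a small sketch using Python’s stdlib liblzma bindings. The
payload is synthetic (an incompressible block repeated once, so there is
long-range redundancy a small dictionary cannot reach); the sizes are
made up for the demonstration:

```python
import lzma
import os

# Payload with long-range redundancy: an incompressible 150 KiB block,
# repeated once. A dictionary smaller than the repeat distance cannot
# "see" the first copy, so the second copy compresses poorly.
block = os.urandom(150_000)
payload = block * 2

def xz_size(data, dict_size):
    """Size of the payload compressed as .xz with the given dictionary."""
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))

small = xz_size(payload, 1 << 16)  # 64 KiB dictionary: misses the repeat
large = xz_size(payload, 1 << 20)  # 1 MiB dictionary: covers the payload
huge = xz_size(payload, 1 << 26)   # 64 MiB dictionary: no further gain
```

With the 64 KiB dictionary both copies are stored nearly verbatim
(~300 KiB); once the dictionary covers the whole payload (1 MiB), the
second copy collapses to a back-reference, and going to 64 MiB changes
essentially nothing.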
> We can try to specify it, and codify it in the tools, but there's people
> out there building packages with ar and tar...
Yes, dpkg should not break this way of working.
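For reference, the ar-and-tar view of a .deb can be sketched in a few
lines of Python; this is an illustrative reconstruction of the format,
not dpkg-deb’s actual code (the member names and their order are the
standard ones; the package contents are invented):

```python
import io
import tarfile

def ar_member(name, data):
    # ar header: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) magic(2),
    # with member data padded to an even length.
    hdr = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}".format(
        name, 0, 0, 0, "100644", len(data)).encode() + b"`\n"
    return hdr + data + (b"\n" if len(data) % 2 else b"")

def make_tar(files, mode):
    # Build a tar archive (mode "w:gz" or "w:xz") entirely in memory.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tf:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tf.addfile(info, io.BytesIO(content))
    return buf.getvalue()

control = (b"Package: hello\nVersion: 1.0\nArchitecture: all\n"
           b"Maintainer: nobody\nDescription: demo package\n")

# A .deb is an ar archive: debian-binary first, then control, then data.
deb = (b"!<arch>\n"
       + ar_member("debian-binary", b"2.0\n")
       + ar_member("control.tar.gz", make_tar({"./control": control}, "w:gz"))
       + ar_member("data.tar.xz",
                   make_tar({"./usr/share/doc/hello/README": b"hi\n"}, "w:xz")))
```

Anyone building packages this way (or with ar and tar directly) is free
to pick whatever compression settings they like for data.tar.xz, which
is exactly why dpkg cannot assume a particular dictionary size.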
> > Related question: If an LZMA-based file format might ever be used
> > for udebs, what are the memory constraints for unpacking those?
> Well this is outside the scope of dpkg itself, and more a project wide
> decision, but I'm not sure we'd want any package in the base system
> built with anything but gzip, as that's shared by derivatives,
> embedded distros, etc. xz should probably be used for big packages that
> are guaranteed to be used on desktops or huge boxes (think games,
> openoffice.org, etc).
Makes sense. xz was developed for an embedded distro, and its memory
usage can be kept under control by using a small dictionary size, but we
probably don’t want to slow down the install too much just for the sake
of smaller packages.
> > b. Some users might build their own packages that are compressed with
> > too large a dictionary size. How should dpkg deal with these?
> > Usually, reporting the error and a suggested memlimit and providing
> > a command-line option to increase the limit would be good enough,
> > but what about when dpkg is invoked through a front-end? Should
> > there be an environment variable to set the memory limit, or do the
> > front-ends all have easy-to-find facilities for passing extra options
> > to dpkg?
> This does not seem very nice. Is there no way to know how much memory
> will we need from the file headers? If we could do that dynamically
> that'd be great. Is there a maximum dictionary size?
One can indeed read the required amount of memory from the file headers.
Unfortunately, the maximum dictionary size is 4 GiB, and I would think
that using 4 GiB of memory to unpack a package, even when that much is
available, would be bad behavior for dpkg. Examining the contents of an
untrusted package should not be an unsafe operation (on a server, for
example, this could lead to denial of service).
The point of setting a memory usage limit within liblzma, instead of
setting an rlimit and waiting for ENOMEM, is that liblzma can read the
file header and tell you right away how much higher the limit would have
to be. I haven’t written code to use that facility yet, though.
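For what it’s worth, the fail-early behavior can be sketched with
Python’s liblzma bindings: a decompressor created with a memlimit
refuses up front when the headers announce a larger dictionary. (Unlike
raw liblzma, the Python wrapper only raises LZMAError and does not
report the required amount; the sizes here are invented for the
demonstration.)

```python
import lzma

data = b"Example payload for a package member.\n" * 1000

# Compress with a deliberately large dictionary (16 MiB) -- a stand-in
# for a user-built package with an oversized dictionary.
big_dict = lzma.compress(
    data, format=lzma.FORMAT_XZ,
    filters=[{"id": lzma.FILTER_LZMA2, "dict_size": 1 << 24}])

# A decoder capped at 1 MiB fails as soon as it reads the headers,
# rather than allocating and then hitting ENOMEM...
try:
    lzma.LZMADecompressor(memlimit=1 << 20).decompress(big_dict)
    hit_limit = False
except lzma.LZMAError:
    hit_limit = True

# ...while a sufficient limit (32 MiB here) decodes normally.
out = lzma.LZMADecompressor(memlimit=1 << 25).decompress(big_dict)
```

In C, liblzma goes one step further: after LZMA_MEMLIMIT_ERROR,
lzma_memusage() reports how much memory the stream actually needs, which
is what would let dpkg print a useful suggested limit.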
> OTOH if the package is out of spec we can do whatever we want, but I'd
> rather make dpkg cope with such packages gracefully.