
Re: multi threaded support for xz



Hi!

On Wed, 2016-10-05 at 22:00:26 +0200, Sebastian Andrzej Siewior wrote:
> xz-utils 5.2.2 with threading support for the compressor is currently
> in the deferred queue for another 24 hours [0].
> Once this version has been built a binNMU of dpkg will pick up the
> threading support.

Please do not request any binNMU, I'm planning on doing a release
soon, but before that I'll be investigating the effects of the new
xz-utils package.

> dpkg will then use the number of online CPUs for
> compression [1] in a "dpkg-deb -b" invocation. Using more CPUs here
> increases the required amount of memory. If the buildds start running
> out of memory during dpkg-deb or start swapping, this might be due to
> this change.
> There is lzma_stream_encoder_mt_memusage() which could be used to
> compute the needed memory upfront and then maybe decrease the number of
> selected CPUs while the memory limit is exceeded [2]. Also
> dpkg-buildpackage's -j argument could be used as the initial hint
> instead of number of online CPUs.

Right, I think this should be configurable. Of course the usual
problem with dpkg-deb is that debian/rules is the one invoking it, so
the only way to control its behavior is via environment variables
or configuration files which in many cases seem very inappropriate. :(

I'll check what can be done.
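
As a rough sketch of the two ideas above (an externally supplied thread
hint, then lowering the thread count while the memory estimate exceeds a
limit), something like the following could work. Everything named here is
an assumption for illustration: the helper names are made up, and the
estimator is a callback so the example does not depend on liblzma. In real
code the callback would set the threads field of an lzma_mt structure and
return lzma_stream_encoder_mt_memusage() for it. The toy estimator's
numbers are invented and are not liblzma's actual memory usage.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical callback type: memory estimate in bytes for a given thread
 * count, or UINT64_MAX on error (the same error convention that
 * lzma_stream_encoder_mt_memusage() uses). */
typedef uint64_t memusage_fn(uint32_t threads);

/* Parse a thread count from a string; return fallback when the string is
 * missing, empty, not a number, or outside an arbitrary sanity range. */
static uint32_t
parse_thread_count(const char *value, uint32_t fallback)
{
    char *end;
    long n;

    if (value == NULL || *value == '\0')
        return fallback;

    errno = 0;
    n = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || n < 1 || n > UINT16_MAX)
        return fallback;

    return (uint32_t)n;
}

/* Read the initial thread hint from an environment variable. The variable
 * name is chosen by the caller; none is defined by dpkg here. */
static uint32_t
threads_from_env(const char *name, uint32_t fallback)
{
    return parse_thread_count(getenv(name), fallback);
}

/* Decrease the thread count until the estimate fits within mem_limit.
 * Never goes below one thread, even if one thread is still over the limit. */
static uint32_t
clamp_threads_to_memory(uint32_t threads, uint64_t mem_limit,
                        memusage_fn *estimate)
{
    assert(estimate != NULL);

    if (threads < 1)
        threads = 1;

    while (threads > 1) {
        uint64_t need = estimate(threads);

        if (need != UINT64_MAX && need <= mem_limit)
            break;
        threads--;
    }

    return threads;
}

/* Toy estimator for demonstration only: 100 MiB fixed cost plus 200 MiB
 * per thread. These figures are invented. */
static uint64_t
toy_memusage(uint32_t threads)
{
    return (UINT64_C(100) + UINT64_C(200) * threads) << 20;
}
```

A caller would start from threads_from_env() with the number of online
CPUs as the fallback, and pass the result through
clamp_threads_to_memory() before setting up the encoder.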

> Just some thoughts in case something goes wrong :)

Thanks for the heads-up!

> [0] https://ftp-master.debian.org/deferred.html

(BTW, it's customary when doing NMUs for new upstream versions to use the
release -0.1 so that we do not take over the maintainer -1 release.)

> [1] https://sources.debian.net/src/dpkg/1.18.10/lib/dpkg/compress.c/#L534
> [2] https://git.breakpoint.cc/cgit/bigeasy/xz-utils-debian.git/tree/src/xz/coder.c#n273

On Thu, 2016-10-06 at 08:30:53 +0200, Sebastian Andrzej Siewior wrote:
> On 2016-10-06 02:50:00 [+0000], HW42 wrote:
> > Is the new multi-thread compressor reproducible? I.e. does it produce
> > the same output regardless of the number of CPUs, the CPU speed, system
> > load, etc.? (A very quick look at the source suggests that this is not
> > the case but I might be totally mistaken)
> 
> With one CPU you have one block. With multiple CPUs the default block
> size (as of current xz) is three times the dictionary size. So it is
> reproducible as long as you consistently use either one CPU or multiple
> CPUs.
> In order to have the same compressed archive with one or multiple CPUs
> you would need a switch / environment variable to disable the use of
> multiple CPUs.

Does this depend on the encoder interface being used? Because dpkg will
always use the lzma_stream_encoder_mt() call regardless of the number
of online CPUs, whereas xz(1) switches interface depending on whether it
runs in single- or multi-threaded mode. In any case I'll be testing the
reproducibility of this, and if need be check with xz upstream to get a
clearer picture (either that or perform some code diving :).
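
To make the quoted explanation concrete: if the multi-threaded encoder
splits the input at fixed block boundaries, the block layout depends on
the block size and the input length, not on how many threads happen to
run. Below is a minimal sketch of that arithmetic. It assumes the default
block size is three times the dictionary size, as stated above, and that
each block-size chunk of input becomes one block. The function names are
illustrative and not part of liblzma, and this does not settle whether
lzma_stream_encoder_mt() with a single thread matches the
single-threaded encoder.

```c
#include <stdint.h>

/* Default multi-threaded block size, per the description quoted above:
 * three times the dictionary size (an assumption taken from the thread). */
static uint64_t
default_block_size(uint64_t dict_size)
{
    return 3 * dict_size;
}

/* Number of blocks needed for input_size bytes when the input is split
 * into chunks of block_size bytes; the last block may be shorter.
 * Returns 0 for empty input or a zero block size. */
static uint64_t
block_count(uint64_t input_size, uint64_t block_size)
{
    if (input_size == 0 || block_size == 0)
        return 0;

    /* Ceiling division without risking overflow in the addition. */
    return input_size / block_size + (input_size % block_size != 0);
}
```

For example, with an 8 MiB dictionary the block size would be 24 MiB, so
a 100 MiB input would be split into five blocks under these assumptions,
whatever the thread count.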

Thanks,
Guillem

