[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: RFS: lbzip2



Paul Wise wrote:
> On Fri, Feb 13, 2009 at 9:59 PM, ERSEK Laszlo <lacos@elte.hu> wrote:
> 
>> Citing the "Version", "File size [B]" and "Decompr. speed [%]" blocks from
>> "report_alpha_osf1.txt" (I mark the relevant rows with exlamation marks
>> now):
> 
> Ok, these numbers convinced me lbzip2 is better.

... for this task. Please see below.


> If you could make getting the lbzip2 design (since it seems better
> than pbzip2, please evaluate the other ones too) for bzip2 parallelism
> into bzip2 itself your long-term goal, it would be great. Until
> you/someone does so, lets add lbzip2 to Debian.

I. Thanks!


II. There is no clear "winner". What follows is a possibly subjective
evaluation of some programs supporting bzip2 compression and optionally
decompression. Any inaccuracies are incidental and not intentional; I'm
not trying to misrepresent anything. I'm definitely not trying to spread
FUD. Please correct any mistakes and supplement any omissions you
notice. My weighing of (= assignment of importance to) statements I
consider facts may not overlap with yours.


1. bzip2    - http://www.bzip.org/

2. bzip2smp - http://bzip2smp.sourceforge.net/

3. smpbzip2 - http://home.student.utwente.nl/n.werensteijn/smpbzip2/

4. dbzip2   - http://www.mediawiki.org/wiki/Dbzip2
            - http://svn.wikimedia.org/svnroot/mediawiki/trunk/dbzip2

5. p7zip    - http://p7zip.sourceforge.net/

6. pbzip2   - http://compression.ca/pbzip2/

7. lbzip2   - http://phptest11.atw.hu/


II.1. bzip2 (1.0.5) is the baseline program. Everybody uses it. It is
available as a library too. It uses a single thread. It is heavily
cache-optimized, supports many command line options, writes a single
bzip2 stream (with a real combined CRC at the stream's end) when
compressing, and is able to decompress concatenated bzip2 streams.


II.2. bzip2smp (1.0) only supports compression. From the homepage:

	No decompression is supported.

Its license says:

        This program, "bzip2smp", is copyright (c) 2005 Konstantin
        Isakov. It is based on the modified "libbzip2" library, part of
        the "bzip2" program, version 1.0.2. The source was taken from
        the Debian source package, 1.0.2-10, so it contains some Debian
        patches, too. This program is licensed under the same conditions
        as the original "bzip2" and "libbzip2".

While it may come with the additional benefit that it parallelizes the
library itself (see more on that below), it effectively forks libbz2.
This is a problem before my eyes. Julian Seward released bzip2-1.0.5
with a security vulnerabilty fix. Perhaps bzip2smp-1.0 contains this fix
(although it may be irrelevant since the vuln is decompression related
AFAIK and bzip2smp only supports compression), but in my opinion,
bzip2smp didn't leave the core compression implementation intact, and
all future fixes (if any) from Julian Seward would need to be merged
into bzip2smp with extra effort.

Some research (PDF:
http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/614524) also
shows that low-level parallelization doesn't pay off very well in
general. (I'm not sure if this applies at all to bzip2smp, I didn't
check its source code. Also note that to my knowledge, no implementation
mentioned in the study was released to the public.)

Then, the bzip2smp homepage says,

        The resulting archives are bit-by-bit identical to the ones
        produced by the normal bzip2, at least as of version 1.0.2.

I believe this must be a bottleneck. Even if you take a different
approach and compress separate input blocks in parallel, recombining the
compressed blocks into a single bzip2 stream (by a lot of bit-shifting)
is very CPU-hungry. If you advance block by block and parallelize
intra-block stuff, then you're simply not using the possibility of *not*
shifting output blocks together. (Or something like that.)

As a side note, it's great that bzip2smp's homepage emphasizes bzip2's
cache-sensitivity:

        There is NO speedup coming from hyperthreading on the
        hyperthreaded machines, since hyperthreads don't have dedicated
        caches, and the bzip2 is very cache-dependent. Expect degraded
        performance if you try utilizing hyperthreads.

This holds for lbzip2, too.

Conclusion: bzip2smp doesn't support decompression and modifies the core
library.

I never ran bzip2smp (or can't remember).


II.3. smpbzip2 (0.2) doesn't support multithreaded decompression. From
the README:

        Currently,only the compression stage is multithreaded.
        Decompression is still single threaded.

It also seems to contain the whole bzip2 library, version 1.0.2. I
downloaded
http://snapshot.debian.net/archive/2005/03/13/debian/pool/main/b/bzip2/bzip2_1.0.2.orig.tar.gz
and diffed it against smpbzip2. At a cursory glance, smpbzip2 modifies
BZ2_blockSort() in blocksort.c, thus I consider it a fork of bzip2.

Conclusion: smpbzip2 doesn't support MT-decompression, and modifies the
core library.

I never ran smpbzip2 (or can't remember).


II.4. (Mediawiki's Dbzip2 entry also contains a tabular comparison
between (some of) these programs; I modified that section too.) As far
as I know, dbzip2 didn't see a public release yet. From
http://www.mediawiki.org/wiki/Dbzip2#Development_status :

        dbzip2 is not ready for public use yet. The current prototype is
        written in Python and C, using the standard bzip2 library to do
        the heavy lifting.

However, it doesn't seem to support MT-decompression. From
http://svn.wikimedia.org/svnroot/mediawiki/trunk/dbzip2/README :

        Options:

          [...]

          -d
            [experimental] Decompression mode.
            Only works on small (single-block) files right now.

Furthermore, under the Development status link above, an uncrossed item
in the Checklist is:

        parallelize decompression by scanning bitstream for block
        boundaries

(This is what lbzip2 does, by the way, in a manner that doesn't make the
splitter CPU-bound, by distributing even the bit-shifting among worker
threads.)

Even further more, following the last, "update 2006-05-31" link from
below "Development status",
http://leuksman.com/log/2006/05/31/dbzip2-vincit/ says:

        Next step: parallel decompression…?

(Yes, I had to spam that blog entry too.)

Conclusion: dbzip2 isn't publicly released yet, and it claims to support
decompression only for small files.

I never ran dbzip2 (or can't remember).


II.5. Last time I checked, p7zip seemed to feature a from-the-scratch
reimplementation of the bzip2 library. When compressing, it writes out a
single bzip2 stream (with combined CRC) that is *not* bit-identical to
bzip2's compressed output. Citing from
http://lacos.web.elte.hu/pub/lbzip2/report_sparc_solaris_2.txt :

+-----------------------------------------------------------------------
|Version
|                  |bzip2                                          1.0.5
|                  |7za                                    (A) 4.55 beta
+-----------------------------------------------------------------------
|File size [B]
|                  |original                                   409600000
|                  |bzip2                                       57844970
|                  |7za                                         57841527
+-----------------------------------------------------------------------
|Compr. size [%]
|                  |7za:bzip2                                      99.99

(Yes, there are newer versions of 7za available. I never had a test
setup where both p7zip and bzip2 were newest versions and I also
succeeded to make 7za utilize multiple threads.)

The single stream compressed output might be a bottleneck, see my remark
in section "II.2. bzip2smp".

None of the checked 7za versions (4.61 beta, 4.55 beta, 4.43 beta) could
decompress from stdin. I vaguely remember trying decompression once from
a named pipe and failing. Regular files can be decompressed. While 7za
utilizes multiple worker threads for decompression too, their number
seems to be capped at 4. See
p7zip_4.65/CPP/7zip/Compress/BZip2Decoder.cpp, line 21:

        const UInt32 kNumThreadsMax = 4;


Partial conclusion: 7za seems to have its own bzip2 implementation, and
it doesn't support decompression from non-seekable input. I'll cite
speed figures later.


II.6. pbzip2 (1.0.5) has lots of features. Among others, it supports all
bzip2 command line options and exit statuses. From the ChangeLog:

        Changes in 1.0.4 (Dec 21, 2008)
        - Added support for all remaining bzip2 command line options so
          pbzip2 can be used as a drop-in replacement for bzip2.

        Changes in 1.0.2 (Jul 25, 2007)
        - Changed pbzip2 exit code behaviour to match bzip2 for all
          error states (ie: trying to compress a file that already has a
          .bz2 extension)

It supports both compression and decompression from a pipe; however,
decompression from a pipe is not multi-threaded. It supports
decompression from a single bzip2 stream, but decompression is not MT:

        Changes in 1.0.3 (Oct 31, 2008)
        - Added support for compression using stdin and pipes!  Thanks
          to Ivan Voras for supplying the patch to enable this feature.
        - Added support for decompression using stdin and pipes but
          currently limited to only a single thread
        - Added support for testing bzip2 files using stdin and pipes
        - Added support to directly decompress files without using
          threads when files are small or the system only has 1 CPU.
          This mode is also used if the .bz2 file contains only 1
          bzip2 stream.

When decompressing, pbzip2 looks for byte-aligned bzip2 stream headers
("BZh91AY&SY"). The stream header consists of the Huffman-style bz2
magic number ("BZh"), the compression block size identifier ("9" here --
if I'm not mistaken, pbzip2 parses the actual value from the beginning
of the file), and the bzip2 block header ("1AY&SY", if byte-aligned).

Theoretically (with negligible likelihood), such bit-strings can occur
within the compressed data of one bzip2 block. I don't know how pbzip2
handles this. Additionally, I'm not certain how pbzip2 handles bz2 files
that contain multiple bzip2 streams that use different block sizes.
(Maybe I didn't do my homework, sorry.)

The compressed output is a sequence of bzip2 streams, each holding a
single bzip2 block. (Libbz2 applications see multiple BZ_STREAM_ENDs
when they decompress such a file.) I believe there is one combined CRC
for each such stream, which is identical to the block CRC.

(There is no "super stream CRC" at all. Since no such thing exists,
pbzip2 (or official bzip2) cannot verify it when decompressing.)

Partial conclusion: pbzip2 is full-featured and uses libbz2. If it
supported multi-threaded decompression from a pipe and/or a single bzip2
stream, I'd have never written lbzip2. I'll cite speed figures later.


II.7. lbzip2 (0.13) is a bare-bones filter. It compresses and
decompresses from pipes, using multiple worker threads. It decompresses
single bzip2 stream input with multiple threads; block size doesn't
matter (at the price of always allocating memory for compression block
size 9).

The compressed output is similarly structured to that of pbzip2.

Just to be sure, I mention lbzip2's limitations (over what's written
above) here; each one has good reasons. Please see the README and the
manual for elaboration.

a) Original combined CRCs are never checked by the MWD (multiple-workers
decompressor), only block CRCs.

b) Exit statuses don't differentiate between files that seem to be
corrupt and other errors.

c) The MWD *may* refuse to decompress some rare valid bz2 files:
- a file that contains many empty (14 byte) bzip2 streams in a row,
- a file that contains a bit string within a bzip2 block that looks like
the 48-bit bzip2 block header or the 48-bit end-of-stream marker.

d) Only the largest compression block size is supported (899,981 bytes),
which is also used as the input block size when compressing.

e) (This is obvious.) Being a filter, lbzip2 must be invoked separately
for each file to process. This may involve significant penalty if you
compress lots of tiny files.

Unimportant for end users: lbzip2 features some self-checking stuff
(options -v and -t). As upstream author, I'd like to add that thread
broadcast conditions were formally calculated -- I found the ones
pertaining to the MWD nontrivial, and put them into the source as
comments (lbunzip2.c). Reportedly, broadcasts can be expensive, so I
wanted to be frugal. I can't quantify the saved overhead, though.

lbzip2 contains a single worker decompressor too, so that scripts
calling "lbzip2 -d" don't incur a performance penalty when copied from a
multi-core machine to single-core one. (The MWD does work with a single
worker thread, but it's slower than the SWD.)

There is only one type of compressor. IIRC, preliminary single-threaded
tests showed that its time overhead is insignificant in comparison to
bzip2. (Size overhead is discussed below.)

Partial conclusion: fundamentally, lbzip2 was created to fill a
performance gap left by pbzip2. I'll cite speed figures later.


III. Now something on performance. The reports referred to below are
available from http://lacos.web.elte.hu/pub/lbzip2 . Note that each test
report presumes a *correctness* test for lbzip2. I won't cite
everything, just show that in these tests,

1. the compression efficiency of bzip2, lbzip2, pbzip2 and 7za is
roughly equivalent (specifying default options),

2. the compression speed of pbzip2 and lbzip2 is roughly equivalent
(specifying default options),

3. the decompression "winner" varies by
a) the availability of the compressed input (regular file / pipe),
b) the producer of the compressed input (single-stream (bzip2, 7za) /
multi-stream (pbzip2, lbzip2)),
c) the number of cores.

Missing programs in reports mean that the program in question wasn't
installed. A missing value means that the test run ended with an error.

I won't talk very much about scaling, you can see for yourself. As an
anecdote I mention that Bryan Stillwell tested lbzip2 on a 128-core HP
Superdome and saw a speedup factor of 108x for compression. You believe
it if you want to. I cannot show a nice report because this happened
before I created/finished test.sh.

pbzip2 also scales nearly linearly when compressing, see
http://compression.ca/pbzip2/ .

I'd be very grateful if somebody could (re)test lbzip2 (with tesh.sh) on
a many-core SMP machine. You hear me, Bryan? :)


III.1. The compression efficiency of bzip2, lbzip2, pbzip2 and 7za is
roughly equivalent (specifying default options):


report_alpha_osf1.txt
+-----------------------------------------------------------------------
|Version
|                  |bzip2                                          1.0.5
|                  |lbzip2                                          0.13
|                  |pbzip2                                         1.0.5
|                  |7za                                    (A) 4.61 beta
+-----------------------------------------------------------------------
|File size [B]
|                  |original                                   666681476
|                  |bzip2                                       82731810
|                  |lbzip2                                      82726254
|                  |pbzip2                                      82738316
|                  |7za                                         82738302
+-----------------------------------------------------------------------
|Compr. size [%]
|                  |lbzip2:bzip2                                   99.99
|                  |pbzip2:bzip2                                  100.00
|                  |7za:bzip2                                     100.00


report_sparc_solaris.txt
+-----------------------------------------------------------------------
|Version
|                  |bzip2                                          1.0.2
|                  |lbzip2                                          0.12
|                  |pbzip2                                         1.0.5
|                  |7za                                    (A) 4.61 beta
+-----------------------------------------------------------------------
|File size [B]
|                  |original                                   400303532
|                  |bzip2                                       12252934
|                  |lbzip2                                      12977446
|                  |pbzip2                                      12977608
|                  |7za                                         12248546
+-----------------------------------------------------------------------
|Compr. size [%]
|                  |lbzip2:bzip2                                  105.91
|                  |pbzip2:bzip2                                  105.91
|                  |7za:bzip2                                      99.96


report_sparc_solaris_2.txt, submitted by Imre Csatlós
+-----------------------------------------------------------------------
|Version
|                  |bzip2                                          1.0.5
|                  |lbzip2                                          0.13
|                  |7za                                    (A) 4.55 beta
+-----------------------------------------------------------------------
|File size [B]
|                  |original                                   409600000
|                  |bzip2                                       57844970
|                  |lbzip2                                      58152349
|                  |7za                                         57841527
+-----------------------------------------------------------------------
|Compr. size [%]
|                  |lbzip2:bzip2                                  100.53
|                  |7za:bzip2                                      99.99


report_x86_linux.txt
+-----------------------------------------------------------------------
|Version
|                  |bzip2                                          1.0.3
|                  |lbzip2                                          0.12
|                  |pbzip2                                         1.0.5
|                  |7za                                    (A) 4.43 beta
+-----------------------------------------------------------------------
|File size [B]
|                  |original                                   885656576
|                  |bzip2                                       81193819
|                  |lbzip2                                      82196198
|                  |pbzip2                                      82159065
|                  |7za                                         81205299
+-----------------------------------------------------------------------
|Compr. size [%]
|                  |lbzip2:bzip2                                  101.23
|                  |pbzip2:bzip2                                  101.18
|                  |7za:bzip2                                     100.01


report_x86_linux_2.txt (three tests), submitted by Zsolt Bartos-Elekes
+-----------------------------------------------------------------------
|Version
|                  |bzip2                                          1.0.3
|                  |lbzip2                                          0.13
|                  |pbzip2                                         0.9.6
+-----------------------------------------------------------------------
|File size [B]
|                  |original                                  1000000000
|                  |bzip2                                     1004420783
|                  |lbzip2                                    1004432842
|                  |pbzip2                                    1004499436
+-----------------------------------------------------------------------
|Compr. size [%]
|                  |lbzip2:bzip2                                  100.00
|                  |pbzip2:bzip2                                  100.00


+-----------------------------------------------------------------------
|File size [B]
|                  |original                                  1000000000
|                  |bzip2                                      752629184
|                  |lbzip2                                     752671101
|                  |pbzip2                                     752682155
+-----------------------------------------------------------------------
|Compr. size [%]
|                  |lbzip2:bzip2                                  100.00
|                  |pbzip2:bzip2                                  100.00


+-----------------------------------------------------------------------
|File size [B]
|                  |original                                  1795829760
|                  |bzip2                                      395558652
|                  |lbzip2                                     396907618
|                  |pbzip2                                     396923004
+-----------------------------------------------------------------------
|Compr. size [%]
|                  |lbzip2:bzip2                                  100.34
|                  |pbzip2:bzip2                                  100.34


So compress with whichever you like. pbzip2 may be tunable.


III.2. The compression speed of pbzip2 and lbzip2 is roughly equivalent
(specifying default options) -- for versions, look above:

report_alpha_osf1.txt (8 CPUs, 1 core/CPU, 1 hwthread/core)
+-----------------------------------------------------------------------
|Compr. speed [%]
|                  +----------------------------------------------------
|                  |from regf
|                  |            |lbzip2:bzip2                     625.77
|                  |            |pbzip2:bzip2                     601.90
|                  |            |7za:bzip2                         74.52
|                  +----------------------------------------------------
|                  |from pipe
|                  |            |lbzip2:bzip2                     559.72
|                  |            |pbzip2:bzip2                     609.14
|                  |            |7za:bzip2                         76.24

(Maybe I miscompiled 7za somehow?...)


report_sparc_solaris.txt (4 CPUs, 1 core/CPU, 1 hwthread/core)
+-----------------------------------------------------------------------
|Compr. speed [%]
|                  +----------------------------------------------------
|                  |from regf
|                  |            |lbzip2:bzip2                     405.62
|                  |            |pbzip2:bzip2                     405.72
|                  |            |7za:bzip2                        302.13
|                  +----------------------------------------------------
|                  |from pipe
|                  |            |lbzip2:bzip2                     403.40
|                  |            |pbzip2:bzip2                     401.40
|                  |            |7za:bzip2                        300.23


report_sparc_solaris_2.txt (8 CPUs, 4 cores/CPU, 2 hwthreads/core)
+-----------------------------------------------------------------------
|Compr. speed [%]
|                  +----------------------------------------------------
|                  |from regf
|                  |            |lbzip2:bzip2                     774.05
|                  |            |7za:bzip2                        147.22
|                  +----------------------------------------------------
|                  |from pipe
|                  |            |lbzip2:bzip2                     654.20
|                  |            |7za:bzip2                        143.67

(This is an OK result for lbzip2 from the scaling viewpoint if I look at
the 8 CPUs, but a very bad result if I look at the 64 hardware threads.
I really don't know what to make of this. pbzip2 wasn't installed.)


report_x86_linux.txt (1 CPU, 2 cores/CPU, 1 hwthread/core)
+-----------------------------------------------------------------------
|Compr. speed [%]
|                  +----------------------------------------------------
|                  |from regf
|                  |            |lbzip2:bzip2                     192.18
|                  |            |pbzip2:bzip2                     191.72
|                  |            |7za:bzip2                        137.70
|                  +----------------------------------------------------
|                  |from pipe
|                  |            |lbzip2:bzip2                     190.39
|                  |            |pbzip2:bzip2                     191.27
|                  |            |7za:bzip2                        137.72


report_x86_linux_2.txt, three tests (16 CPUs, 1 core/CPU, ? hwthreads/core)
+-----------------------------------------------------------------------
|Compr. speed [%]
|                  +----------------------------------------------------
|                  |from regf
|                  |            |lbzip2:bzip2                    1158.21
|                  |            |pbzip2:bzip2                    1200.16
|                  +----------------------------------------------------
|                  |from pipe
|                  |            |lbzip2:bzip2                    1138.41
|                  |            |pbzip2:bzip2

(pbzip2 is an old version, not yet supporting compression from a pipe.)


+-----------------------------------------------------------------------
|Compr. speed [%]
|                  +----------------------------------------------------
|                  |from regf
|                  |            |lbzip2:bzip2                    1169.72
|                  |            |pbzip2:bzip2                    1142.81
|                  +----------------------------------------------------
|                  |from pipe
|                  |            |lbzip2:bzip2                    1073.39
|                  |            |pbzip2:bzip2


+-----------------------------------------------------------------------
|Compr. speed [%]
|                  +----------------------------------------------------
|                  |from regf
|                  |            |lbzip2:bzip2                    1285.85
|                  |            |pbzip2:bzip2                    1266.19
|                  +----------------------------------------------------
|                  |from pipe
|                  |            |lbzip2:bzip2                    1079.25
|                  |            |pbzip2:bzip2


So compress with whichever you like. pbzip2 may be tunable.


III.3. The decompression "winner" varies by
a) the availability of the compressed input (regular file / pipe),
b) the producer of the compressed input (single-stream (bzip2, 7za) /
multi-stream (pbzip2, lbzip2)),
c) the number of cores.


report_alpha_osf1.txt (8 CPUs, 1 core/CPU, 1 hwthread/core)
+-----------------------------------------------------------------------
|Decompr. speed [%]
|                  +----------------------------------------------------
|                  |from regf
|                  |            +---------------------------------------
|                  |            |from bzip2
|                  |            |            |lbzip2:bzip2        475.55
|                  |            |            |pbzip2:bzip2         95.25
|                  |            |            |7za:bzip2           106.64
|                  |            +---------------------------------------
|                  |            |from lbzip2
|                  |            |            |lbzip2:bzip2        408.75
|                  |            |            |pbzip2:bzip2        495.45
|                  |            |            |7za:bzip2           107.92
|                  |            +---------------------------------------
|                  |            |from pbzip2
|                  |            |            |lbzip2:bzip2        444.89
|                  |            |            |pbzip2:bzip2        480.88
|                  |            |            |7za:bzip2           105.14
|                  |            +---------------------------------------
|                  |            |from 7za
|                  |            |            |lbzip2:bzip2        430.87
|                  |            |            |pbzip2:bzip2         96.10
|                  |            |            |7za:bzip2           107.00
|                  +----------------------------------------------------
|                  |from pipe
|                  |            +---------------------------------------
|                  |            |from bzip2
|                  |            |            |lbzip2:bzip2        530.40
|                  |            |            |pbzip2:bzip2        100.30
|                  |            |            |7za:bzip2
|                  |            +---------------------------------------
|                  |            |from lbzip2
|                  |            |            |lbzip2:bzip2        507.81
|                  |            |            |pbzip2:bzip2        100.77
|                  |            |            |7za:bzip2
|                  |            +---------------------------------------
|                  |            |from pbzip2
|                  |            |            |lbzip2:bzip2        506.29
|                  |            |            |pbzip2:bzip2         99.22
|                  |            |            |7za:bzip2
|                  |            +---------------------------------------
|                  |            |from 7za
|                  |            |            |lbzip2:bzip2        537.09
|                  |            |            |pbzip2:bzip2        102.30
|                  |            |            |7za:bzip2

lbzip2 wins when decompressing from a pipe, no matter the producer, and
also when the compressed input coming from a regular file is single stream.

pbzip2 beats lbzip2 when the compressed input is coming from a regular
file and is multi-stream. (Yes, pbzip2 can decompress even lbzip2's
compressed output faster than lbzip2 itself, when it's coming from a
regular file.)

p7zip (7za) doesn't support decompression from a pipe.


report_sparc_solaris.txt (4 CPUs, 1 core/CPU, 1 hwthread/core)
+-----------------------------------------------------------------------
|Decompr. speed [%]
|                  +----------------------------------------------------
|                  |from regf
|                  |            +---------------------------------------
|                  |            |from bzip2
|                  |            |            |lbzip2:bzip2        322.89
|                  |            |            |pbzip2:bzip2         98.51
|                  |            |            |7za:bzip2           144.36
|                  |            +---------------------------------------
|                  |            |from lbzip2
|                  |            |            |lbzip2:bzip2        331.71
|                  |            |            |pbzip2:bzip2        366.42
|                  |            |            |7za:bzip2            84.93
|                  |            +---------------------------------------
|                  |            |from pbzip2
|                  |            |            |lbzip2:bzip2        325.54
|                  |            |            |pbzip2:bzip2        372.66
|                  |            |            |7za:bzip2            85.70
|                  |            +---------------------------------------
|                  |            |from 7za
|                  |            |            |lbzip2:bzip2        325.57
|                  |            |            |pbzip2:bzip2         99.68
|                  |            |            |7za:bzip2           148.91
|                  +----------------------------------------------------
|                  |from pipe
|                  |            +---------------------------------------
|                  |            |from bzip2
|                  |            |            |lbzip2:bzip2        329.11
|                  |            |            |pbzip2:bzip2        100.11
|                  |            |            |7za:bzip2
|                  |            +---------------------------------------
|                  |            |from lbzip2
|                  |            |            |lbzip2:bzip2        330.45
|                  |            |            |pbzip2:bzip2         99.41
|                  |            |            |7za:bzip2
|                  |            +---------------------------------------
|                  |            |from pbzip2
|                  |            |            |lbzip2:bzip2        324.14
|                  |            |            |pbzip2:bzip2         99.88
|                  |            |            |7za:bzip2
|                  |            +---------------------------------------
|                  |            |from 7za
|                  |            |            |lbzip2:bzip2        327.23
|                  |            |            |pbzip2:bzip2         99.21
|                  |            |            |7za:bzip2

Same as above. Additionally, 7za (4.61 beta) seems to be harmed by
multi-stream compressed input.


report_sparc_solaris_2.txt (8 CPUs, 4 cores/CPU, 2 hwthreads/core)
+-----------------------------------------------------------------------
|Decompr. speed [%]
|                  +----------------------------------------------------
|                  |from regf
|                  |            +---------------------------------------
|                  |            |from bzip2
|                  |            |            |lbzip2:bzip2        419.46
|                  |            |            |7za:bzip2           205.34
|                  |            +---------------------------------------
|                  |            |from lbzip2
|                  |            |            |lbzip2:bzip2        549.35
|                  |            |            |7za:bzip2           124.03
|                  |            +---------------------------------------
|                  |            |from 7za
|                  |            |            |lbzip2:bzip2        358.69
|                  |            |            |7za:bzip2           174.11
|                  +----------------------------------------------------
|                  |from pipe
|                  |            +---------------------------------------
|                  |            |from bzip2
|                  |            |            |lbzip2:bzip2        410.00
|                  |            |            |7za:bzip2
|                  |            +---------------------------------------
|                  |            |from lbzip2
|                  |            |            |lbzip2:bzip2        415.12
|                  |            |            |7za:bzip2
|                  |            +---------------------------------------
|                  |            |from 7za
|                  |            |            |lbzip2:bzip2        398.59
|                  |            |            |7za:bzip2

This result isn't really useful here... lbzip2 beats 7za on all counts
examined here. Pbzip2 wasn't installed on the machine.


report_x86_linux.txt (1 CPU, 2 cores/CPU, 1 hwthread/core)
+-----------------------------------------------------------------------
|Decompr. speed [%]
|                  +----------------------------------------------------
|                  |from regf
|                  |            +---------------------------------------
|                  |            |from bzip2
|                  |            |            |lbzip2:bzip2        149.48
|                  |            |            |pbzip2:bzip2         99.79
|                  |            |            |7za:bzip2           180.68
|                  |            +---------------------------------------
|                  |            |from lbzip2
|                  |            |            |lbzip2:bzip2        163.32
|                  |            |            |pbzip2:bzip2        184.60
|                  |            |            |7za:bzip2           117.43
|                  |            +---------------------------------------
|                  |            |from pbzip2
|                  |            |            |lbzip2:bzip2        163.21
|                  |            |            |pbzip2:bzip2        183.83
|                  |            |            |7za:bzip2           117.47
|                  |            +---------------------------------------
|                  |            |from 7za
|                  |            |            |lbzip2:bzip2        149.76
|                  |            |            |pbzip2:bzip2         99.91
|                  |            |            |7za:bzip2           181.46
|                  +----------------------------------------------------
|                  |from pipe
|                  |            +---------------------------------------
|                  |            |from bzip2
|                  |            |            |lbzip2:bzip2        149.50
|                  |            |            |pbzip2:bzip2         99.94
|                  |            |            |7za:bzip2
|                  |            +---------------------------------------
|                  |            |from lbzip2
|                  |            |            |lbzip2:bzip2        162.84
|                  |            |            |pbzip2:bzip2         99.73
|                  |            |            |7za:bzip2
|                  |            +---------------------------------------
|                  |            |from pbzip2
|                  |            |            |lbzip2:bzip2        162.85
|                  |            |            |pbzip2:bzip2         99.84
|                  |            |            |7za:bzip2
|                  |            +---------------------------------------
|                  |            |from 7za
|                  |            |            |lbzip2:bzip2        148.27
|                  |            |            |pbzip2:bzip2         99.48
|                  |            |            |7za:bzip2

When decompressing from a pipe, lbzip2 wins, no matter the producer. On
the regular file front, however, I'm in for a nasty surprise. When
decompressing multi-stream input, pbzip2 wins, as usual. But when
decompressing single-stream compressed input, 7za beats both lbzip2 and
pbzip2!

Let this sink in... According to this test, executed on my home desktop,
I have to choose one of THREE different bzip2 decompressors, based on
the type and availability of the bz2 file!


report_x86_linux_2.txt, three tests (16 CPUs, 1 core/CPU, ? hwthreads/core)
+-----------------------------------------------------------------------
|Decompr. speed [%]
|                  +----------------------------------------------------
|                  |from regf
|                  |            +---------------------------------------
|                  |            |from bzip2
|                  |            |            |lbzip2:bzip2       1028.08
|                  |            |            |pbzip2:bzip2
|                  |            +---------------------------------------
|                  |            |from lbzip2
|                  |            |            |lbzip2:bzip2       1021.03
|                  |            |            |pbzip2:bzip2        948.22
|                  |            +---------------------------------------
|                  |            |from pbzip2
|                  |            |            |lbzip2:bzip2        984.09
|                  |            |            |pbzip2:bzip2       1038.08
|                  +----------------------------------------------------
|                  |from pipe
|                  |            +---------------------------------------
|                  |            |from bzip2
|                  |            |            |lbzip2:bzip2       1123.24
|                  |            |            |pbzip2:bzip2
|                  |            +---------------------------------------
|                  |            |from lbzip2
|                  |            |            |lbzip2:bzip2       1007.96
|                  |            |            |pbzip2:bzip2
|                  |            +---------------------------------------
|                  |            |from pbzip2
|                  |            |            |lbzip2:bzip2        941.21
|                  |            |            |pbzip2:bzip2

Decompressing from a pipe: lbzip2 wins, no matter the producer. (7za is
not installed, and pbzip2 is an old version, but not even pbzip2-1.0.5
supports MT-decompression from a pipe.)

Decompressing from a regular file: pbzip-0.9.6 refuses the single stream
compressed input (1.0.5 would have decompressed it using a single
thread). pbzip2 wins on the multi-stream regular compressed file created
by itself.

However, as another surprise, the multi-stream regular compressed file
created by lbzip2 favors lbzip2. I attribute this to one of three grounds:
(a) with so many cores, pbzip2 hit a bottleneck with its scanning phase
-- this would be the III.3.c. dependency (number of cores),
(b) pbzip2 wasn't the newest version,
(c) no general reason: the environment ("background load") changed, or
data idiosyncrasy.

I find (b) being the cause unlikely, after skimming the Changes since
0.9.6 in pbzip2's ChangeLog.


For decompression, I can't assert that pbzip2, lbzip2 and 7za are
interchangeable. I also cannot distill a constant ranking.


IV. Since I cannot declare a clear "winner", I find it would be
unreasonable to appoint any of the algorithms/implementations as the one
to add to bzip2. (For compression, it doesn't seem to matter, although
pbzip2 is the most tuneable one. For decompression, it depends.) Even if
somebody added all of them, the user would have to select the
appropriate one (based on the given use case), and once you select an
algorithm with a command line option, you may as well use a different
program.

Wrt. extending bzip2, the library: it would be a very hard task from a
technical standpoint. (In my judgement). As a utility, you can just bail
out, maybe remove some temporary files, the kernel cleans up the rest.
In a library you need to clean up after yourself when encountering an
error. (Or if the application asks you to because it catches a signal.)
It is really hard in the presence of multiple threads. (Signal handlers
are shared, signal masks are not. Add fork() to the mix. Etc.) If you
look at the librarization of (official, single-threaded) bzip2, even
that required an ingenious hack from Julian Seward (look at the source).

Wrt. extending bzip2, the utility: now this would be doable technically,
I guess. However, there's the licence issue (bzip2 is BSD-style, lbzip2
is GPLv2+), so parts of lbzip2 cannot be incorporated into bzip2
verbatim. Somebody would have to fork bzip2 and release the modified
version under the GPL.

In my view, the person who could do this (or add any other type of
parallelism to bzip2) with unquestionable credibilty is Julian Seward. I
figure he won't release bzip2 under the GPL, so if he chose lbzip2's
decompression algorithm, he would have to develop it from scratch
(although lbzip2's README describes the design). However, I have the
impression from our "correspondence" (and I'm using the word in a broad
sense) that he's awash with other work.

For example, take
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=220374 . That's an
irritating bug with a seemingly simple fix. (Well, it irritated me at
least.) Aníbal Monsalve Salazar fixed it on 09 Jan 2005 in (Debian)
version 1.0.2-3. Upstream bzip2-1.0.5 still contains the bug.

But that's just my opinion, and I'm nobody.

As sad as it sounds, I think the most fruitful thing is to evaluate the
programs on one's desktop or server, and choose the one being the best
for the specific platform and task. I'd like to think that the "test.sh"
portable shell script I provide in the lbzip2 package can be used for
this purpose. (It compares lbzip2, pbzip2 and 7za against bzip2; the
mandatory binaries being lbzip2 and bzip2.) The above reports were all
created by test.sh.

I consider all bzip2 implementations to be of transitory nature anyway.
As soon as tukaani.org comes out with a stable multi-threaded lzmautils
package, I will only use pigz and lzmautils. (I think lzmautils strives
for world domination, and rightly so -- see
http://tukaani.org/xz/xz-file-format.txt .) Pigz for lightweight & very
fast compression, and lzmautils for massive compression.

I expect lzmautils to use all cores not because of sheer speed -- in the
end, I'm using lzmautils in that actual instance because compression
ratio is primary. I simply expect it to use, during (de)compression, all
utilizable hardware resources I paid for when I bought the parts for my
machine.


> And now onto the package review:

I'll respond to this in a separate e-mail. I'm BCC-ing Julian Seward and
Jeff Gilchrist with the current one so they can drag my mistakes into
sunlight. I'm not sure whom to BCC wrt. p7zip, since it's a port.

I hope I didn't mess up too much. This isn't the shortest mail I've
written in my life.

Thanks,
lacos


Reply to: