
Re: Request for review: xz-utils package description



Ben Finney wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>> Strictly speaking, the XZ format does not use LZMA1, so this is more
>> misleading than the previous formulation.
> 
> Well, that's new in this discussion. What's LZMA1, and how is it
> different from LZMA?

Sorry, I was using jargon.  The terminology here can be confusing, and
it is probably best to shelter the reader from it. :)

Thus, if you are not morbidly curious, I encourage you to skip the
rest of this message.  But in case you are...

| /**
|  * \brief       LZMA1 Filter ID
|  *
|  * LZMA1 is the very same thing as what was called just LZMA in LZMA Utils,
|  * 7-Zip, and LZMA SDK. It's called LZMA1 here to prevent developers from
|  * accidentally using LZMA when they actually want LZMA2.
|  *
|  * LZMA1 shouldn't be used for new applications unless you _really_ know
|  * what you are doing. LZMA2 is almost always a better choice.
|  */
| #define LZMA_FILTER_LZMA1       LZMA_VLI_C(0x4000000000000001)
| 
| /**
|  * \brief       LZMA2 Filter ID
|  *
|  * Usually you want this instead of LZMA1. Compared to LZMA1, LZMA2 adds
|  * support for LZMA_SYNC_FLUSH, uncompressed chunks (smaller expansion
|  * when trying to compress uncompressible data), possibility to change
|  * lc/lp/pb in the middle of encoding, and some other internal improvements.
|  */
| #define LZMA_FILTER_LZMA2       LZMA_VLI_C(0x21)

So:

 - LZMA2 is a variation on the standard LZMA compression method and
   stream format, with a few advantages as listed above.

 - The XZ format is a new container format, i.e. a format for files that
   include compressed streams plus some additional metadata.  The XZ
   format supports LZMA2 and a few simpler compression methods, and it
   even allows chaining together a few compression methods ("filters")
   in a sort of pipeline.

 - The xz program and liblzma library support three container formats:
   XZ, the old LZMA format, and raw streams; and several compression
   methods: LZMA1, LZMA2, branch/call/jump filters that help compress
   binaries for various machine architectures, and a bytewise delta
   filter that helps compress uncompressed audio and bitmap images.
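
For the curious, here is a rough sketch of the three container formats
and a two-step filter chain, using Python's standard lzma module (a
binding to liblzma) rather than the xz program itself.  The sample
data, the preset of 6, and the delta distance of 4 are arbitrary
choices made for illustration.

```python
import lzma

data = b"hello xz " * 1000

# XZ container: the format written to .xz files.
xz_blob = lzma.compress(data, format=lzma.FORMAT_XZ)

# The old LZMA container (.lzma files), which only carries LZMA1.
alone_blob = lzma.compress(data, format=lzma.FORMAT_ALONE)

# Raw stream: no container header at all, so the same filter chain
# has to be supplied by hand when decompressing.
raw_filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]
raw_blob = lzma.compress(data, format=lzma.FORMAT_RAW, filters=raw_filters)

# A filter pipeline inside the XZ container: delta first, then LZMA2.
chain = [
    {"id": lzma.FILTER_DELTA, "dist": 4},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]
chained_blob = lzma.compress(data, format=lzma.FORMAT_XZ, filters=chain)

# The XZ and old LZMA containers are auto-detected on decompression.
assert lzma.decompress(xz_blob) == data
assert lzma.decompress(alone_blob) == data
assert lzma.decompress(chained_blob) == data

# The raw stream is not: format and filters must be stated explicitly.
assert lzma.decompress(raw_blob, format=lzma.FORMAT_RAW,
                       filters=raw_filters) == data

for name, blob in [("xz", xz_blob), ("lzma", alone_blob),
                   ("raw", raw_blob), ("delta+lzma2", chained_blob)]:
    print(name, len(blob), "bytes")
```

The filter chain is stored in the XZ container's own metadata, which
is why the chained example decompresses without being told about the
delta filter.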

LZMA2 and LZMA1 are very similar.  It is not clear whether it is fair
to say they use the same algorithm, since they do not produce
byte-for-byte identical results.  Their implementations certainly
share a lot of code.  Luckily, this philosophical issue should be
pretty much irrelevant to someone evaluating whether to use the
xz-utils package.
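
If you want to see the byte-level difference for yourself, here is a
minimal sketch, again using Python's lzma binding to liblzma as a
stand-in.  The helpers raw_compress and raw_decompress are made up for
this example, and the inputs and preset are arbitrary.  It compresses
the same input as a raw LZMA1 stream and as a raw LZMA2 stream, then
does the same with random bytes to show the "smaller expansion" that
the header comment above attributes to LZMA2's uncompressed chunks.

```python
import lzma
import os


def raw_compress(payload, filter_id):
    """Compress payload as a raw stream using a single filter (hypothetical helper)."""
    filters = [{"id": filter_id, "preset": 6}]
    return lzma.compress(payload, format=lzma.FORMAT_RAW, filters=filters)


def raw_decompress(blob, filter_id):
    """Inverse of raw_compress; the filter must match the one used to compress."""
    filters = [{"id": filter_id, "preset": 6}]
    return lzma.decompress(blob, format=lzma.FORMAT_RAW, filters=filters)


# Compressible input: both methods shrink it, but the bytes differ.
text = b"the quick brown fox " * 500
lzma1_text = raw_compress(text, lzma.FILTER_LZMA1)
lzma2_text = raw_compress(text, lzma.FILTER_LZMA2)
print("identical output:", lzma1_text == lzma2_text)

# Incompressible input: random bytes cannot be shrunk, so the question
# is how much each method expands them.
noise = os.urandom(64 * 1024)
lzma1_noise = raw_compress(noise, lzma.FILTER_LZMA1)
lzma2_noise = raw_compress(noise, lzma.FILTER_LZMA2)
print("LZMA1 expansion:", len(lzma1_noise) - len(noise), "bytes")
print("LZMA2 expansion:", len(lzma2_noise) - len(noise), "bytes")
```

The exact sizes depend on the liblzma version and on the random
input, so no particular numbers are claimed here.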

xz uses the XZ container format with the LZMA2 compression method by
default, so that’s what “XZ format” ends up meaning in practice.
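
A quick way to poke at that default, assuming Python's lzma module
(which calls into liblzma) picks the same defaults as described here:
compress once with no options, and once while spelling out the XZ
container, the CRC64 check, and LZMA2 at preset 6.  Those explicit
values are my reading of the defaults, not something taken from the
xz-utils sources.

```python
import lzma

# Every .xz file begins with these six magic bytes.
XZ_MAGIC = b"\xfd7zXZ\x00"

data = b"xz-utils package description\n" * 200

# No format, check, or filters given: whatever the library defaults to.
default_blob = lzma.compress(data)

# The same request with every choice written out explicitly.
explicit_blob = lzma.compress(
    data,
    format=lzma.FORMAT_XZ,
    check=lzma.CHECK_CRC64,
    filters=[{"id": lzma.FILTER_LZMA2, "preset": 6}],
)

print("default output is an XZ container:",
      default_blob.startswith(XZ_MAGIC))
print("default matches explicit XZ + LZMA2:",
      default_blob == explicit_blob)
```

If the two outputs match on your system, the default really is the XZ
container wrapping a single LZMA2 filter.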

Sorry for the confusion,
Jonathan

