
Re: Adding support for LZIP to dpkg, using that instead of xz, archive wide



Steve Langasek wrote:
No.  Computer science is mathematics.  Algorithms are mathematics.  Software
is something else.  You cannot "prove" that a customer's priorities are
wrong.

Debian is not the customer, but the developer. It is compelled by its social contract to provide high-quality materials to the real customers, its users. This implies rejecting low-quality materials like xz. It would be a shame for Debian if even one of its users in a remote location, without an internet connection, were unable to install a package he needs just because the package format is more fragile than it could be.


I am writing here because I want you (Debian) to stop spreading FUD
against lzip, like ".lz only supports CRC32" (implying that lzip
integrity is weak), or gratuitously affirming that ".xz is superior
to .lz". I am still waiting for anybody in this list to tell us in
what aspect is .xz superior to .lz.

Please point us to where Debian is making these statements.

In this same list. Guillem Jover made these statements[1]. Then Russ Allbery affirmed that xz was chosen by consensus[2], and nobody objected. So this seems to be the collective opinion of the Debian developers, transmitted to all readers of this list. The fact that lzip has been rejected three times even for .orig.tar files seems to corroborate it. I feel somewhat insulted every time a source tarball of mine is recompressed to a larger size into a worse format.

[1] http://lists.debian.org/debian-devel/2015/06/msg00188.html
[2] http://lists.debian.org/debian-devel/2015/07/msg00634.html


you are insisting that Debian should accept your position that lzip is
superior,

I didn't say that lzip is superior, but I agree because it is clear that xz is substandard.

I have learned here that xz violates its own format specification to overcome a bug in the format, causing silent data loss in the process. I knew the xz format had a lot of flaws, but I hadn't considered that the xz tool could make them worse. I have learned some things in this thread, most of them new faults in xz.

The xz format has many unjustified features, each one wasting space, reducing efficiency and adding fragility. Let's look at just a couple of examples:

The 4-byte alignment of the fields in the format requires useless padding. Alignment is justified as being perhaps able to increase speed and compression ratio, but:

1) The only filter that can be last in an xz filter chain is LZMA2, which does not need any alignment. Xz decompresses 2% slower than lzip on the i586 from which I'm writing this. If xz decompresses faster than lzip on your machine, it is because xz uses optimized assembler on some architectures, not because of the alignment.

2) The output of the non-last filters in the chain is not stored in the file. Therefore it can't be "later compressed with an external compression tool" as stated in the specification.

3) 4 bytes are not enough. The IA64 filter has an alignment of 16 bytes. Alignment should be a property of each filter, not of the whole stream.

Conclusion: the 4-byte alignment is a misfeature that wastes space and adds fragility without producing any benefit at all.
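To make the padding cost concrete, here is a minimal sketch of the arithmetic behind alignment padding. The function name `padding_needed` and the sample sizes are hypothetical, chosen only for illustration; this is not code from xz or lzip.

```python
def padding_needed(size: int, alignment: int = 4) -> int:
    """Return how many padding bytes round `size` up to a multiple of `alignment`."""
    if size < 0 or alignment <= 0:
        raise ValueError("size must be >= 0 and alignment must be > 0")
    # (-size) % alignment is the distance from size to the next multiple.
    return (-size) % alignment


if __name__ == "__main__":
    # Hypothetical field sizes in bytes.
    for size in (5, 6, 7, 8, 20):
        print(size, padding_needed(size, 4), padding_needed(size, 16))
```

With 4-byte alignment every aligned field can cost up to 3 bytes of padding, and a field that happens to be 4-byte aligned is still not 16-byte aligned in general, which is the problem described in point 3.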

Xz is a fantasy based on the false idea that better compression algorithms can be mass-produced like cars in a factory. The xz format is the most wasteful I have ever analyzed. It has room for 2^63 filters, which can then be combined to make an even larger number of algorithms. It reserves less than 0.8% of filter IDs for custom filters, but even this small range provides about 8 million custom filter IDs for each human inhabitant on earth.
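The proportions above can be checked with a few lines of arithmetic. This is only a sketch: the total of 2^63 filter IDs is the figure given above, while the size of the custom range (2^56 IDs) is an assumption made here for illustration, and the population passed to `custom_ids_per_person` is a hypothetical input that the reader should replace with whatever estimate they prefer.

```python
TOTAL_FILTER_IDS = 2 ** 63   # figure stated in the text
# ASSUMPTION (not from the text): the custom range spans 2**56 IDs.
CUSTOM_FILTER_IDS = 2 ** 56


def custom_fraction() -> float:
    """Fraction of all filter IDs that fall in the custom range."""
    return CUSTOM_FILTER_IDS / TOTAL_FILTER_IDS


def custom_ids_per_person(population: int) -> float:
    """Custom filter IDs available per person for a given population."""
    return CUSTOM_FILTER_IDS / population


if __name__ == "__main__":
    print(f"custom share of the ID space: {custom_fraction():.4%}")
    # Hypothetical population figure; the result scales inversely with it.
    print(f"custom IDs per person: {custom_ids_per_person(9_000_000_000):,.0f}")
```

Under that assumption the custom share is 1/128 of the ID space, which is consistent with "less than 0.8%", and the per-person count is in the millions for any realistic population.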

The basic ideas of compression algorithms were discovered early in the history of computer science. LZMA is based on ideas discovered in the 70s. Don't expect an algorithm much better than LZMA to appear anytime soon, much less several of them in a row.

In almost 7 years not even one of the promises of xz has been fulfilled. Lasse Collin once warned me that lzip would become stuck with LZMA while others moved to LZMA2, LZMA3, LZMH, and other algorithms. Now xz is not even able to match the compression ratio of lzip. Who wants a compressor with a format orders of magnitude more complex that is not even able to compress a little better?

Castles in the air like xz should be eradicated from POSIX systems as bad practice. Studying the xz format, I have come to understand the quote from Hoare much better.


and you have asserted that Debian should drop xz and adopt lzip.
Denying that you have done this does nothing to help you appear more
reasonable.

I don't deny it. I think Debian should drop xz, or else Debian should remove the term "high-quality" from its social contract just as the statement about "Truly free software" has been (IMHO correctly) removed from the Debian definition of Free Software.

I think Debian should adopt lzip not only for the reasons already set out in this thread, but also because, surprisingly enough, it would be much easier for lzip to seamlessly incorporate a new compression algorithm than it would be for xz. The xz format lacks a format version field. The only reliable way of knowing if your xz tool can decompress a given file is by trial and error:

$ file COPYING.*
COPYING.lz: lzip compressed data, version: 1
COPYING.xz: XZ compressed data

It is true that the first byte of "stream flags" could be used in the future to indicate a new stream version, but this says nothing about what filters are used in a given file. Trial and error is still needed.
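The difference shown by `file` above can be sketched as a small header check. This is a hedged illustration, not the logic of any real tool: the function name `describe_header` is hypothetical, and the magic values are my reading of the two formats (the 4 bytes "LZIP" followed by a version byte for lzip, and the 6 bytes FD 37 7A 58 5A 00 for xz).

```python
LZIP_MAGIC = b"LZIP"            # assumed lzip magic, followed by a version byte
XZ_MAGIC = b"\xfd7zXZ\x00"      # assumed xz magic: FD 37 7A 58 5A 00


def describe_header(header: bytes) -> str:
    """Classify the first bytes of a file and report a version if one exists."""
    if header.startswith(LZIP_MAGIC) and len(header) > len(LZIP_MAGIC):
        # The byte right after the magic is read as the format version.
        return f"lzip, format version {header[len(LZIP_MAGIC)]}"
    if header.startswith(XZ_MAGIC):
        # Nothing in these bytes says which filters the file needs.
        return "xz, no format version field"
    return "unknown"


if __name__ == "__main__":
    print(describe_header(b"LZIP\x01\x0c"))
    print(describe_header(b"\xfd7zXZ\x00\x00\x04"))
```

For a real file, pass the first few bytes read in binary mode, for example `describe_header(open(path, "rb").read(8))`.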

It is also true that "xz -vv --list" reports the minimum xz version required to decompress the file, but:

1) The minimum xz version reported is just a guess. As stated above, the xz format lacks a format version.

2) Only older versions of xz utils can be reported. If a newer version of xz utils is required, it can't be known which one. The report is also useless for knowing what version of other decompressors (for example 7zip) could decompress the file.
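What "trial and error" means in practice can be sketched with the `lzma` module from the Python standard library, used here only as a stand-in for whatever xz decoder is installed. The function name `can_decompress_xz` is hypothetical.

```python
import lzma


def can_decompress_xz(data: bytes) -> bool:
    """Return True if this decoder manages to decompress `data` as an .xz stream.

    The only way to find out is to attempt the whole decompression.
    """
    try:
        lzma.decompress(data, format=lzma.FORMAT_XZ)
    except lzma.LZMAError:
        return False
    return True


if __name__ == "__main__":
    sample = lzma.compress(b"hello", format=lzma.FORMAT_XZ)
    print(can_decompress_xz(sample))        # a stream this decoder produced itself
    print(can_decompress_xz(sample[:-4]))   # same stream with its tail cut off
```

The answer only holds for the decoder that ran the test; it says nothing about whether a different tool or version would succeed on the same file.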

Of course a much better algorithm will probably never be found. But if it happens, lzip is better prepared for it than xz.

Adopting lzip means some work for the Debian developers, but if the Debian developers are proud of their work, they should be willing to do whatever work it takes to deliver the high quality promised in the social contract. Even more so if the cause of this extra work is that xz was not properly evaluated before being adopted in Debian.

I am willing to help. I have reported here just a small subset of the defects of xz in order to keep this message small, but I'll gladly answer any question from Debian developers in this list, in the lzip list or in private. I'll also gladly implement lzip support in any Debian tool that may need it. I am the author of zutils[3] and have experience with multiformat support in tools.

[3] http://www.nongnu.org/zutils/zutils.html


Thanks for bringing this to our attention.  This is not an official position
of the Debian project; I've reported this as a bug:
http://bugs.debian.org/794116

You are welcome.


Best regards,
Antonio.

