Re: Adding support for LZIP to dpkg, using that instead of xz, archive wide
Thanks for the detailed explanations.
Russ Allbery wrote:
>> Inversely, by not accepting .lz source tarballs Debian is sending the
>> message that lzip is not a good format to use,
> While it's possible that people may decide to read such messages into our
> decisions, that's not really something we have control over, and I don't
> believe we should alter our decisions on that basis.
IMHO, if two options are equally good for the Debian use case but one of
them is better for most of the users, Debian should choose the one that
is better for most of the users. Otherwise, what is the use of the Social
Contract?
"We will be guided by the needs of our users and the free software
community. We will place their interests first in our priorities.
[...]
we will provide an integrated system of high-quality materials with no
legal restrictions that would prevent such uses of the system."
I can assure you that the first interest of most users is to not lose
their data. I am also pretty sure that neither lzma-alone nor xz counts
as "high-quality materials".
> (I believe the whole decision-making process for xz required a year or
> two.)
In a year or two, nobody noticed (or cared about) the deficient design
of xz? Wow!
I agree with what you said in another message that it is depressing to
have so many compression formats. But the bad decisions of Debian (among
others) certainly contribute to this situation. A selection process that
chooses a defective format (lzma-alone), then skips over a flawless
format that improves on gzip and bzip2 (lzip), and later replaces the
first defective format with another defective format (xz), really needs
some improvement.
It is also depressing for me to have to point out the defects in both
the xz format and the Debian decision-making process. The current
situation should never have happened. Xz should have been evaluated by
an expert and rejected outright, instead of wasting a year or two on
popularity contests just to reach the wrong decision. If high quality is
desired, then democracy is not the best way of deciding technical
questions. Anybody thinking that popular equals good should be using
Windows instead of bringing crappy Windows formats to POSIX systems.
> So, there are two things in Debian that could switch to lzip, and two
> separate paths forward for making an argument for those two things. I
> don't think this thread is really helping with either path, and would
> recommend that you focus gathering of evidence (backed by real numbers and
> reproducible statistics, the way the xz discussion previously was)
I agree that this thread is not helping, but I think it is because
Debian is not the place I thought it was.
In particular, I am very surprised by the insistence on statistics in
this thread. IMHO, statistics, especially of the "popularity contest"
kind, are out of place in this decision. This is an ethical and
technical decision about using in Debian packaging either:
a) a defective format implemented by public domain code that can be
easily made non-free, or
b) a flawless format implemented by truly free code that will remain
free[1].
[1] http://www.debian.org/intro/free
But I'll gladly provide some real numbers:
1) LZMA2 is just plain LZMA wrapped in yet another container format. The
variant of LZMA implemented by lzip is better than xz for
general-purpose compression (for example, of heterogeneous tarballs like
those distributed by Debian). Vincent Lefevre already provided a
summary[2] of the differences between lzip and xz. More examples from
gnu.org can be found in the lzip benchmark[3].
[2] http://lists.debian.org/debian-devel/2015/06/msg00256.html
[3] http://www.nongnu.org/lzip/lzip_benchmark.html
Here are a couple of tests from Debian packages:
-rw-r--r-- 1   2058 Jul 29 17:11 gddrescue_1.19-2_control.tar.gz
-rw-r--r-- 1   2038 Jul 29 17:11 gddrescue_1.19-2_control.tar.lz
-rw-r--r-- 1 101365 Jul 29 17:12 gddrescue_1.19-2_data.tar.lz
-rw-r--r-- 1 101608 Jul 29 17:12 gddrescue_1.19-2_data.tar.xz
-rw-r--r-- 1   1124 Jul 29 17:04 lzip_1.17-1_control.tar.gz
-rw-r--r-- 1   1125 Jul 29 17:04 lzip_1.17-1_control.tar.lz
-rw-r--r-- 1  70343 Jul 29 17:05 lzip_1.17-1_data.tar.lz
-rw-r--r-- 1  70440 Jul 29 17:05 lzip_1.17-1_data.tar.xz
-rw-r--r-- 1  60240 Jul 29 16:30 lzip_1.17.orig.tar.lz
-rw-r--r-- 1  60484 Jul 29 16:30 lzip_1.17.orig.tar.xz
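To make the differences in the listing above explicit, here is a minimal
Python sketch that computes, for each pair of files, how many bytes the
.lz file saves relative to its .gz or .xz counterpart. The sizes are
copied from the listing; the dictionary layout and the name lzip_saving
are only illustrative.

```python
# Sizes in bytes, copied from the listing above.
sizes = {
    "gddrescue_1.19-2_control.tar": {"gz": 2058, "lz": 2038},
    "gddrescue_1.19-2_data.tar": {"lz": 101365, "xz": 101608},
    "lzip_1.17-1_control.tar": {"gz": 1124, "lz": 1125},
    "lzip_1.17-1_data.tar": {"lz": 70343, "xz": 70440},
    "lzip_1.17.orig.tar": {"lz": 60240, "xz": 60484},
}


def lzip_saving(entry):
    """Return (other_format, bytes_saved, percent_saved) for lzip.

    A negative value means the .lz file is larger than the other one.
    (Hypothetical helper, not part of any tool mentioned in this thread.)
    """
    other = next(fmt for fmt in entry if fmt != "lz")
    saved = entry[other] - entry["lz"]
    return other, saved, 100.0 * saved / entry[other]


if __name__ == "__main__":
    for name, entry in sizes.items():
        other, saved, pct = lzip_saving(entry)
        print(f"{name}: lzip saves {saved} bytes vs {other} ({pct:.2f}%)")
```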
2) As lzip by default requires much less memory than xz to decompress
small files, it would be possible for lzip to replace both gzip and xz
in .deb packages, simplifying the format and the tools managing it. The
memory required to decompress the files above is:
( 11 kB) gddrescue_1.19-2_control.tar.gz
( 82 kB) gddrescue_1.19-2_control.tar.lz
(377 kB) gddrescue_1.19-2_data.tar.lz
(8.4 MB) gddrescue_1.19-2_data.tar.xz
( 12 kB) lzip_1.17-1_control.tar.gz
( 82 kB) lzip_1.17-1_control.tar.lz
(180 kB) lzip_1.17-1_data.tar.lz
(8.4 MB) lzip_1.17-1_data.tar.xz
(377 kB) lzip_1.17.orig.tar.lz
(8.4 MB) lzip_1.17.orig.tar.xz
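The memory needed by an LZMA decompressor is largely determined by the
dictionary size recorded in the file. As a rough illustration, the
following Python sketch reads that size from the header of a .lz file.
It assumes the header layout described in the lzip manual (the magic
string "LZIP", a version byte, then one byte coding the dictionary
size); the function name is hypothetical, and the result is the
dictionary size only, not the total memory figures listed above.

```python
def lzip_dictionary_size(header: bytes) -> int:
    """Decode the dictionary size from the first 6 bytes of an lzip member.

    Assumed layout (per the lzip manual): b"LZIP", version byte (1),
    then one coded byte where bits 4-0 hold log2 of a base size and
    bits 7-5 hold how many sixteenths of that base size to subtract.
    """
    if len(header) < 6 or header[:4] != b"LZIP":
        raise ValueError("not an lzip header")
    if header[4] != 1:
        raise ValueError(f"unsupported lzip version {header[4]}")
    coded = header[5]
    base_log2 = coded & 0x1F  # bits 4-0
    sixteenths = coded >> 5   # bits 7-5
    if not 12 <= base_log2 <= 29:
        raise ValueError("dictionary size out of range")
    base = 1 << base_log2
    size = base - sixteenths * (base // 16)
    if size < 4096:
        raise ValueError("dictionary size out of range")
    return size


# Example: the coded byte 0xD3 means 2**19 - 6 * 2**15 = 327680 bytes (320 KiB).
# To inspect a real file:
#     with open("lzip_1.17.orig.tar.lz", "rb") as f:
#         print(lzip_dictionary_size(f.read(6)))
```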
3) In spite of being the official lzma successor and receiving a lot of
publicity (xz has 2.5 pages on Wikipedia), xz is not that much more
popular than lzip. For example, about 15% of GNU projects distribute xz
tarballs vs 5% that distribute lzip tarballs. If xz were any good, that
percentage would be 100%, and I would be using it just like everybody else.
In conclusion:
From 1) and 2) it can be concluded that lzip is ideal for the use case
of Debian packaging. Because lzip requires little memory to decompress
small files, it could also replace gzip in other uses like compressing
man pages, maybe allowing Debian to use just one format for all its needs.
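Anyone wanting to check the small-file case on their own data could
start from a sketch like the one below. It compresses the same bytes
with gzip and xz through the Python standard library and, only if an
lzip executable is found in PATH, with "lzip -9 -c" as well. The
function name and the sample text are made up for illustration, and the
compression levels are my choice, not necessarily the settings used for
the numbers above.

```python
import gzip
import lzma
import shutil
import subprocess


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size of data, in bytes, for each format."""
    sizes = {
        "gz": len(gzip.compress(data, compresslevel=9)),
        "xz": len(lzma.compress(data, format=lzma.FORMAT_XZ, preset=9)),
    }
    # The Python standard library has no lzip support, so this part
    # relies on the external lzip program and is skipped if it is absent.
    if shutil.which("lzip"):
        result = subprocess.run(
            ["lzip", "-9", "-c"],
            input=data,
            stdout=subprocess.PIPE,
            check=True,
        )
        sizes["lz"] = len(result.stdout)
    return sizes


if __name__ == "__main__":
    # Made-up stand-in for a small man page.
    sample = (".TH EXAMPLE 1\n.SH NAME\nexample \\- a sample page\n" * 50).encode()
    for fmt, size in sorted(compressed_sizes(sample).items()):
        print(f"{fmt}: {size} bytes")
```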
Best regards,
Antonio.