Summary: "xz -9e" compresses a big source best. The margin is significant.

This perhaps isn't news to many on the present list, but it seems worth posting anyway, if only for illustration. Data and trivia follow.

On Hideki Yamane's and Henrique Holschuh's advice, to save space in Debian's archive, I have been aggressively compressing a big *.orig.tar source. Hideki and Henrique seem to be right: the choice of compression technique matters. My results:

    FRACT.   RATIO  SIZE (bytes)  METHOD    FILE
    100.0%   1.0:1     425287680            *.orig.tar
     19.4%   5.2:1      82320980  gzip      *.orig.tar.gz
     19.2%   5.2:1      81542973  gzip -9   *.orig.tar.gz
     14.5%   6.9:1      61704587  bzip2     *.orig.tar.bz2
     13.9%   7.2:1      58917455  bzip2 -9  *.orig.tar.bz2
      8.8%  11.4:1      37249420  xz        *.orig.tar.xz
      8.4%  11.9:1      35849616  xz -7     *.orig.tar.xz
      8.3%  12.0:1      35473920  xz -8     *.orig.tar.xz
      8.1%  12.3:1      34626532  xz -8e    *.orig.tar.xz
      7.9%  12.7:1      33571868  xz -9     *.orig.tar.xz
      7.7%  13.0:1      32685680  xz -9e    *.orig.tar.xz

This 0.4-GiB *.orig.tar source happens to consist of W3C web standards documents in HTML format. It is marked-up text with some PNG and SVG graphics. (Its filename on my laptop is w3-recs_20161202.orig.tar, if you want to know; but this exact file exists only on my laptop at the moment, so don't go looking for it in the archive.) As far as compressibility goes, such an *.orig.tar might be fairly typical for Debian.

According to the xz(1) man page, "xz -9e" is useful only on files larger than 32 MiB, so I do not advocate using the -9e option by default. Indeed, I am not advocating anything at all, except that the above results might interest some people. In this test, compression of a big source by "xz -9e" wins.

But isn't "xz -9e" too slow? Answer: well, it was indeed the slowest of the several methods tried, but it still took less than five *minutes* on my laptop, compared against 10 *seconds* for plain old "gzip". Yet, even if "xz -9e" had taken five *hours* (it didn't), it would probably still have been worth doing to save the archive space.
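
For anyone who wants to try a similar comparison on another source, here is a minimal sketch using Python's standard gzip, bz2 and lzma modules in place of the command-line tools behind the table above. The names METHODS, compare and report, and the extra TIME column, are illustrative assumptions and not part of the original measurement. Sizes produced by the Python modules may differ somewhat from those of the gzip, bzip2 and xz commands.

```python
import bz2
import gzip
import lzma
import time

# Each entry: (label, function taking bytes and returning compressed bytes).
# The labels name the roughly corresponding command-line invocations; the
# Python modules are not guaranteed to give byte-identical output.
METHODS = [
    ("gzip -9", lambda data: gzip.compress(data, compresslevel=9)),
    ("bzip2 -9", lambda data: bz2.compress(data, compresslevel=9)),
    ("xz", lambda data: lzma.compress(data, preset=lzma.PRESET_DEFAULT)),
    ("xz -9e", lambda data: lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)),
]


def compare(data, methods=METHODS):
    """Return a list of (label, compressed_size, seconds), one per method."""
    results = []
    for label, compress in methods:
        start = time.perf_counter()
        size = len(compress(data))
        results.append((label, size, time.perf_counter() - start))
    return results


def report(data, methods=METHODS):
    """Print FRACT., RATIO, SIZE, TIME and METHOD columns for the given data."""
    original = len(data)
    print(f"{'FRACT.':>6} {'RATIO':>7} {'SIZE':>10} {'TIME':>8}  METHOD")
    print(f"{1.0:6.1%} {'1.0:1':>7} {original:>10} {'':>8}  (none)")
    for label, size, seconds in compare(data, methods):
        ratio = f"{original / size:.1f}:1"
        print(f"{size / original:6.1%} {ratio:>7} {size:>10} {seconds:7.2f}s  {label}")
```

One might call it as report(open("w3-recs_20161202.orig.tar", "rb").read()). This reads the whole tarball into memory, and the "xz -9e" preset itself needs several hundred MiB while compressing, so the sketch suits a machine with RAM to spare.
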
Decompression, of course, is quick, at less than three seconds (though decompression of the *.orig.tar.gz, two seconds, is admittedly even quicker).

So, if that is interesting, there it is. I really don't know anything else about this, so if questions were asked, then Hideki, Henrique or others might answer. Nevertheless, the results seemed worth a post at any rate.

(I am not subscribed to this list, so feel free to Cc me.)

References: Hideki [1]; Henrique [2].

1: https://lists.debian.org/debian-dpkg/2012/08/msg00027.html
2: https://lists.debian.org/debian-devel/2016/10/msg00748.html
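
The decompression comparison mentioned above (xz against gzip) could be repeated along the following lines. This is only a rough sketch using Python's standard gzip and lzma modules, not the commands actually timed; the function names and the xz_preset parameter are hypothetical, chosen for illustration.

```python
import gzip
import lzma
import time


def time_decompress(decompress, blob):
    """Return (seconds, decompressed_size) for one call to decompress(blob)."""
    start = time.perf_counter()
    size = len(decompress(blob))
    return time.perf_counter() - start, size


def compare_decompression(data, xz_preset=9 | lzma.PRESET_EXTREME):
    """Compress data with gzip and xz, then time decompressing each result.

    Returns a dict mapping "gzip" and "xz" to (seconds, decompressed_size).
    """
    blobs = {
        "gzip": (gzip.decompress, gzip.compress(data, compresslevel=9)),
        "xz": (lzma.decompress, lzma.compress(data, preset=xz_preset)),
    }
    return {
        name: time_decompress(decompress, blob)
        for name, (decompress, blob) in blobs.items()
    }
```

Timings taken this way include Python overhead and run entirely in memory, so they are best read as relative figures rather than as a match for the two and three seconds quoted above.
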