Re: Re: Adding support for LZIP to dpkg, using that instead of xz, archive wide
Vincent Lefevre wrote:
>> the xz format is objectively more fragile than the other three.
> I completely disagree. IMHO, a decompressor should be very strict and
> detect any suspicious modification.
(In the following response I'll assume that by "modification" you mean
"corruption" (accidental modification). No decompressor can detect
intentional modifications, for example replacing a file with another).
In a well-defined format there is no such thing as a "suspicious
modification"; a modification either violates the format or it does not.
It is not the responsibility of the decompressor to detect modifications
that fall outside the defined format.
For example, bzip2, gzip and lzip include in their documentation a
paragraph similar to the following:
"Lzip will correctly decompress a file which is the concatenation of two
or more compressed files. The result is the concatenation of the
corresponding uncompressed files. Integrity testing of concatenated
compressed files is also supported."
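To make the quoted behaviour concrete, here is a minimal sketch using
Python's standard gzip and bz2 modules. These are not the command-line
tools themselves, and lzip is left out only because the standard library
has no module for it. The sample data and variable names are made up for
the example.

```python
import bz2
import gzip

first = b"first file\n"
second = b"second file\n"

# Concatenate two independently compressed files, as 'cat a.gz b.gz' would.
gz_concat = gzip.compress(first) + gzip.compress(second)
bz_concat = bz2.compress(first) + bz2.compress(second)

# Decompressing the concatenation yields the concatenated plain data.
gz_result = gzip.decompress(gz_concat)
bz_result = bz2.decompress(bz_concat)

print(gz_result == first + second)
print(bz_result == first + second)
```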
Whatever follows a compressed file and is not a valid header is
classified as "trailing garbage" and ignored.
Xz has broken with this tradition and has included the padding in the
definition of the format. IMHO this is a bug, just as it would be a bug
to include in the definition of octal literals the characters following
the octal digits, so that something like "\12aaaa" compiles but "\12a"
fails.
This bug of xz limits the way in which xz streams can be embedded in
BTW, xz is the only compressor that shows fragile behaviour with respect
to "trailing garbage". It is the only one that doesn't report the
addition of 4 null bytes to the file, but reports "Compressed data is
corrupt" without further explanation if 3 null bytes are appended.
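The 4-versus-3 difference can be modelled in a few lines. This is only a
sketch of the rule as I read it in the xz format specification (stream
padding must consist of null bytes and its size must be a multiple of
four), not code taken from xz. The function name is hypothetical.

```python
def xz_style_padding_ok(trailing: bytes) -> bool:
    """Model of the xz stream padding rule (assumed from the format spec):
    only null bytes, and a length that is a multiple of four."""
    only_nulls = trailing.count(0) == len(trailing)
    return only_nulls and len(trailing) % 4 == 0


# Four appended null bytes are accepted as padding, three are rejected.
print(xz_style_padding_ok(b"\x00" * 4))
print(xz_style_padding_ok(b"\x00" * 3))
```

Under this model the acceptance of a file depends on the exact number of
appended null bytes, which is the behaviour complained about above.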
In a well-designed format, all alterations that produce invalid output,
and only those, should be detected. Doing otherwise just prevents the
recovery of perfectly good data for no good reason. Some changes can't
be detected by any decompressor, for example a change in the amount of
padding/trailing garbage. Therefore, the only way to be sure that the
file has not been altered is to provide an external checksum.
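A small sketch of that last point, again with Python's standard library
rather than the real tools. It assumes the behaviour of bz2.decompress in
current CPython versions, which ignores leftover data that is not a valid
stream. The payload and the appended bytes are made up for the example.

```python
import bz2
import hashlib

payload = b"irreplaceable data\n"
original = bz2.compress(payload)
altered = original + b"\x00\x00\x00"  # hypothetical trailing bytes

# The decompressor returns the same data for both files...
same_output = bz2.decompress(altered) == bz2.decompress(original)

# ...but a digest of the compressed file, kept separately, differs.
digest_original = hashlib.sha256(original).hexdigest()
digest_altered = hashlib.sha256(altered).hexdigest()
digests_match = digest_original == digest_altered

print(same_output, digests_match)
```

Only the comparison against the externally stored digest notices that the
file is no longer the one that was published.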
Of course xz can't limit itself to detecting alterations that produce
invalid output because its format is ill-defined. For example, it allows
the decoder to merely issue a warning if it can't verify the integrity
of the file (unsupported check). Xz is the least safe, the least friendly
and at the same time the least strict of all four, so I suppose you agree
that it should be replaced by lzip.
> In case of error, it is better to carefully check with a second
> source of the compressed file.
Supposing that you have a second source available.
I guess we are thinking about different use cases here: verifying a
package that can be easily downloaded again in case of corruption, vs
decompressing the only copy of an irreplaceable file.
BTW, telling a user that the only surviving copy of his important data
is corrupt just because cp screwed it up and appended some garbage data
at the end of the file is as unfriendly as it can be.
But, as stated above, in both cases the only way to be sure that the
file is intact is to provide an external checksum. No amount of
"strictness" in the decompressor can replace an external checksum.