
Re: Re: Adding support for LZIP to dpkg, using that instead of xz, archive wide



Vincent Lefevre wrote:
the xz format is objectively more fragile than the other three.

I completely disagree. IMHO, a decompressor should be very strict and
detect any suspicious modification.

(In the following response I'll assume that by "modification" you mean "corruption", i.e. accidental modification. No decompressor can detect intentional modifications, such as replacing one file with another.)

In a well-defined format there is no such thing as a "suspicious modification"; a modification either violates the format or it does not. It is not the responsibility of the decompressor to detect modifications that fall outside the defined format.

For example, bzip2, gzip and lzip include in their documentation a paragraph similar to the following:

"Lzip will correctly decompress a file which is the concatenation of two or more compressed files. The result is the concatenation of the corresponding uncompressed files. Integrity testing of concatenated compressed files is also supported."

Whatever follows a compressed file and is not a valid header is classified as "trailing garbage" and ignored.
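
As a rough illustration of this convention (a sketch using Python's zlib module on the gzip format, not the actual code of gzip, bzip2 or lzip), a decoder can keep decompressing members for as long as the remaining bytes begin with a valid magic number, and hand back whatever is left as trailing garbage. The function name decompress_members is made up for this example.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def decompress_members(data):
    """Decompress concatenated gzip members.

    Returns (output, trailing_garbage). Bytes that follow the last
    member and do not start with the gzip magic number are not treated
    as an error; they are returned separately and otherwise ignored.
    """
    chunks = []
    while data[:2] == GZIP_MAGIC:
        # wbits = 16 + MAX_WBITS selects the gzip container, so zlib
        # also verifies the CRC32 and length stored in the trailer.
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
        chunks.append(d.decompress(data) + d.flush())
        if not d.eof:
            raise ValueError("truncated gzip member")
        data = d.unused_data  # everything after this member
    if not chunks:
        raise ValueError("not a gzip file")
    return b"".join(chunks), data


two_files = gzip.compress(b"hello ") + gzip.compress(b"world")
output, garbage = decompress_members(two_files + b"\0\0\0")
```

Here the two members decompress to the concatenation of their contents, and the three appended null bytes end up in garbage without affecting the result. Real tools differ in the details, for example in whether they warn about the trailing bytes.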

Xz has broken with this tradition by including the padding in the definition of the format. IMHO this is a bug, just as it would be a bug for the definition of octal literals to include the characters following the octal digits, so that something like "\12aaaa" compiled but "\12a" failed.

This bug of xz limits the way in which xz streams can be embedded in other formats.

BTW, xz is the only compressor that shows fragile behaviour with respect to "trailing garbage". It is the only one that says nothing when 4 null bytes are appended to the file, yet reports "Compressed data is corrupt" without further explanation when 3 null bytes are appended.

In a well-designed format, all alterations that produce invalid output, and only those, should be detected. Doing otherwise just prevents the recovery of perfectly good data for no good reason. Some changes can't be detected by any decompressor, for example a change in the amount of padding/trailing garbage. Therefore, the only way to be sure that the file has not been altered is to provide an external checksum.
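
To make the point about external checksums concrete, here is a minimal sketch in Python. It assumes a SHA-256 digest of the compressed file was published separately; the helper names first_member and matches_published_digest are invented for this example, and gzip is used only because it ships with Python.

```python
import gzip
import hashlib
import zlib


def first_member(data):
    """Decompress the first gzip member, ignoring whatever follows it."""
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    return d.decompress(data) + d.flush()


def matches_published_digest(data, expected_hex):
    """Compare the SHA-256 of the compressed bytes with an external digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex


original = gzip.compress(b"irreplaceable data")
published = hashlib.sha256(original).hexdigest()  # distributed out of band

# Simulate an accidental alteration: four null bytes appended to the file.
altered = original + b"\0\0\0\0"

same_payload = first_member(original) == first_member(altered)
digest_ok = matches_published_digest(altered, published)
```

Both files decompress to the same data, so same_payload is true and nothing is lost, while digest_ok is false: only the external checksum shows that the file on disk is not byte-for-byte the one that was published.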

Of course xz can't limit itself to detecting alterations that produce invalid output, because its format is ill-defined. For example, it allows the decoder to merely issue a warning if it can't verify the integrity of the file (unsupported check). Xz is the least safe, the least friendly and at the same time the least strict of all four, so I suppose you agree that it should be replaced by lzip.


In case of error, it is better to carefully check with a second
source of the compressed file.

Supposing that you have a second source available.

I guess we are thinking about different use cases here: verifying a package that can be easily downloaded again in case of corruption, vs decompressing the only copy of an irreplaceable file.

BTW, telling a user that the only surviving copy of his important data is corrupt just because cp screwed it up and appended some garbage data at the end of the file is as unfriendly as it can be.

But, as stated above, in both cases the only way to be sure that the file is intact is to provide an external checksum. No amount of "strictness" in the decompressor can replace an external checksum.


Best regards,
Antonio.

