Vincent Lefevre wrote:
"the xz format is objectively more fragile than the other three."
I completely disagree. IMHO, a decompressor should be very strict and detect any suspicious modification.
Agreed, but I'll try to make clear just how brain-damaged the "strictness" of the xz format is.
A compressed file is like an envelope with a message inside. The objective of the decompressor is to extract the message and deliver it intact to the user.
It makes sense to be strict with the integrity of the message and lax with the condition of the envelope; that is what postmen do. A blot in a corner of the envelope does not compromise the integrity of the message.
For example, bzip2 does not "protect" the block size with a checksum, and lzip does not "protect" the dictionary size. An alteration in these fields just changes the amount of memory used for decompression. It can make the decompression fail, but it can't alter the message in any way.
OTOH, bzip2, gzip and lzip are strict with the integrity of the message. They won't deliver a message that does not pass the test.
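To make the envelope analogy concrete, here is a minimal Python sketch of a toy container. It is not the real bzip2, gzip or lzip format: the layout, the names pack, unpack and flip_bit, and the decision to store the message uncompressed are all assumptions made for illustration. The only point is that the size field in the header is left unprotected, while the message itself is covered by a CRC32.

```python
import struct
import zlib

def pack(message: bytes, dict_size: int) -> bytes:
    # Envelope: a 4-byte size field, deliberately NOT covered by any checksum.
    header = struct.pack("<I", dict_size)
    # Message: stored verbatim to keep the toy short, followed by its CRC32.
    trailer = struct.pack("<I", zlib.crc32(message))
    return header + message + trailer

def unpack(container: bytes) -> bytes:
    # In a real decompressor this field would only size a buffer; it is
    # read here but never influences the bytes that are delivered.
    (dict_size,) = struct.unpack("<I", container[:4])
    message = container[4:-4]
    (stored_crc,) = struct.unpack("<I", container[-4:])
    if zlib.crc32(message) != stored_crc:
        raise ValueError("message CRC mismatch")
    return message

def flip_bit(data: bytes, index: int, bit: int = 0) -> bytes:
    # Return a copy of data with one bit inverted at the given byte index.
    damaged = bytearray(data)
    damaged[index] ^= 1 << bit
    return bytes(damaged)

original = b"the message inside the envelope"
good = pack(original, dict_size=1 << 23)

# A blot on the envelope: the message is still delivered intact.
blot_on_envelope = flip_bit(good, 0)
assert unpack(blot_on_envelope) == original

# Damage to the message itself: delivery is refused.
damaged_message = flip_bit(good, 10)
try:
    unpack(damaged_message)
    delivered = True
except ValueError:
    delivered = False
```

In this toy a flipped bit in the size field goes unnoticed because it cannot change the delivered bytes, while a flipped bit in the message is always rejected.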
Contrary to the other three formats, and against common sense, xz is strict with the integrity of the envelope but lax with the integrity of the message.
If there is just a fly speck stuck to a corner of the envelope (a null byte appended to an otherwise intact file, for example), xz will report that the data is corrupt and will not deliver the message. This test is inescapable. All you can do is send the output to stdout and hope that the problem is not in the header, even in some useless padding, because in that case you may recover nothing.
OTOH, xz provides at least three ways of ignoring the integrity of the message and happily delivering a corrupt message, exiting with zero status.
Just see the two attached files. 'good.xz' was created with the command 'xz -9 -Cnone'. The corrupt version 'bad.xz' was created by changing a couple of bits in 'good.xz'.
Xz is unable to detect the corruption in 'bad.xz'. In fact xz is unable to detect ANY corruption that may happen to the payload of 'good.xz'. But if you just try to append a null byte to 'good.xz' it will report corrupt data!
Lzma-alone was simply a toy format lacking fundamental features. Xz is willfully designed to allow maximizing the probability of losing the user's data.
Best regards, Antonio.