
Re: Adding support for LZIP to dpkg, using that instead of xz, archive wide



On 2015-07-26 14:10:10 +0200, Antonio Diaz Diaz wrote:
> Guillem Jover wrote:
> >TBH this smells like FUD. For example I've never heard of corruption in
> >.xz files due to non-robustness, I'd expect that corruption to come from
> >external forces, and that integrity checks would either detect it or not.
> 
> Sure it comes from external forces, but xz does something that no other
> compressor does: even if the corruption does not affect the data and xz is
> able to produce perfectly correct output, it will report "Compressed data is
> corrupt" and will exit with non-zero status anyway. Just take any xz file
> and append a null character to it. Bzip2, gzip and lzip simply ignore the
> extra byte.
> 
> But not only that. Xz is the only format (of the four mentioned) whose parts
> need to be aligned to a multiple of four bytes. The size of an xz file must
> also be a multiple of four bytes. To achieve this, xz includes padding
> everywhere: after headers, blocks, the index, and the whole stream. The bad
> news is that if the (useless) padding is altered in any way, "the decoder
> MUST indicate an error" according to the xz format specification.
> 
> This is especially bad when xz is used with tar, as it makes the whole
> command fail and the whole archive be discarded as corrupt.
> 
> And this fragility is one of the perverse effects of the unbelievably stupid
> design of xz; "It is possible that there is a new field present which the
> decoder is not aware of, and can thus parse the Block Header
> incorrectly[1]".
> 
> [1] http://tukaani.org/xz/xz-file-format.txt (see 3.1.6. Header Padding)
> 
> So yes, the xz format is objectively more fragile than the other three.

I completely disagree. IMHO, a decompressor should be very strict and
report any suspicious modification. When it reports an error, it is
better to check carefully against a second copy of the compressed file.
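
To make "strict" concrete, here is a minimal sketch using Python's
standard lzma module. It is not how xz itself is implemented, and the
function name strict_xz_decompress is just something I made up for
illustration. It decodes a single .xz stream and refuses the input if
anything at all follows the end of that stream, such as the appended
null character mentioned above.

```python
import lzma


def strict_xz_decompress(blob: bytes) -> bytes:
    """Decode exactly one .xz stream; reject truncation and trailing bytes.

    Hypothetical helper for illustration only, not part of any library.
    """
    dec = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    out = dec.decompress(blob)
    if not dec.eof:
        # The end-of-stream marker was never reached.
        raise ValueError("truncated .xz stream")
    if dec.unused_data:
        # Bytes remain after the stream footer: treat them as suspicious.
        raise ValueError(
            f"{len(dec.unused_data)} unexpected byte(s) after end of stream"
        )
    return out


payload = b"hello, dpkg\n" * 100
good = lzma.compress(payload)   # lzma.compress writes the .xz container by default
tampered = good + b"\0"         # append a single null character

assert strict_xz_decompress(good) == payload

try:
    strict_xz_decompress(tampered)
except ValueError as err:
    print("rejected:", err)
```

With such a check the caller learns that the file is not byte-for-byte
what the compressor wrote, and can then go and compare it against a
second copy.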

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
