
Re: Please test gzip -9n - related to dpkg with multiarch support

On Mon, 6 Feb 2012 08:31:15 +0100
Raphael Hertzog <hertzog@debian.org> wrote:

> If you discover any bug in dpkg's multiarch implementation, please
> report it to the BTS (against the version 1.16.2~wipmultiarch).

I'd like to ask for some help with a bug which is tripping up my tests
with the multiarch-aware dpkg from experimental - #647522 -
non-deterministic behaviour of gzip -9n.

Some MultiArch: same packages in the archive (libppl9 is the one I came
across first) contain .gz files in ./usr/share/doc/ which differ between
architectures when, AFAICT, the original/decompressed file does not,
i.e. this isn't a bug in libppl9. Strangely, unpacking the .deb,
decompressing these files and then recompressing them with gzip -9nf
changes the checksum of the .gz file *to match the other architectures*.

For example, the armel package has a bad .gz file while the armhf one is
good; the kfreebsd-amd64 package has a bad .gz file while the amd64 one
is good.

If that matrix was flipped diagonally, it might make more sense.

The bad checksums also *match* between armel and kfreebsd-amd64.

armel, kfreebsd-amd64:
0e52e84eebf41588865742edaff7b3c0  usr/share/doc/libppl9/CREDITS.gz

armhf, i386, amd64:
99e2b9f8972ce00cfe57e3735881015e  usr/share/doc/libppl9/CREDITS.gz

By bad, I mean that the .gz file, when decompressed and recompressed,
changes checksum to match the other architecture. The change appears to
be binary: only two checksums show up, not a random or N-ary spread.

In this case, it also changes the filesize:

armel, kfreebsd-amd64:
6344 2011-02-27 09:07 ./usr/share/doc/libppl9/CREDITS.gz

armhf, i386, amd64:
6343 2011-02-27 09:07 ./usr/share/doc/libppl9/CREDITS.gz

(Jakub Wilk originally spotted a checksum change without a filesize
change, so filesize is not the best indicator, hence the checksum test.)

Decompress and recompress the file from the kfreebsd-amd64 or armel
packages, on either amd64 or armel, and the filesize changes to 6343 and
the checksum changes to that of amd64/armhf/i386. That makes the bug
very hard to reproduce.

The change does not happen in reverse, neither can I regenerate the .gz
file with the original checksum on the architecture which showed the
original problem. Once the bad checksum changes to the good one,
repeating the compression retains the good checksum. (The .gz file
with the changed checksum really is different - it is one byte larger
and 3 bytes differ.) I've run the test script for a couple of hundred
iterations and the checksum always changes after the first 
decompress+compress cycle but never changes back.
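
The decompress+recompress cycle described above could be scripted
roughly as follows. This is only a sketch, not the test script attached
to this mail: the names roundtrip_digests and gzip_9n are made up for
illustration, and it assumes that piping data through `gzip -9n -c`
produces the same bytes as running `gzip -9nf` on a file (plausible,
since -n omits the name and timestamp, but not verified here).

```python
import gzip
import hashlib
import subprocess


def gzip_9n(data: bytes) -> bytes:
    """Compress with the system gzip binary, as `gzip -9n -c` would.

    Assumption: a `gzip` executable is on PATH.
    """
    result = subprocess.run(
        ["gzip", "-9n", "-c"], input=data, stdout=subprocess.PIPE, check=True
    )
    return result.stdout


def roundtrip_digests(gz_bytes: bytes, compress=gzip_9n, cycles: int = 3) -> list:
    """Return the MD5 of the input .gz, then the MD5 after each cycle.

    One cycle is: decompress the current .gz, then recompress the result.
    """
    digests = [hashlib.md5(gz_bytes).hexdigest()]
    current = gz_bytes
    for _ in range(cycles):
        plain = gzip.decompress(current)  # decompress
        current = compress(plain)         # recompress
        digests.append(hashlib.md5(current).hexdigest())
    return digests
```

Given the bytes of a "bad" CREDITS.gz, the behaviour reported here would
show up as a list whose first entry differs from all the later ones,
with the later ones identical to each other.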

So far, I've tried this on abel.debian.org, inside and outside the sid
chroot, and on amd64. Either the armel or kfreebsd-amd64 package can be
unpacked and the CREDITS.gz file decompressed and recompressed - the
filesize and checksum change to the values seen on armhf and amd64.

Can someone spot whether I've made a mess of the test script or whether
there is something else going on here?



It would be a very laborious task to check the md5sums of every .gz file
in /usr/share/doc in every MultiArch: same package across all
architectures, and the Contents-* files on the mirrors don't contain the
filesize of the listed files. Does anyone have ideas on how to scan the
archive for this kind of problem?
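
One possible starting point, sketched under assumptions: if the same
package has already been unpacked for two architectures into two
directories (for instance with `dpkg-deb -x`), the trees can be walked
and compared. The function name scan_doc_gz and the directory layout
are hypothetical, and this does nothing to avoid downloading every
package first.

```python
import gzip
import os


def scan_doc_gz(root_a: str, root_b: str, subdir: str = "usr/share/doc"):
    """Yield (relative path, contents_match) for each .gz file that exists
    in both unpacked trees under `subdir` but whose compressed bytes differ.

    contents_match is True when the decompressed data is identical, i.e.
    only the compressed representation differs.
    """
    base_a = os.path.join(root_a, subdir)
    for dirpath, _dirs, files in os.walk(base_a):
        for name in sorted(files):
            if not name.endswith(".gz"):
                continue
            path_a = os.path.join(dirpath, name)
            rel = os.path.relpath(path_a, root_a)
            path_b = os.path.join(root_b, rel)
            # Skip files that are missing (or dangling symlinks) in either tree.
            if not (os.path.isfile(path_a) and os.path.isfile(path_b)):
                continue
            with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
                raw_a, raw_b = fa.read(), fb.read()
            if raw_a != raw_b:
                same = gzip.decompress(raw_a) == gzip.decompress(raw_b)
                yield rel, same
```

Entries reported with contents_match set to True would be candidates
for the problem described in this mail.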

If we can't pin this down, it is going to make MultiArch very hard to
deliver - any package build could make some MultiArch combinations
uninstallable in ways that are very hard to detect in advance, causing
entire dependency chains to fail to install.

The manifestation of the issue in libppl9 is clear when trying to
install the MultiArch build-dependencies for cross-compilers:

$ sudo apt-get install libcloog-ppl-dev:armel

Selecting previously unselected package libppl9:armel.
(Reading database ... 167711 files and directories currently installed.)
Unpacking libppl9:armel (from .../libppl9_0.11.2-6_armel.deb) ...
dpkg: error processing /var/cache/apt/archives/libppl9_0.11.2-6_armel.deb (--unpack):
 './usr/share/doc/libppl9/CREDITS.gz' is different from the same file on the system

This then leaves the installation in a broken state and needs careful
manual intervention to remove the dependencies of the broken package as
`apt-get -f install` wrongly tries to just reinstall the libppl9:armel
package again.

dpkg is correct in its current handling - the files really are
different. The problem is that the uncompressed file is not.

Comment from Paul Eggert:
> I should add that it's OK (from the point of view of
> the RFCs) if gzip produces different outputs given the same
> inputs when compressing.  The RFCs allow that and presumably
> other gzip implementations do that.  All that's required is
> that compress+decompress result in a copy of the original.

What we're seeing here are differences after decompress+compress, but
without a reproducible test for this bug, dpkg might have to implement
a workaround.

I'm wondering if this means that dpkg will have to try and decompress
the .gz files in /usr/share/doc to verify if the *contents* are the
same before failing to install if the .gz itself differs.
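
To make the idea concrete, here is a minimal sketch of such a check.
It is not dpkg code and says nothing about how dpkg would implement
it; compare_gz is a hypothetical name, and the two arguments are
assumed to be the raw bytes of the installed file and the incoming one.

```python
import gzip


def compare_gz(a: bytes, b: bytes) -> str:
    """Classify two .gz blobs.

    'identical'    - the compressed bytes match
    'same-content' - the compressed bytes differ, the decompressed data matches
    'different'    - the decompressed data differs as well
    """
    if a == b:
        return "identical"
    if gzip.decompress(a) == gzip.decompress(b):
        return "same-content"
    return "different"
```

Only the 'different' case would then need to abort the unpack; the
'same-content' case is the one hit by libppl9 above.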

With so few packages currently converted to MultiArch: same, it's
worrying that the first package I tried hit this bug.


Neil Williams
