
Re: Adding support for LZIP to dpkg, using that instead of xz, archive wide



Hi,

On Sun, 2015-06-14 at 01:08:29 +0200, Thomas Goirand wrote:
> On 06/13/2015 10:55 AM, Paul Wise wrote:
> > On Sat, Jun 13, 2015 at 4:23 PM, Thomas Goirand wrote:
> >> I've been using xz compression for a long time, but I see a big defect
> >> which is today pushing me to turn it off for the .orig.tar file. The
> >> issue is that depending on the version of xz-utils, it produces a
> >> different output.

Well, if you want reproducible output, then use the same tool version.
Expecting otherwise is like expecting that a different gcc version
will give you the same output.

As long as the bitstream is compatible with previous versions, I don't
see it as a problem, and I'd expect such changes to be beneficial,
because say, they might allow making the encoder faster, or compress
better, etc.
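
To illustrate the point about bitstream compatibility, here is a minimal
sketch using Python's standard lzma module. The different presets are only
a stand-in for different encoder versions (an assumption made for the sake
of illustration), and the helper names are hypothetical. The compressed
bytes are allowed to differ, while every stream must decode back to the
same input.

```python
import lzma


def compress_variants(data, presets=(0, 6, 9)):
    """Compress the same payload once per preset.

    The presets are hypothetical stand-ins for "different encoder
    versions": each may legitimately emit different compressed bytes.
    """
    return {p: lzma.compress(data, format=lzma.FORMAT_XZ, preset=p)
            for p in presets}


def all_decode_to(data, streams):
    """Return True if every compressed stream decodes back to `data`."""
    return all(lzma.decompress(s) == data for s in streams.values())


payload = b"upstream source tree contents\n" * 2048
streams = compress_variants(payload)

# The compressed sizes may or may not match; the decoded content must.
for preset, stream in streams.items():
    print(preset, len(stream))
print(all_decode_to(payload, streams))
```

What has to stay stable is the decoded content, which is what the last
line checks, not the compressed bytes.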

> >> We use "git archive" within the PKG OpenStack team to generate this
> >> tarball (which is more or less the same as pristine-tar, except we use
> >> upstream tags rather than a pristine-tar branch). The fact that xz
> >> produces a different result makes it not reproducible. As a
> >> consequence, it is very hard for us to use this system across
> >> distributions (ie: use that in both Debian and Ubuntu, or in Sid &
> >> Jessie). We need consistency.

If you generate it once, as part of the release process, why do you
need to generate it on different systems with different versions? And
what does that have to do with what gets packaged in Debian? For
Debian you only need to generate it once. Even if you don't keep
source tarball releases around, why would you want to generate it anew
every time you build a new Debian revision, instead of just reusing
the same tarball that is already in the archive?

> >> As a friend puts it:
> >> 
> >> "This is a fundamental problem/defect with xz. This (and a lot of
> >> other such defects, e.g. non-robustness of xz archives that easily
> >> lead to file corruption etc) are the reason that there is lzip (and
> >> which is why gnu.org has, on a technical basis, decided that lzip is
> >> official gzip-successor for gnu software releases when they come in
> >> tarballs).

TBH this smells like FUD. For example, I've never heard of corruption in
.xz files due to non-robustness. I'd expect corruption to come from
external forces, and the integrity checks to determine whether it gets
detected. In any case .xz supports CRC32, CRC64 and SHA-256 for integrity
checks, while .lz only supports CRC32. Moreover, lzip was created to
overcome limitations in the .lzma format; .xz came later and fixed those
limitations too.

(And I could probably switch dpkg-deb's .xz integrity check to CRC64,
given that's the xz-utils command-line tool default.)
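
As a rough sketch of what selecting the .xz integrity check looks like,
the following uses Python's standard lzma module, not dpkg-deb itself. The
helper names are hypothetical, and it assumes the underlying liblzma was
built with CRC64 support. It writes a stream with a chosen check, reads
the check type back, and shows that a damaged stream gets rejected.

```python
import lzma


def compress_with_check(data, check):
    """Compress `data` into an .xz stream using the given integrity check."""
    return lzma.compress(data, format=lzma.FORMAT_XZ, check=check)


def stream_check(stream):
    """Return the integrity check ID recorded in an .xz stream."""
    decomp = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    decomp.decompress(stream)
    return decomp.check


def is_rejected(stream):
    """Return True if decoding the stream fails with an LZMAError."""
    try:
        lzma.decompress(stream, format=lzma.FORMAT_XZ)
    except lzma.LZMAError:
        return True
    return False


data = bytes(range(256)) * 64
xz_crc64 = compress_with_check(data, lzma.CHECK_CRC64)
print(stream_check(xz_crc64) == lzma.CHECK_CRC64)

# Simulate damage from "external forces" by inverting one payload byte.
damaged = bytearray(xz_crc64)
damaged[len(damaged) // 2] ^= 0xFF
print(is_rejected(bytes(damaged)))
```

Passing lzma.CHECK_CRC32 or lzma.CHECK_SHA256 instead selects the other
two checks mentioned above, provided the local liblzma supports them.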

Also, many GNU projects do not release lzip tarballs but do release bzip2
or xz ones, and there are very few that exclusively release lzip tarballs.
If that's the equivalent of bazaar being the official GNU VCS that most
of the GNU projects do not use, well…

Actually, where is the gnu.org decision documented? I don't see it in
the GCS, in the “Information for Maintainers of GNU Software”, or on the
ftp.gnu.org site. And automake still defaults to dist-gz in latest git.

  <http://www.gnu.org/prep/standards/>
  <http://www.gnu.org/prep/maintain/>

> >> So it'd be super nice to have LZIP support in dpkg, and use that
> >> instead of xz, archive wide.
> >> 
> >> Your thoughts everyone? Is there any reason why we wouldn't do that?

Yes, replacing xz with lzip on .deb or .dsc packages does not make any
sense. Adding lzip support for source packages *might* make some sense, as
I pointed out in the bug report. But doing so does have a very high cost:

  <https://wiki.debian.org/Teams/Dpkg/FAQ#Q:_Can_we_add_support_for_new_compressors_for_.dsc_packages.3F>

Whenever a new compressor is considered, all surrounding tools need
to be modified to support it as well:

  <https://wiki.debian.org/Teams/Dpkg/DebSupport>
  <https://wiki.debian.org/Teams/Dpkg/DscSupport>

That's a non-zero amount of work and time, and that does not take into
account external tools and users. It would also not be usable until the
next stable release. Also note that there are still tools that do not
support data.tar.xz in .deb, even though it has been the default for a
while, which should give you an idea of what it takes.

Adding a new compressor that does not bring any significant benefit in
compression ratio, speed or container format, and that is neither widely
used nor widely available on many systems, just for the benefit of very
few packages that might also be releasing in other formats, or that
can be easily recompressed, still does not seem worth it, no.

I've yet to see an actual convincing argument why this would be worth
the effort and trouble.

Not to mention that I was the first to consider .lz, when we
evaluated adding .xz support in dpkg back in 2009.

  <https://lists.debian.org/debian-dpkg/2009/10/msg00029.html>

> > It was already rejected by the dpkg maintainers twice.
> > 
> > https://bugs.debian.org/600094
> > https://bugs.debian.org/556960
> 
> Reading these bugs, am I right that the archive already supports lzip
> for the orig.tar file? Because that's my issue: I don't really mind if
> we use xz for the compression of the .deb files, but I need consistency
> when generating the orig.tar.

Nothing in the .deb/.dsc tooling supports lzip AFAIK. The archive does
not even support the .lzma format.

> Now, regarding the fact that the maintainer closed the bugs, I see 2
> issues the way he did it.

First, that was a bug report from *2009/2010*. I think I was clear in
my mail that I was open to reconsider if things changed in the future.

> 1/ First, he cites the fact that lzip isn't popular enough as the only
> reason (did I miss another point of argumentation?). Well, it's
> backed-up by the GNU project as the successor of gzip, and also, I
> believe Debian is influential enough so that we may not have to care
> about it. Also, a wise technical choice of this kind shouldn't be driven
> by a popularity contest.

No, that's the summary that Antonio wrote. It's not the only reason
I gave in that mail, although it is a significant one given its
implications (see the FAQ entry above):

 * There's already .xz support (as one of the lzma variants), .lzma is
   now deprecated for .deb compression.
 * I'd rather have consistency between source and binary compressors.
 * For source packages high usage might be a more important reason to
   _accept_ lzip (given that we've got an equivalent or better lzma format
   with .xz), than low usage for a _reject_ (if we didn't have .xz).

Compressor formats are subject to network-effects like many other
file formats. In this case I think .xz "won" both because it was the
"official" successor to .lzma, and because it is superior to .lz.

Depending on the context, availability and usage (or popularity if you
will) are quite important aspects when deciding whether to support such
formats. In other cases you really want to support more formats, for
example in a GUI archiving program, or in something like automake.
Discounting this as a simple matter of "fashion" is not helpful.

> 2/ Guillem wrote "that's at the maintainer's discretion" (ie: to close
> the bug). Well, here, the whole of Debian is depending on this kind of
> decision, so I don't agree that this decision is only at the discretion
> of the maintainer.

That was exclusively related to whether to keep a wishlist+wontfix report
open or closed. And of course the logical next step is instead to force
the issue through the ctte… even though I've only seen lzip upstream and
one other person clamoring for lzip support, and no other discussions in
debian-devel about this, since 2010.

> Therefore, I'm tempted to raise this to the technical committee (putting
> their list as Cc). Does anyone see a reason why I am mistaken here?

*Sigh* and yes…

Regards,
Guillem

