Re: Multiarch file overlap summary and proposal
Guillem Jover <guillem@debian.org> writes:
> If packages have to be split anyway to cope with the other cases, then
> the number of new packages which might not be needed otherwise will be
> even smaller than the predicted amount, at which point it makes even
> less sense to support refcnt'ing.
I don't think the package count is really the interesting metric here,
unless the number of introduced packages is very large (see below about
-dev packages). I'm more concerned with maintainer time and with
dependency complexity, and with the known problems that we introduce
whenever we take tightly-coupled files and separate them into independent
packages.
> It also requires maintainers to carefully consider if the (doc, etc)
> toolchains will generate predictable output.
Yes, I agree. There's a tradeoff between having to think about this and
having to do the work to use arch-qualified directories. But I think it's
worth having the tradeoff available as an option.
> This still does not solve the other issues I listed, namely binNMUs have
> to be performed in lock-step, more complicated transitions /
> upgrades. And introduces different solutions for different problems,
> while my proposal is generic for all cases.
I did review your message again (I read it when you originally posted it
as well), and I think the set of cases I discussed in my message covered
the ones you pointed out, apart from the point about version lockstep.
I just posted separately about version lockstep: I think this is a
feature, not a bug, in our multiarch implementation. I think this is the
direction we *should* go, because it reduces the overall complexity of the
system. Yes, that requires treating binNMUs as something different than a
sourceful version change, but I think that's a good idea *anyway*, even in
the absence of multiarch. It's more semantically accurate, and it would
address other lingering problems we've had with binNMUs, or at least
reduce them.
As for the benefits of refcounting, there are three places where I think
this has substantial benefit, so let me talk through them:
1. If you look at the list of files that Steve gave in multiarch: same
packages in Ubuntu, most of the cases that don't fall into the known
documentation and package metadata areas are a bunch of separate
special cases. They don't fall easily into a handful of cases. *But*,
they are mostly all files in either textual or consistent binary
formats that are installed directly from the package and are not going
to change in a binNMU. That means that refcounting provides a nice
simplification: there are a bunch of random additional files of various
different types that can all be handled the same way, without
additional packaging complexity. They can't all be arch-qualified in
the same way as easily, plus arch-qualifying files that absolutely
should not differ between architectures and where that difference would
be a bug (such as with PAM configuration) seems wrong.
They can also be split out into an arch: all package. But here I think
it's worth remembering that splitting tightly-coupled files into
separate packages has real drawbacks and is something we should avoid
doing if we can. There are places where the advantages to doing so are
overwhelming (-dev packages from shared libraries, for example), but we
should be sure we're in that case.
This is something that working on Lintian for a while really drove home
for me. People split binary packages with large data into an arch: any
and arch: all package (often because Lintian recommends it to save
archive space), and they do it wrong *all the time*. They get the
dependencies wrong, or don't think about what files belong in which
package, or accidentally put an arch-specific file in the data package.
I have a package that does this sort of split (gnubg), and from
personal experience know that it's not an easy thing to maintain.
We're not saving packaging complexity by asking people to do this
instead of refcounting, IMO.
Also, there are other drawbacks of splitting closely coupled files into
separate packages even apart from packaging complexity. For example,
it's common for people to move the man pages and desktop files for
binaries into that arch: all data package too, since, hey, they're
arch-independent and in /usr/share and that's easy. But this is
usually the wrong thing to do. Now you have created the possibility of
having desktop files installed for binaries that don't exist, you've
made it much harder for tools like Lintian to check that your man pages
and desktop files are consistent with the binaries, and you have to be
very careful about versioning dependencies. You also create a separate
package that's artificial from the user's perspective, may not get
removed when the main package is removed, and shows up in apt-cache
search and similar interfaces despite the fact that the user doesn't
care about it at all.
I really don't like package splitting as our answer to everything. At
the least, it definitely isn't an obviously clean and simple solution.
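To make the dependency pitfall concrete, here's a minimal sketch of what
such a split has to look like (package names are hypothetical). The subtle
part is that the arch: any package must depend on the arch: all package
with (= ${source:Version}), not (= ${binary:Version}): after a binNMU the
binary version of the arch: any package changes but the arch: all package
is not rebuilt, so a strict binary-version dependency becomes
unsatisfiable. This is exactly the sort of thing people get wrong all the
time:

```
Package: gnubg
Architecture: any
Depends: gnubg-data (= ${source:Version}), ${shlibs:Depends}, ${misc:Depends}

Package: gnubg-data
Architecture: all
Depends: ${misc:Depends}
```

And even with the versioning right, you still have to audit by hand that
nothing arch-specific slipped into the data package.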
2. Include files are, by quantity, a pretty substantial percentage of the
files we're talking about here once we start making our -dev packages
multiarch aware. Most include files these days aren't arch-specific,
since facilities like int32_t have removed a lot of the need to put
architecture-specific information in headers. Refcounting handles them
quite cleanly. They don't change during binNMUs, and they aren't
compressed.
I think splitting all -dev packages into -dev and -include is a
non-starter. This is a bunch of extra packaging work, it adds a bunch
of noise to our package lists that interferes with user searches, it
really does create a substantial number of additional packages (one for
every -dev package in the archive to a first approximation if we
multiarch everything), and it creates new artificial user support
problems, such as users who install the -include package and not the
-dev package and then don't understand why they can't link with the
library.
So, what we're really talking about here, in the absence of
refcounting, is moving the entirety of /usr/include into
/usr/include/<triplet>. Now, this *does* work technically; compilers
will find the headers, and the behavior will be what we want. But
again this is ongoing packaging complexity to install the headers in a
place other than where the upstream build system is going to put them.
It's also surprising. We already broke a few packages just by moving
the arch-specific include files of some packages there, which is
unavoidable. I had to do more ugly things to the ugly openafs build
system to cope, for example. And we just had a really uncomfortable
conversation on the GCC mailing list (that amounted to "what are those
crazy Debian people doing?") because these moves broke building GCC
from source because header files are no longer where they are
everywhere else.
Doing this for arch-specific headers is already problematic but not
really avoidable if we're going to move forward. But that's a smaller
change than doing it for nearly all headers, and I think we're really
going to surprise our users in some unpleasant ways if we do this.
Yes, software should not assume it knows the system header search
paths, but the fact remains that software *does*, and fixing all of it
is going to be painful and not produce a lot of good will with
upstreams.
3. Package metadata in /usr/share/doc/<package>. Yes, we can avoid
refcounting here by using <package>:<arch> instead, and I understand
the appeal of simplicity there. But the more I think about this, the
more I think this is not the right model for how our packages should
look and not the idea that we should be exposing to users.
I don't think we should be encouraging the idea that <package>:i386 and
<package>:amd64 are two completely distinct installed packages that
have no relationship to each other. Rather, I think we should
encourage users to think of a single installed package named <package>,
which can be installed for one or more architectures. Creating
separate package metadata directories is the right thing to do if we
expect the i386 and amd64 installed packages to be different, have
different copyrights, have different NEWS.Debian files, different
READMEs, different changelogs (binNMU aside, as mentioned above), and
so forth. But this is going down exactly that complexity path that
Joey is talking about, IMO.
--
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>