Re: Multiarch file overlap summary and proposal
Guillem Jover <guillem@debian.org> writes:
> If packages have to be split anyway to cope with the other cases, then
> the number of new packages which might not be needed otherwise will be
> even smaller than the predicted amount, at which point it makes even
> less sense to support refcnt'ing.
I don't think the package count is really the interesting metric here,
unless the number of introduced packages is very large (see below about
-dev packages). I'm more concerned with maintainer time and with
dependency complexity, and with the known problems that we introduce
whenever we take tightly-coupled files and separate them into independent
packages.
> It also requires maintainers to carefully consider if the (doc, etc)
> toolchains will generate predictable output.
Yes, I agree. There's a tradeoff between having to think about this and
having to do the work to use arch-qualified directories. But I think it's
worth having the tradeoff available as an option.
> This still does not solve the other issues I listed, namely binNMUs have
> to be performed in lock-step, more complicated transitions /
> upgrades. And introduces different solutions for different problems,
> while my proposal is generic for all cases.
I did review your message again (I read it when you originally posted it
as well), and I think the set of cases I discussed in my message covered
the ones you pointed out, apart from the point about version lockstep.
I just posted separately about version lockstep: I think this is a
feature, not a bug, in our multiarch implementation. I think this is the
direction we *should* go, because it reduces the overall complexity of the
system. Yes, that requires treating binNMUs as something different than a
sourceful version change, but I think that's a good idea *anyway*, even in
the absence of multiarch. It's more semantically accurate, and it would
address other lingering problems we've had with binNMUs, or at least
reduce them.
As for the benefits of refcounting, there are three places where I think
this has substantial benefit, so let me talk through them:
1. If you look at the list of files that Steve gave in multiarch: same
packages in Ubuntu, most of the cases that don't fall into the known
documentation and package metadata areas are a bunch of separate
special cases. They don't fall easily into a handful of cases. *But*,
they are mostly all files in either textual or consistent binary
formats that are installed directly from the package and are not going
to change in a binNMU. That means that refcounting provides a nice
simplification: there are a bunch of random additional files of various
different types that can all be handled the same way, without
additional packaging complexity. They can't all be arch-qualified in
the same way as easily, plus arch-qualifying files that absolutely
should not differ between architectures and where that difference would
be a bug (such as with PAM configuration) seems wrong.
They can also be split out into an arch: all package. But here I think
it's worth remembering that splitting tightly-coupled files into
separate packages has real drawbacks and is something we should avoid
doing if we can. There are places where the advantages to doing so are
overwhelming (-dev packages from shared libraries, for example), but we
should be sure we're in that case.
This is something that working on Lintian for a while really drove home
for me. People split binary packages with large data into an arch: any
and arch: all package (often because Lintian recommends it to save
archive space), and they do it wrong *all the time*. They get the
dependencies wrong, or don't think about what files belong in which
package, or accidentally put an arch-specific file in the data package.
I have a package that does this sort of split (gnubg), and from
personal experience know that it's not an easy thing to maintain.
We're not saving packaging complexity by asking people to do this
instead of refcounting, IMO.
Also, there are other drawbacks of splitting closely coupled files into
separate packages even apart from packaging complexity. For example,
it's common for people to move the man pages and desktop files for
binaries into that arch: all data package too, since, hey, they're
arch-independent and in /usr/share and that's easy. But this is
usually the wrong thing to do. Now you have created the possibility of
having desktop files installed for binaries that don't exist, you've
made it much harder for tools like Lintian to check that your man pages
and desktop files are consistent with the binaries, and you have to be
very careful about versioning dependencies. You also create a separate
package that's artificial from the user's perspective, may not get
removed when the main package is removed, and shows up in apt-cache
search and similar interfaces despite the fact that the user doesn't
care about it at all.
I really don't like package splitting as our answer to everything. At
the least, it definitely isn't an obviously clean and simple solution.
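To make the dependency pitfall concrete, here's a minimal sketch of what
such a split has to look like (package names are hypothetical). The subtle
part is that the arch: any package must depend on the arch: all package
with (= ${source:Version}), not (= ${binary:Version}): after a binNMU the
binary version of the arch: any package changes but the arch: all package
is not rebuilt, so a strict binary-version dependency becomes
unsatisfiable. This is exactly the sort of thing people get wrong all the
time:

```
Package: gnubg
Architecture: any
Depends: gnubg-data (= ${source:Version}), ${shlibs:Depends}, ${misc:Depends}

Package: gnubg-data
Architecture: all
Depends: ${misc:Depends}
```

And even with the versioning right, you still have to audit by hand that
nothing arch-specific slipped into the data package.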
2. Include files are, by quantity, a pretty substantial percentage of the
files we're talking about here once we start making our -dev packages
multiarch aware. Most include files these days aren't arch-specific,
since facilities like int32_t have removed a lot of the need to put
architecture-specific information in headers. Refcounting handles them
quite cleanly. They don't change during binNMUs, and they aren't
compressed.
I think splitting all -dev packages into -dev and -include is a
non-starter. This is a bunch of extra packaging work, it adds a bunch
of noise to our package lists that interferes with user searches, it
really does create a substantial number of additional packages (one for
every -dev package in the archive to a first approximation if we
multiarch everything), and it creates new artificial user support
problems, such as users who install the -include package and not the
-dev package and then don't understand why they can't link with the
library.
So, what we're really talking about here, in the absence of
refcounting, is moving the entirety of /usr/include into
/usr/include/<triplet>. Now, this *does* work technically; compilers
will find the headers, and the behavior will be what we want. But
again this is ongoing packaging complexity to install the headers in a
place other than where the upstream build system is going to put them.
It's also surprising. We already broke a few packages just by moving
the arch-specific include files of some packages there, which is
unavoidable. I had to do more ugly things to the ugly openafs build
system to cope, for example. And we just had a really uncomfortable
conversation on the GCC mailing list (that amounted to "what are those
crazy Debian people doing?") because these moves broke building GCC
from source because header files are no longer where they are
everywhere else.
Doing this for arch-specific headers is already problematic but not
really avoidable if we're going to move forward. But that's a smaller
change than doing it for nearly all headers, and I think we're really
going to surprise our users in some unpleasant ways if we do this.
Yes, software should not assume it knows the system header search
paths, but the fact remains that software *does*, and fixing all of it
is going to be painful and not produce a lot of good will with
upstreams.
3. Package metadata in /usr/share/doc/<package>. Yes, we can avoid
refcounting here by using <package>:<arch> instead, and I understand
the appeal of simplicity there. But the more I think about this, the
more I think this is not the right model for how our packages should
look and not the idea that we should be exposing to users.
I don't think we should be encouraging the idea that <package>:i386 and
<package>:amd64 are two completely distinct installed packages that
have no relationship to each other. Rather, I think we should
encourage users to think of a single installed package named <package>,
which can be installed for one or more architectures. Creating
separate package metadata directories is the right thing to do if we
expect the i386 and amd64 installed packages to be different, have
different copyrights, have different NEWS.Debian files, different
READMEs, different changelogs (binNMU aside, as mentioned above), and
so forth. But this is going down exactly that complexity path that
Joey is talking about, IMO.
--
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>