
RFC on doc-multiarch-spec


guillem recently asked for comments on doc/spec/multiarch.txt in the
pu/doc-multiarch-spec branch in the dpkg git. I'm copying its full contents
here and CC'ing the lists that I think have all the multiarch experts on them,
so that it becomes easy for others to comment as well.

Thank you guillem for writing all of this up!

> Multiarch Specification
> =======================
> Status: implemented, stable
> This specification is considered to be the canonical reference for multiarch,
> but in case of discrepancies between this and the current implementation in
> dpkg, the latter should be considered the expected behavior, unless it can
> be argued that it is suboptimal and it can be easily changed.
> Those discrepancies might come about because this document was rewritten
> from scratch after the fact.

This document is about multiarch in dpkg. For multiarch in Debian, the
canonical reference should be the (not yet existing) Debian policy write-up on
multiarch. Should this document mention vendor specific policy like Debian

> [
>   TODO: Check whether anything is still missing and worth adding from:
>   - https://wiki.debian.org/Multiarch/Tuples
>   - https://wiki.debian.org/Multiarch/MissingRationale
>   - https://wiki.debian.org/Teams/Dpkg/TimeTravelFixes
> ]
> Background
> ----------
> Make it possible to install packages for different architectures, with
> support from the package manager. Make it possible to cross-build packages
> for different architectures easily.

The second is a consequence of the first. By being able to install packages
from different architectures, we make resolving cross build dependencies much
easier. Maybe it should be formulated like this:

Make it possible to install packages for different architectures, with support
from the package manager. This allows, among other things:

 - support running 32 bit applications on 64 bit platforms that support this by
   installing 32 bit shared libraries
 - support cross build dependency resolution installing build architecture and
   host architecture version of packages as required
 - use completely foreign architecture binaries through qemu-user
 - cross-grading a system from one architecture to another

> There has been at least three previous ways to handle these needs. All of
> which were rather unsatisfactory:
>  * Installing foreign packages with «dpkg --force-architecture».
>    This made it possible to install foreign packages, but was of very limited
>    use, as the dependency relationships related to the architecture was
>    nonexistent, and did not allow to express most of the most complex
>    relationships.

I'd replace the last "most" with "more".

>  * Using the multilib layout.
>    This is the layout supported by many other distributions, to make it
>    possible to install packages for the alternative runnable ABI for a
>    specific architecture. But it has the fatal problem of not being a
>    generalized approach, having inconsistent and confusing semantics for
>    the multilib directories and requiring to hardcode the set of alternative
>    ABIs supported for each main architecture. It installs into paths such
>    as /usr/lib, /usr/lib32, /usr/lib64, where /usr/lib might or might not be
>    the native architecture.
>  * Using the sysroot layout.
>    This is a more general solution than multilib, but it requires a
>    pseudo-chroot equivalent for each architecture. It also pollutes the
>    filesystem namespace as it installs into paths such as /usr/<sysroot>.
> To be able to install packages from another architecture, we need to make
> it possible for the package managers to tell what is and what is not allowed,
> so that the dependency system does not get broken.
> One recurring theme in the design of this specification was to allow for
> incremental adoption (no flag days required), and to not break previous
> satisfiability assumptions. New dependency types should be allowed, but
> dependencies that were previously allowed should not stop working.
> This would require changes in packaging, both in the filesystem layout
> to make co-installability possible, and in the metadata to annotate the
> packages and their dependencies depending on the interfaces provided.
> Architecture type concepts
> --------------------------
> There are several important architecture types to take into consideration
> with multiarch. We have the following different types:
>  * <native>: Is the one the package manager (dpkg) has been built for, this
>    architecture can change by way of cross-grading dpkg itself.
>    «dpkg --print-architecture»

In the context of #1020533 we were discussing whether it makes sense for dpkg
to always treat its own architecture as the native architecture, so this might
change in the future.

>  * <foreign>: This is a non-<native> architecture.
>    «dpkg --print-foreign-architectures»
>  * package architecture: The architecture of a package, which can be entirely
>    different to the <native> architecture. From within maintainer scripts
>    it can be fetched from the DPKG_MAINTSCRIPT_ARCH environment variable,
>    and otherwise with «dpkg-deb -f <pkg>.deb Architecture».

Is "all" a package architecture or is the package architecture of an arch:all
package implicitly the native architecture under this definition?

>  * dependency architecture: The architecture of the package in a dependency.
>    Described in § "Dependency architecture inference".
>  * <build-arch>: The architecture the package is built on, which should
>    match <native>. Relevant when building packages.
>    «dpkg-architecture -qDEB_BUILD_ARCH»

s/package/source package/

I think the document should either always be implicit (call binary packages
"packages" and source packages "source packages") or always be explicit and
prefix the term "package" with "binary" or "source" as appropriate.

>  * <host-arch>: The architecture the packages is built for, determined
>    explicitly from user input, or from the architecture the compiler
>    generates code for. Relevant when building packages.
>    «dpkg-architecture -qDEB_HOST_HOST»

Same as above.

>  * <target-arch>: The architecture the compiler being built will build for,
>    determined explicitly from user input, or otherwise <host-arch>.
>    «dpkg-architecture -qDEB_TARGET_HOST»

We recently noted that the term "target arch" might not only be useful for
compilers but also for other software that outputs or interprets things
specific to an architecture, like emulators or virtual machines. But this is
just a side note.

> Multiarch Tuples
> ----------------
> The multiarch tuples are architecture strings that describe each different
> architecture ABI. These are based on the GNU tuples, except that they get
> normalized to their base form, ignoring any ISA specialization.
> These are used as part of the filesystem layout to be able to co-install
> packages that would otherwise have conflicting pathnames with different
> contents.
> ### Rationale
> These tuples were introduced to get constant values, which was not the case
> at least for the i386 dpkg architecture where the CPU part of the GNU tuple
> has been getting bumped when the baseline ISA has been bumped.
> ### Examples
> This value can be fetched with «dpkg-architecture -qDEB_<type>_MULTIARCH».
> Filesystem Layout
> -----------------
> The multiarch design is based on the concept that some kind of packages can
> be co-installed. But these same packages would contain architecture-dependent
> content that was previously exposed on the same pathname across architectures.
> These architecture-dependent pathnames get relocated, as part of the
> packaging, into multiarch tuple qualified pathnames. So if a shared library
> used to be located at «<libdir>/libfoo.so.10», it would now be located at
> «<libdir>/<multiarch-tuple>/libfoo.so.10».
> For pathnames that provide the same content independently of the architecture
> used to build and use them, the same pathname can still be used, as the
> package manager will refcount them, as long as their digests match.

I think I know what you mean by "as long as their digests match" but maybe it
is clearer to say "as long as they are identical"? Maybe in the end, it is
indeed only the digest that needs to match but for practical purposes we want
the contents to match. So the fact that the implementation chooses (I guess?)
to compare digests isn't important here and the intention that the contents
should be identical should be documented instead.
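
To illustrate the distinction, here is a minimal Python sketch. It is not
dpkg's actual code: the helper names are hypothetical, and MD5 is assumed only
because the "Reference counted files" section below speaks of md5sums.

```python
import hashlib


def md5_digest(data: bytes) -> str:
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


def shareable_by_digest(a: bytes, b: bytes) -> bool:
    # What an implementation can cheaply check: the digests are equal.
    return md5_digest(a) == md5_digest(b)


def shareable_by_content(a: bytes, b: bytes) -> bool:
    # What the documentation should promise: the contents are identical.
    return a == b


# Hypothetical contents of the same pathname in three package instances.
amd64_copy = b"Copyright notice\n"
i386_copy = b"Copyright notice\n"
i386_rebuilt = b"Copyright notice (rebuilt)\n"
```

For any realistic package contents the two checks agree, which is why
documenting the intent ("identical") rather than the mechanism ("digest") loses
nothing.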

> ### Rationale
>  * Allows to install multiple architectures.
>  * It is a uniform namespace.
>  * It is not limited to sibling or related architectures only diverging
>    in bitness or ABI like multilib does.

Does it make sense to note in this section that this co-installability is only
intended for shared libraries in /usr/lib but not for executables in /usr/bin?
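
A minimal Python sketch of the relocation described in the quoted text, under
the assumption that only pathnames below <libdir> get tuple-qualified. The
function name is hypothetical, and x86_64-linux-gnu is used as the example
tuple for amd64.

```python
from pathlib import PurePosixPath


def multiarch_path(path: str, libdir: str, multiarch_tuple: str) -> str:
    """Relocate a pathname below libdir into its tuple-qualified directory.

    Pathnames outside libdir (for example below /usr/bin) are returned
    unchanged, reflecting the assumption that they are not co-installable.
    """
    lib = PurePosixPath(libdir)
    try:
        relative = PurePosixPath(path).relative_to(lib)
    except ValueError:
        return path  # not below libdir, so nothing to relocate
    return str(lib / multiarch_tuple / relative)
```

With this, multiarch_path("/usr/lib/libfoo.so.10", "/usr/lib",
"x86_64-linux-gnu") gives "/usr/lib/x86_64-linux-gnu/libfoo.so.10", while a
path such as "/usr/bin/foo" stays where it is.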

> Package Interfaces
> ------------------
> A key concept in multiarch is the interfaces a package provides. This limits
> how a package can be used by other packages, and when it can be installed.
> There is an important distinction here between the interface being architecture
> independent, and the interface being runnable from some architecture.
> Runnability is of not great concern when it comes to the metadata annotations
> in packages and dependencies. It is mainly of concern for the users and
> frontends installing packages. Runnability is also a property that is not
> made available simply by the current hardware architecture, using an emulator
> can make an interface runnable.
> When talking about interfaces, that refers to both passive (mostly files
> and their pathnames) and active ones (shared libraries, programs, etc.).

Generally, I would avoid the use of "etc". Readers that do not know how to
continue a list that is abbreviated with "etc" do not gain anything by it.
Readers who do know how to continue the list do not either.

> For passive ones, the pathnames should not be arch-qualified, because then
> locating them requires arch-specific knowledge. File formats should either
> be arch-independent, or should make it possible to describe within all
> possible different encodings, such as endianness, bitness, etc. But the
> generation should select a single set of encoding and always generate the
> same output.

What is "the generation" here?

> Within active ones, there are two main sub-types, runnable and linkable.

If there are only two types, what does the "etc" above stand for?

> The common examples for these are programs (binaries or scripts) that one
> runs, and shared libraries or architecture-specific modules or plugins
> that one loads and links against. Runnable interfaces might be either
> arch-dependent or independent depending on whether their output varies
> per-architecture. It does not matter whether those runnable interfaces
> are implemented in apparently arch-independent scripting languages for
> example, as those can still be arch-dependent. Linkable interfaces are
> always arch-dependent, as they are required to match the ABIs.

I would expand more here on what the interface of a program actually is. I
think it's clear that the interface of a shared library is architecture
dependent but for the interface of a program, it is a common problem and a
common question whether the program can be marked multi-arch:foreign or not. My
favourite example here is "make". 99% of the Makefiles out there probably use
make in a way that would allow make being m-a:foreign. But the following
snippet shows a Makefile that acts differently depending on the native

    all: -lc
            @echo $(<)

Additionally, make is able to load shared libraries at runtime. I think the
multiarch spec should expand on what an interface is a bit better and explain
that to some extent, it is up to the maintainer what they deem the interface of
a program. If the architecture-dependent parts are never used or not supposed
to be used, it might as well be okay to mark something multi-arch:foreign.

This reminds me of another important question that pops up all the time which I
think that this doc should explain somewhere:

Why would it be wrong to mark all arch:all packages as m-a:foreign?

The current version of this doc does not explain that arch:all packages are
implicitly the native architecture. The text above implies that the "runnable
program" can be arch:all and do arch-dependent stuff but I think this should be
made more explicit as I found this to be a very common point of confusion.
Essentially, what I'd like to be spelled out explicitly somewhere is:

 1. arch:all packages are implicitly of the native architecture
 2. arch:all packages can ship scripts that are able to do architecture
    dependent stuff, thus creating an architecture dependent interface
 3. arch:all packages can depend on another package that makes it impossible
    to declare it m-a:foreign
 4. the above is the reason why arch:all packages cannot be assumed to be
    implicitly m-a:foreign when satisfying cross-build dependencies

> Control fields
> --------------
> ### The Multi-Arch field
> This field will allow to satisfy dependencies between packages of
> different architectures (beyond Architecture: all), and co-install
> a package with the same name but different architecture.
> The permitted values are:
>   * “no”
>     This value is equivalent to the current default, that being the omission
>     of the field.
>     The interfaces provided by this package are unknown. This means the
>     package has either not been yet made multiarch aware, or in some rare
>     situations when none of the other values currently fit, and has been
>     marked explicitly as having been evaluated.

Why do you write that it is rare that none of the other values fit? I think
most architecture dependent programs fit none of the other values.

>   * “same“
>     This package is co-installable with itself (other architecture instances),
>     but it must not be used to satisfy the dependency of any package of a
>     different architecture from itself.
>     The main purpose of this value is to mark packages that provide
>     architecture-dependent linkable interfaces. In special circumstances it
>     can also be used to provide runnable interfaces where each program or
>     script filename is arch-qualified.
>   * “foreign”
>     The package is not co-installable with itself, but should be allowed
>     to satisfy the dependencies of a package of a different architecture
>     from itself.
>     The main purpose of this value is to mark packages that provide
>     architecture-independent interfaces, such as data files, programs
>     with architecture-independent behavior (even if the program is compiled
>     and architecture-specific), scripting language modules, etc.

I think adding "scripting language modules" here is a bit dangerous because of
the m-a interpreter problem.

>   * “allowed”
>     This permits the reverse-dependencies of the package to annotate their
>     dependency field to indicate that a foreign architecture version of the
>     package satisfies the dependencies, but does not change the resolution
>     of any existing dependencies.
>     The main purpose of this value is to mark packages that have a dual
>     role, either as runnable (architecture-independent) or linkable
>     (architecture-dependent) depending on how the depending package uses
>     those interfaces. As that knowledge lies in the depending package,
>     the responsibility to denote that type of interface usage falls on
>     those dependencies, through arch-qualifiers. This value enables those
>     «:any» arch-qualifiers to be taken into account, as to not let such
>     wildcards be declared without cooperation and agreement from the package
>     providing those interfaces.

Another important purpose of "allowed" is for packages providing a runnable
program that can be used in either an architecture dependent or independent
way.
> Dependency resolution
> ---------------------
> Dependency resolution has two main parts, run-time and build-time.
> Packages in dependencies can be annotated with arch-qualifiers. These
> are suffixed to the package name after a colon (':'), and consist of
> one of several special strings such as 'any', 'native', or an actual
> architecture name. These arch-qualifiers will restrict which packages
> can satisfy these dependencies.
> Because Essential:yes is not intended for shared library packages, it is
> assumed that any implicit dependency on an essential package is satisfied
> by the binaries from the native architecture.
> ### Dependency architecture inference
> Dependencies always contain architecture information, be it implicit or
> explicit with arch-qualifiers. This information is used in various places
> as part of the dependency satisfiability checks. The following table
> describes how the dependency architectures from a package get determined
> given the package architecture.
>       \  Pkg arch |
>   Dep  \          | all           <pkg-arch>
>   ----------------+----------------------------
>   pkg¹            | <native>/any  <pkg-arch>/any
>   pkg:<dep-arch>  | <dep-arch>    <dep-arch>
>   pkg:any         | any           any

I do not understand the /any in the pkg¹ row. What does it mean?

> [¹]
>   * For Pre-Depends/Depends/Recommends/Suggests/Enhances/Provides, the
>     implicit arch-qualifier is <native> for arch 'all' packages, or <pkg-arch>.

..or <pkg-arch> for arch 'any' packages.

>   * For Conflicts/Breaks/Replaces, the implicit arch-qualifier is 'any'.
>   * [ TODO: Document build-time dependency fields. ]
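
One possible reading of the table and footnote above, as a minimal Python
sketch. It assumes that the "/any" in the pkg¹ row refers to the
Conflicts/Breaks/Replaces case from the footnote; that is a guess, not
something the spec states. The function names are hypothetical and build-time
fields are left out because the spec has not documented them yet.

```python
NEGATIVE_FIELDS = {"Conflicts", "Breaks", "Replaces"}


def split_dependency(dep):
    """Split 'name[:qualifier]' into (name, qualifier or None)."""
    name, sep, qualifier = dep.partition(":")
    return name, (qualifier if sep else None)


def dependency_arch(dep, field, pkg_arch, native_arch):
    """Infer the dependency architecture for one entry of a run-time field."""
    _name, qualifier = split_dependency(dep)
    if qualifier is not None:
        # Explicit arch-qualifier: 'any' or an actual architecture name.
        return qualifier
    if field in NEGATIVE_FIELDS:
        # Assumed meaning of the "/any" half of the pkg row.
        return "any"
    # Positive fields: arch 'all' packages are treated as <native>.
    return native_arch if pkg_arch == "all" else pkg_arch
```

For example, an unqualified Depends entry in an arch 'all' package on an amd64
system would be inferred as amd64, while the same entry in Conflicts would be
inferred as any.
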
> ### Run-time satisfiability
> The first is the usual run-time dependency resolution when installing
> packages on the system for their normal use, while using Pre-Depends,
> Depends, Conflicts, Breaks, Replaces, Provides. This also applies to
> Recommends, Suggests and Enhances, but as those are not strict
> requirements, their semantics depend on how the frontend honors the
> fields.
> This type of dependency is concerned with the architecture of the package
> being installed, and the architectures of its dependencies.
>       \  M-A |
>   Dep  \     | no          same        foreign     allowed
>   -----------+-----------------------------------------------
>   pkg        | <dep-arch>  <dep-arch>  any         <dep-arch>
>   pkg:<arch> | <dep-arch>  <dep-arch>  <dep-arch>  <dep-arch>
>   pkg:any    | <dep-arch>  <dep-arch>  <dep-arch>  any

Why is a pkg:<arch> dependency on an m-a:foreign package only satisfied by
<dep-arch>? The m-a:foreign package (as described above) "satisfies the
dependencies of a package of a different architecture from itself." If it does
that, then why does foo:i386 not satisfy a dependency on foo:amd64? If foo:i386
cannot satisfy that dependency (and that's why the other package explicitly
stated foo:amd64) then it shouldn't be

> The pkg:any dependency only being satisfied with M-A:allowed was added in
> part so that packages could not start declaring wildcard relationships
> without cooperation and agreement from the packages providing such
> interfaces, because the semantics of these interfaces might not be clear to
> external parties.
> [ TODO: Document that pkg:any is only satisfied for non M-A:allowed with
>   Conflicts/Breaks/Replaces fields. ]

There should probably be two tables then? It also confused me that the pkg:any
row has these <dep-arch> entries instead of saying "disallowed".
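
To make the discussion concrete, here is a literal transcription of the
run-time table as a Python lookup. It is only a sketch: the function names are
hypothetical, the providing package is assumed to have a concrete architecture,
and the pkg:any row is copied as written even though its <dep-arch> entries are
the confusing part.

```python
MULTIARCH_VALUES = ("no", "same", "foreign", "allowed")

RUNTIME_TABLE = {
    # dep form     no          same        foreign     allowed
    "pkg":        ("dep-arch", "dep-arch", "any",      "dep-arch"),
    "pkg:<arch>": ("dep-arch", "dep-arch", "dep-arch", "dep-arch"),
    "pkg:any":    ("dep-arch", "dep-arch", "dep-arch", "any"),
}


def dep_form(dep):
    """Classify a dependency string into one of the table's rows."""
    _name, sep, qualifier = dep.partition(":")
    if not sep:
        return "pkg"
    return "pkg:any" if qualifier == "any" else "pkg:<arch>"


def required_arch(dep, provider_multiarch):
    """Look up the table cell: 'dep-arch' or 'any'."""
    column = MULTIARCH_VALUES.index(provider_multiarch)
    return RUNTIME_TABLE[dep_form(dep)][column]


def satisfies(dep, dep_arch, provider_arch, provider_multiarch):
    """Check whether a provider of the given arch satisfies the dependency."""
    required = required_arch(dep, provider_multiarch)
    return required == "any" or provider_arch == dep_arch
```

With this transcription, satisfies("foo", "amd64", "i386", "foreign") is true
but satisfies("foo:amd64", "amd64", "i386", "foreign") is false, which is the
case questioned above.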

> ### Build-time satisfiability
> The other applies

The other what?

> while satisfying build-time dependencies while using the
> fields Build-Depends, Build-Conflicts, Build-Depends-Arch,
> Build-Conflicts-Arch, Build-Depends-Indep, Build-Conflicts-Indep. These are
> concerned with source packages, so we do not have any architecture information
> from that.
> In this mode of satisfiability, a new concept to take into account is the
> distinction between build, host and target architectures, which are the only
> architectures we will have knowledge of.

This concept is not really new as it was mentioned above.

>       \  M-A |
>   Dep  \     | no            same          foreign             allowed
>   -----------+----------------------------------------------------------
>   pkg        | <host-arch>   <host-arch>   any (<build-arch>)  <host-arch>
>   pkg:<arch> | <host-arch>   <host-arch>   any (<build-arch>)  <host-arch>
>   pkg:any    | disallowed    disallowed    disallowed          any (<build-arch>)
>   pkg:native | <build-arch>  <build-arch>  disallowed          <build-arch>
>   pkg:target | N/A ...
> With «any (<type-arch>)» meaning that while any architecture would do, the
> preferred one is <type-arch>.
> The build-time satisfiability includes disallowed relationships because
> these help detect nonsensical relationships. This difference compared
> with the run-time behavior is because it tends to be easier to modify
> the source once you have it around.
> The pkg:any with anything that is not M-A:allowed relationship is disallowed
> because the requested relationship is not getting respected.
> The pkg:native with M-A:foreign relationship is disallowed because that
> indicates either (or both) markings is in error. Either the interface is
> arch-dependent and thus can be requested to be pkg:native, or it is
> arch-independent and the target can be provided as foreign.

That's the same argument for pkg:native to m-a:foreign as I made above for
pkg:any to m-a:foreign.
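
The build-time table can be transcribed in the same way. Again this is a
sketch under assumptions: the function names and the (architecture, flag)
return value are hypothetical, the pkg:target row is omitted because the spec
leaves it as "N/A ...", and the pkg:<arch> row is copied literally from the
table.

```python
MULTIARCH_VALUES = ("no", "same", "foreign", "allowed")

# None means "disallowed"; "any-build" means "any (<build-arch>)".
BUILD_TABLE = {
    # dep form     no       same     foreign      allowed
    "pkg":        ("host",  "host",  "any-build", "host"),
    "pkg:<arch>": ("host",  "host",  "any-build", "host"),
    "pkg:any":    (None,    None,    None,        "any-build"),
    "pkg:native": ("build", "build", None,        "build"),
}


def dep_form(dep):
    """Classify a build dependency string into one of the table's rows."""
    _name, sep, qualifier = dep.partition(":")
    if not sep:
        return "pkg"
    if qualifier in ("any", "native"):
        return "pkg:" + qualifier
    return "pkg:<arch>"


def resolve_build_dep(dep, provider_multiarch, build_arch, host_arch):
    """Return (preferred_arch, any_arch_ok); raise ValueError if disallowed."""
    column = MULTIARCH_VALUES.index(provider_multiarch)
    cell = BUILD_TABLE[dep_form(dep)][column]
    if cell is None:
        raise ValueError(
            f"{dep} on a Multi-Arch: {provider_multiarch} package is disallowed"
        )
    if cell == "host":
        return host_arch, False
    if cell == "build":
        return build_arch, False
    return build_arch, True  # any architecture works, build arch preferred
```

When cross-building from amd64 for arm64, an unqualified build dependency on a
Multi-Arch: same package resolves to arm64, one on a Multi-Arch: foreign
package prefers amd64, and foo:native on a Multi-Arch: foreign package raises
an error.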

> [ TODO: Document discrepancies and their rationale for difference in
>   satisfiability for pkg:any, and for not honoring the distinction between
>   Build-Depends and Build-Conflicts like with run-time deps. ]
> Reference counted files
> -----------------------
> File reference counting is an operation that dpkg performs for
> Multi-Arch:same packages, so that files that would otherwise conflict,
> can be shared between different architecture instances and do not need
> to be split into common packages.
> A ref-counted file is one that is owned by multiple arch-instances of
> a Multi-Arch:same package. The current requirements are:
>  * Multi-Arch:same packages can only be configured if all of their instances
>    are unpacked at their exact same binary version.
>  * All ref-counted files need to match on their md5sums.
> Maintainer scripts can fetch the package ref-counter from the environment
> ### Rationale
> * Requires less package splits, and thus less package metadata and less
>   maintainer work.
> * Can avoid disk duplication, as the contents for the same package files
>   get shared between different instances.
> ### Problems
> Even though file ref-counting has some nice properties to avoid work for
> maintainers, it is really broken by design as it has also some very bad
> properties.
> Some are even in principle unfixable. In addition backpedaling on that
> decision would imply quite some work now. Given the requirements above:
>  * It cannot guarantee that the generated files will be bit identical if
>    they have not been generated with the same build-dependencies as the
>    other instances.
>  * It also introduced the requirement that packages need to be installed
>    in version lock-step, which complicates upgrades, and makes packages
>    uninstallable when one of the instances is not yet available.
>  * Makes the maintainer script semantics more complicated.
>  * Unmatched binNMUs make packages not co-installable, due to version-skew.
>  * binNMUs in general are by default not co-installable, due to differing
>    changelog entries.
>  * Only the last package instance can check that it matches the md5sums
>    of the already installed ref-counted files, which means differing files
>    might not get detected.
>  * Essential packages, which must work even when only unpacked, might not
>    work at all if one of its Pre-Depends is a M-A:same shared library that
>    has an unpacked shared file from another instance from a different binary
>    version.
> The currently implemented and proposed workaround to some of this problems
> has been a series of ad-hoc hacks:
>  * Split the binNMU changelog entry into a different file, automatically
>    only for packages using debhelper.
>  * Hunt down all packages that contain differences depending on the
>    architecture, and try to make them reproducible, but this might just
>    shadow files that might end up changing depending on the program
>    generating them.
>  * (postponed) Switch the binary version coherence check for all instances
>    to be source version based. This mixes up the source and binary
>    versionspaces, and makes it akin to a magic check.
> Ideally:
>  * To avoid a flag day we could add a new Multi-Arch field value, with
>    similar semantics as «same» but implying no ref-counting.
>  * Split ref-counted files into their own common packages.
>  * Move at least changelog files into the .deb control area, and consequently
>    to the dpkg db.
>    - This would also allow to transparently compress and deduplicate those
>      files, w/o needing to do flaky directory to symlink dances back and forth.
>  * At some point in the future, when not needed at all, disable ref-counting
>    completely, or via a --force flag? (Breaks compatibility and might not be
>    possible at all, ever.)

Is it really helpful to have this "rant" about the problems of refcounting in
the multiarch spec? This sounds more suited for a page on wiki.d.o.

> Cross-grading
> -------------
> This can refer to either a package or the system.
> For the former, it means switching a package's architecture by installing
> a different instance over an already installed one. This only works for
> non Multi-Arch:same packages, as those would just get an additional instance
> installed instead.
> For the latter, this is the act of changing the native architecture. This
> is currently performed by installing a dpkg instance of the new architecture
> we want to switch to, with all the required dependencies.
> Command-line interfaces
> -----------------------
> On output, only packages with Multi-Arch:foreign with a non-native
> architecture or with Multi-Arch:same fields will ever get arch-qualified.
> For input, any command that accepts a package name, can always be passed an
> arch-qualified package name (pkgname or pkgname:arch). Arch-qualifying should
> in general always be a safe operation. Any command that accepts patterns will
> accept arch-qualified patterns too («<pkgname>:*» or «*:<archname>»), and
> an arch-unqualified pattern will default to an implicit «:*» arch-qualifier.
> Any command that requires a specific package name will require arch-qualified
> package name when there are multiple instances currently installed, to
> disambiguate them.

What about arch:all packages? It seems I'm allowed to arch-qualify them too.
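
A small Python sketch of the pattern rule quoted above, to show where the
arch:all question comes up. It is not dpkg's matcher: glob semantics via
fnmatch, the function names, and listing arch:all packages as "name:all" are
all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def normalize_pattern(pattern):
    """Give an arch-unqualified pattern the implicit ':*' arch-qualifier."""
    return pattern if ":" in pattern else pattern + ":*"


def match_installed(pattern, installed):
    """Return the arch-qualified installed names matching the pattern."""
    full = normalize_pattern(pattern)
    return [name for name in installed if fnmatchcase(name, full)]


# Hypothetical set of installed package instances.
installed = ["libfoo:amd64", "libfoo:i386", "bar:amd64", "baz-data:all"]
```

Here match_installed("libfoo", installed) returns both libfoo instances, and
"*:i386" returns only libfoo:i386. Whether "*:all" or "baz-data:amd64" should
match the arch:all package is exactly what the spec does not say.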

> ### Problems
> There is a divergence of the CLI interface between dpkg and apt.
> ### Rationale
> * Backwards compatibility, a system with no enabled multiarch, no multiarch
>   enabled packages and no foreign packages installed should behave in the
>   same exact way (no arch-qualifiers printed etc.).
> * Following from the previous, callers that expected a single entry on output,
>   should not suddenly get multiple when specifying a single package name,
>   that's why those require specific arch-qualified package names.
> * The immediate output should be usable even after the system has been
>   cross-graded, so it should be resistant to native-arch switch.
> Out of scope
> ------------
> The following are implementation and/or distribution specific, and as the
> spec should ideally be distribution-neutral it should not encode packaging
> policy. Perhaps it should still be expanded as an implementation or examples
> sub-section, and marked as such.
> * TODO: Describe compiler and dpkg-shlibdeps search paths.
> * TODO: Packaging changes required to make a package multi-arch compliant;
>   lib, lib-dev, tool, etc.
> Unresolved problems
> -------------------
> * Interpreter problem.
>   https://wiki.debian.org/Multiarch/InterpreterProposal
>   https://lists.debian.org/debian-perl/2012/12/msg00000.html
> * Co-installable packages for executables.
>   One possible solution to this might be to use alternatives with priorities
>   determined dynamically at installation time.
> * Runnable architecture attribute.
>   Sometimes we need to know whether an architecture is runnable or not,
>   as this is relevant when deciding what to install into the system, and
>   even though this is of no concern to dpkg directly, it is for high-level
>   frontends and the user.
> * Partial architectures.
>   https://wiki.debian.org/Teams/Dpkg/Spec/FreestandingArches
> * Arch:all packages that can only be built in a specific arch.
>   https://wiki.debian.org/Teams/Dpkg/Spec/FreestandingArches
> * binNMU version skew.
>   See the “Reference counted files” section.

Can we have a better distinction between the "package" as part of a dependency
and the actual package that gets installed? If I write:

Depends: awk

Calling awk a "package" would be wrong. There is no such package. The string
"awk" is a dependency and not a package. The dependency gets satisfied by the
provider of the dependency which then is a package. This gets especially
confusing in the tables where you write "pkg:any" and without a bit of
concentration it's hard to remember whether you mean a dependency annotated
with :any or an arch:any package.

In dose3 we use the term vpkg for the terms in a dependency field. Does dpkg or
Debian Policy have similar terminology that is not "package"?
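
The distinction could be sketched like this in Python. The class and function
names are hypothetical and only borrow the vpkg term from dose3; versions and
architectures are ignored on purpose, and mawk and gawk serve as the usual
examples of packages that provide awk.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VPkg:
    """A term as written in a dependency field, not an installable package."""
    name: str
    qualifier: Optional[str] = None  # e.g. 'any', 'native' or an arch name


@dataclass(frozen=True)
class Package:
    """A real binary package that can be installed."""
    name: str
    arch: str
    provides: Tuple[str, ...] = ()


def providers(vpkg, packages):
    """Return the real packages whose name or Provides match the vpkg."""
    return [
        pkg for pkg in packages
        if vpkg.name == pkg.name or vpkg.name in pkg.provides
    ]


pool = [
    Package("mawk", "amd64", provides=("awk",)),
    Package("gawk", "amd64", provides=("awk",)),
    Package("make", "amd64"),
]
```

providers(VPkg("awk"), pool) yields mawk and gawk, and no Package named "awk"
ever appears. Keeping the two types apart would also make "pkg:any" in the
tables unambiguous: it would be a VPkg with qualifier 'any', never an arch:any
Package.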


cheers, josch
