
Re: RFC on doc-multiarch-spec



Hi!

Thanks for the review! I've done a first pass, and integrated or fixed
some of the things you suggested or pointed out. Will do another one
after that.

On Thu, 2022-11-17 at 14:05:52 +0100, Johannes Schauer Marin Rodrigues wrote:
> > Multiarch Specification
> > =======================
> > 
> > Status: implemented, stable
> > 
> > This specification is considered to be the canonical reference for multiarch,
> > but in case of discrepancies between this and the current implementation in
> > dpkg, the latter should be considered the expected behavior, unless it can
> > be argued that it is suboptimal and it can be easily changed.
> > 
> > Those discrepancies might come about because this document was rewritten
> > from scratch after the fact.
> 
> This document is about multiarch in dpkg. For multiarch in Debian, the
> canonical reference should be the (not yet existing) Debian policy write-up on
> multiarch. Should this document mention vendor specific policy like Debian
> policy?

Ack, I've updated this and added a generic reference to vendor
policies.

> > [
> >   TODO: Check whether anything is still missing and worth adding from:
> >   - https://wiki.debian.org/Multiarch/Tuples
> >   - https://wiki.debian.org/Multiarch/MissingRationale
> >   - https://wiki.debian.org/Teams/Dpkg/TimeTravelFixes
> > ]
> > 
> > Background
> > ----------
> > 
> > Make it possible to install packages for different architectures, with
> > support from the package manager. Make it possible to cross-build packages
> > for different architectures easily.
> 
> The second is a consequence of the first. By being able to install packages
> from different architectures, we make resolving cross build dependencies much
> easier. Maybe it should be formulated like this:
> 
> Make it possible to install packages for different architectures, with support
> from the package manager. This allows, among other things:
> 
>  - support running 32 bit applications on 64 bit platforms that support this by
>    installing 32 bit shared libraries
>  - support cross build dependency resolution installing build architecture and
>    host architecture version of packages as required
>  - use completely foreign architecture binaries through qemu-user
>  - cross-grading a system from one architecture to another

Ack, thanks, much better. I've integrated this with slight formatting
and minor changes (such as using a generic description instead of a
specific program such as qemu).

> > There have been at least three previous ways to handle these needs, all of
> > which were rather unsatisfactory:
> > 
> >  * Installing foreign packages with «dpkg --force-architecture».
> > 
> >    This made it possible to install foreign packages, but was of very limited
> >    use, as the dependency relationships related to the architecture were
> >    nonexistent, and did not allow to express most of the most complex
> >    relationships.
> 
> I'd replace the last "most" with "more".

Done.

> >  * Using the multilib layout.
> > 
> >    This is the layout supported by many other distributions, to make it
> >    possible to install packages for the alternative runnable ABI for a
> >    specific architecture. But it has the fatal problem of not being a
> >    generalized approach, having inconsistent and confusing semantics for
> >    the multilib directories and requiring to hardcode the set of alternative
> >    ABIs supported for each main architecture. It installs into paths such
> >    as /usr/lib, /usr/lib32, /usr/lib64, where /usr/lib might or might not be
> >    the native architecture.
> > 
> >  * Using the sysroot layout.
> > 
> >    This is a more general solution than multilib, but it requires a
> >    pseudo-chroot equivalent for each architecture. It also pollutes the
> >    filesystem namespace as it installs into paths such as /usr/<sysroot>.
> > 
> > To be able to install packages from another architecture, we need to make
> > it possible for the package managers to tell what is and what is not allowed,
> > so that the dependency system does not get broken.
> > 
> > One recurring theme in the design of this specification was to allow for
> > incremental adoption (no flag days required), and to not break previous
> > satisfiability assumptions. New dependency types should be allowed, but
> > dependencies that were previously allowed should not stop working.
> > 
> > This would require changes in packaging, both in the filesystem layout
> > to make co-installability possible, and in the metadata to annotate the
> > packages and their dependencies depending on the interfaces provided.
> > 
> > Architecture type concepts
> > --------------------------
> > 
> > There are several important architecture types to take into consideration
> > with multiarch. We have the following different types:
> > 
> >  * <native>: Is the one the package manager (dpkg) has been built for, this
> >    architecture can change by way of cross-grading dpkg itself.
> >    «dpkg --print-architecture»
> 
> In the context of #1020533 we were discussing whether it makes sense for
> dpkg to always treat its own architecture as the native architecture, so
> this might change in the future.

I'd leave this for now as is, unless/until this changes.

> >  * <foreign>: This is a non-<native> architecture.
> >    «dpkg --print-foreign-architectures»
> > 
> >  * package architecture: The architecture of a package, which can be entirely
> >    different to the <native> architecture. From within maintainer scripts
> >    it can be fetched from the DPKG_MAINTSCRIPT_ARCH environment variable,
> >    and otherwise with «dpkg-deb -f <pkg>.deb Architecture».
> 
> Is "all" a package architecture or is the package architecture of an arch:all
> package implicitly the native architecture under this definition?

I think this is related to your point down below about explaining
somewhere about the semantics of "all". I'll try to see where to fit
this in for the next round.

> >  * dependency architecture: The architecture of the package in a dependency.
> >    Described in § "Dependency architecture inference".
> > 
> >  * <build-arch>: The architecture the package is built on, which should
> >    match <native>. Relevant when building packages.
> >    «dpkg-architecture -qDEB_BUILD_ARCH»
> 
> s/package/source package/
> 
> I think we should either always be implicit and call binary packages
> "packages" and source packages with the "source" prefix, or always be explicit
> and prefix the term "package" with "binary" or "source" as appropriate.

I've changed this instance, but will have to go over all of them to
see whether they might need to be switched. I guess "package" alone
could also be interpreted as implying either source or binary.

> >  * <host-arch>: The architecture the package is built for, determined
> >    explicitly from user input, or from the architecture the compiler
> >    generates code for. Relevant when building packages.
> >    «dpkg-architecture -qDEB_HOST_ARCH»
> 
> Same as above.

Will deal with this in the next round.

> >  * <target-arch>: The architecture the compiler being built will build for,
> >    determined explicitly from user input, or otherwise <host-arch>.
> >    «dpkg-architecture -qDEB_TARGET_ARCH»
> 
> We recently noted that the term "target arch" might not only be useful for
> compilers but also for other software that outputs or interprets things
> specific to an architecture, like emulators or virtual machines. But this is
> just a side note.

I don't recall whether we talked about this at the time and perhaps
concluded it was too specific. In any case I've added the
emulation/virtualization case and tried to add a generalization of
that too, but I'm not sure whether that will be clear or make it more
confusing.

> > Multiarch Tuples
> > ----------------
> > 
> > The multiarch tuples are architecture strings that describe each different
> > architecture ABI. These are based on the GNU tuples, except that they get
> > normalized to their base form, ignoring any ISA specialization.
> > 
> > These are used as part of the filesystem layout to be able to co-install
> > packages that would otherwise have conflicting pathnames with different
> > contents.
> > 
> > ### Rationale
> > 
> > These tuples were introduced to get constant values, which was not the case
> > at least for the i386 dpkg architecture where the CPU part of the GNU tuple
> > has been getting bumped when the baseline ISA has been bumped.
> > 
> > ### Examples
> > 
> > This value can be fetched with «dpkg-architecture -qDEB_<type>_MULTIARCH».
> > 
> > Filesystem Layout
> > -----------------
> > 
> > The multiarch design is based on the concept that some kind of packages can
> > be co-installed. But these same packages would contain architecture-dependent
> > content that was previously exposed on the same pathname across architectures.
> > 
> > These architecture-dependent pathnames get relocated, as part of the
> > packaging, into multiarch tuple qualified pathnames. So if a shared library
> > used to be located at «<libdir>/libfoo.so.10», it would now be located at
> > «<libdir>/<multiarch-tuple>/libfoo.so.10».
> > 
> > For pathnames that provide the same content independently of the architecture
> > used to build and use them, the same pathname can still be used, as the
> > package manager will refcount them, as long as their digests match.
> 
> I think I know what you mean by "as long as their digests match" but maybe it
> is more clear to say "as long as they are identical"? Maybe in the end, it is
> indeed only the digest that needs to match but for practical purposes we want
> the contents to match. So the fact that the implementation chooses (I guess?)
> to compare digests isn't important here and the intention that the contents
> should be identical should be documented instead.

I've tried to reword this. I think the intention was to specify that
at least currently only file contents are compared (in contrast to
file metadata), which could be compared later, but might need a
transition.
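
To make the distinction concrete, here is a minimal Python sketch (not
dpkg code) of the pathname relocation and of a contents-only comparison.
The helper names multiarch_path and can_share, the /usr/lib libdir and
the two example tuples are assumptions for illustration; using an MD5
digest follows the md5sums requirement quoted further down.

```python
import hashlib
import posixpath


def multiarch_path(libdir, multiarch_tuple, filename):
    """Return <libdir>/<multiarch-tuple>/<filename> (hypothetical helper)."""
    return posixpath.join(libdir, multiarch_tuple, filename)


def can_share(content_a, content_b):
    """Two instances may share a pathname only if the file contents match.

    Only the contents are compared (via an MD5 digest here); file metadata
    such as mode or timestamps is deliberately ignored in this sketch.
    """
    return hashlib.md5(content_a).digest() == hashlib.md5(content_b).digest()


# Arch-dependent files end up on distinct pathnames, so they never collide.
amd64_lib = multiarch_path("/usr/lib", "x86_64-linux-gnu", "libfoo.so.10")
i386_lib = multiarch_path("/usr/lib", "i386-linux-gnu", "libfoo.so.10")
```

Comparing metadata as well would mean extending can_share, which is the
part that might need a transition.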

> > ### Rationale
> > 
> >  * Allows to install multiple architectures.
> >  * It is a uniform namespace.
> >  * It is not limited to sibling or related architectures only diverging
> >    in bitness or ABI like multilib does.
> 
> Does it make sense to note in this section, that this co-installability is only
> intended for shared libraries in /usr/lib but not for executables in /usr/bin?

Will deal with this in the next round.

> > Package Interfaces
> > ------------------
> > 
> > A key concept in multiarch is the interfaces a package provides. This limits
> > how a package can be used by other packages, and when it can be installed.
> > 
> > There is an important distinction here between the interface being architecture
> > independent, and the interface being runnable from some architecture.
> > Runnability is not of great concern when it comes to the metadata annotations
> > in packages and dependencies. It is mainly of concern for the users and
> > frontends installing packages. Runnability is also a property that is not
> > made available simply by the current hardware architecture; using an emulator
> > can make an interface runnable.
> > 
> > When talking about interfaces, that refers to both passive (mostly files
> > and their pathnames) and active ones (shared libraries, programs, etc.).
> 
> Generally, I would avoid the use of "etc". Readers that do not know how to
> continue a list that is abbreviated with "etc" do not gain anything by it.
> Readers who do know how to continue the list do not either.

Ack, I've fixed this instance and the one below, but I'll need to go
over the others on the next round to see how to reword them.

> > For passive ones, the pathnames should not be arch-qualified, because then
> > locating them requires arch-specific knowledge. File formats should either
> > be arch-independent, or should make it possible to describe within all
> > possible different encodings, such as endianness, bitness, etc. But the
> > generation should select a single set of encoding and always generate the
> > same output.
> 
> What is "the generation" here?

The file generation, but I'm not sure whether that clarifies. I've
reworded it a bit.

> > Within active ones, there are two main sub-types, runnable and linkable.
> 
> If there are only two types, what does the "etc" above stand for?

I'm not sure I follow. Let me know if this still stands after the
current rewording though.

> > The common examples for these are programs (binaries or scripts) that one
> > runs, and shared libraries or architecture-specific modules or plugins
> > that one loads and links against. Runnable interfaces might be either
> > arch-dependent or independent depending on whether their output varies
> > per-architecture. It does not matter whether those runnable interfaces
> > are implemented in apparently arch-independent scripting languages for
> > example, as those can still be arch-dependent. Linkable interfaces are
> > always arch-dependent, as they are required to match the ABIs.
> 
> I would expand more here on what the interface of a program actually is. I
> think it's clear that the interface of a shared library is architecture
> dependent but for the interface of a program, it is a common problem and a
> common question whether the program can be marked multi-arch:foreign or not. My
> favourite example here is "make". 99% of the Makefiles out there probably use
> make in a way that would allow make being m-a:foreign. But the following
> snippet shows a Makefile that acts differently depending on the native
> architecture:
> 
>     all: -lc
>             @echo $(<)
> 
> Additionally, make is able to load shared libraries at runtime. I think the
> multiarch spec should expand on what an interface is a bit better and explain
> that to some extent, it is up to the maintainer what they deem the interface of
> a program. If the architecture-dependent parts are never used or not supposed
> to be used, it might as well be okay to mark something multi-arch:foreign.

Ack, thanks. Will try to improve/expand on this in the next round.

> This reminds me of another important question that pops up all the time which I
> think that this doc should explain somewhere:
> 
> Why would it be wrong to mark all arch:all packages as m-a:foreign?
> 
> The current version of this doc does not explain that arch:all packages are
> implicitly the native architecture. The text above implies that the "runnable
> program" can be arch:all and do arch-dependent stuff but I think this should be
> made more explicit as I found this to be a very common point of confusion.
> Essentially, what I'd like to be spelled out explicitly somewhere is:
> 
>  1. arch:all packages are implicitly of the native architecture
>  2. arch:all packages can ship scripts that are able to do architecture
>     dependent stuff, thus creating an architecture dependent interface
>  3. arch:all packages can depend on another package that makes it impossible
>     to declare it m-a:foreign
>  4. the above is the reason why arch:all packages cannot be assumed to be
>     implicitly m-a:foreign when satisfying cross-build dependencies

Ack, see my comment at the beginning about "all".

> > Control fields
> > --------------
> > 
> > ### The Multi-Arch field
> > 
> > This field will allow to satisfy dependencies between packages of
> > different architectures (beyond Architecture: all), and co-install
> > a package with the same name but different architecture.
> > 
> > The permitted values are:
> > 
> >   * “no”
> > 
> >     This value is equivalent to the current default, that being the omission
> >     of the field.
> > 
> >     The interfaces provided by this package are unknown. This means the
> >     package has either not yet been made multiarch aware, or, in some rare
> >     situations when none of the other values currently fit, has been
> >     marked explicitly as having been evaluated.
> 
> Why do you write that it is rare that none of the other values fit? I think
> most architecture dependent programs fit none of the other values.

Ok, I've reworded this a bit now.

> >   * “same”
> > 
> >     This package is co-installable with itself (other architecture instances),
> >     but it must not be used to satisfy the dependency of any package of a
> >     different architecture from itself.
> > 
> >     The main purpose of this value is to mark packages that provide
> >     architecture-dependent linkable interfaces. In special circumstances it
> >     can also be used to provide runnable interfaces where each program or
> >     script filename is arch-qualified.
> > 
> >   * “foreign”
> > 
> >     The package is not co-installable with itself, but should be allowed
> >     to satisfy the dependencies of a package of a different architecture
> >     from itself.
> > 
> >     The main purpose of this value is to mark packages that provide
> >     architecture-independent interfaces, such as data files, programs
> >     with architecture-independent behavior (even if the program is compiled
> >     and architecture-specific), scripting language modules, etc.
> 
> I think adding "scripting language modules" here is a bit dangerous because of
> the m-a interpreter problem.

Hmm right, will try to see how to express this better for the next
round.

> >   * “allowed”
> > 
> >     This permits the reverse-dependencies of the package to annotate their
> >     dependency field to indicate that a foreign architecture version of the
> >     package satisfies the dependencies, but does not change the resolution
> >     of any existing dependencies.
> > 
> >     The main purpose of this value is to mark packages that have a dual
> >     role, either as runnable (architecture-independent) or linkable
> >     (architecture-dependent) depending on how the depending package uses
> >     those interfaces. As that knowledge lies in the depending package,
> >     the responsibility to denote that type of interface usage falls on
> >     those dependencies, through arch-qualifiers. This value enables those
> >     «:any» arch-qualifiers to be taken into account, as to not let such
> >     wildcards be declared without cooperation and agreement from the package
> >     providing those interfaces.
> 
> Another important purpose of "allowed" is for packages providing a runnable
> program that can be either used in an architecture dependent or independent
> way.

Ack, will see how to cover this also on the next round.

> > Dependency resolution
> > ---------------------
> > 
> > Dependency resolution has two main parts, run-time and build-time.
> > 
> > Packages in dependencies can be annotated with arch-qualifiers. These
> > are suffixed to the package name after a colon (':'), and consist of
> > one of several special strings such as 'any', 'native', or an actual
> > architecture name. These arch-qualifiers will restrict which packages
> > can satisfy these dependencies.
> > 
> > Because Essential:yes is not intended for shared library packages, it is
> > assumed that any implicit dependency on an essential package is satisfied
> > by the binaries from the native architecture.
> > 
> > ### Dependency architecture inference
> > 
> > Dependencies always contain architecture information, be it implicit or
> > explicit with arch-qualifiers. This information is used in various places
> > as part of the dependency satisfiability checks. The following table
> > describes how the dependency architectures from a package get determined
> > given the package architecture.
> > 
> >       \  Pkg arch |
> >   Dep  \          | all           <pkg-arch>
> >   ----------------+----------------------------
> >   pkg¹            | <native>/any  <pkg-arch>/any
> >   pkg:<dep-arch>  | <dep-arch>    <dep-arch>
> >   pkg:any         | any           any
> 
> I do not understand the /any in the pkg¹ row. What does it mean?

This means that it is either <native> or "any" depending on the
conditions in ¹. Perhaps writing something like <native-or-any> would
make this more clear?

> > [¹]
> >   * For Pre-Depends/Depends/Recommends/Suggests/Enhances/Provides, the
> >     implicit arch-qualifier is <native> for arch 'all' packages, or <pkg-arch>.
> 
> ..or <pkg-arch> for arch 'any' packages.

This would be the column header so it would be "or <pkg-arch> for
arch <pkg-arch> packages" which seems redundant. I guess this needs a
better representation (perhaps splitting the tables for different
fields) or better textual explanation, will ponder about it.
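
In the meantime, the table and its footnote (including the
Conflicts/Breaks/Replaces rule quoted just below) can be read as a small
decision procedure. This is only a sketch of the draft wording, not dpkg
code; the names split_qualifier and infer_dep_arch are made up for the
example, and build-time fields are left out because the draft still
marks them as TODO.

```python
# Fields whose unqualified dependencies get an implicit 'any' qualifier.
NEGATIVE_FIELDS = {"Conflicts", "Breaks", "Replaces"}


def split_qualifier(dep):
    """Split 'pkg:qual' into (pkg, qual); qual is None when absent."""
    name, sep, qual = dep.partition(":")
    return name, (qual if sep else None)


def infer_dep_arch(dep, field, pkg_arch, native):
    """Return the dependency architecture for one run-time dependency."""
    _, qual = split_qualifier(dep)
    if qual is not None:
        # pkg:<dep-arch> and pkg:any rows: the explicit qualifier wins.
        return qual
    if field in NEGATIVE_FIELDS:
        # Unqualified Conflicts/Breaks/Replaces apply to any architecture.
        return "any"
    # Positive fields: <native> for arch 'all' packages, else <pkg-arch>.
    return native if pkg_arch == "all" else pkg_arch
```

With this reading, the "<native>/any" cell is just the two possible
results of the unqualified branch for an arch 'all' package.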

> >   * For Conflicts/Breaks/Replaces, the implicit arch-qualifier is 'any'.
> >   * [ TODO: Document build-time dependency fields. ]
> > 
> > ### Run-time satisfiability
> > 
> > The first is the usual run-time dependency resolution when installing
> > packages on the system for their normal use, while using Pre-Depends,
> > Depends, Conflicts, Breaks, Replaces, Provides. This also applies to
> > Recommends, Suggests and Enhances, but as those are not strict
> > requirements, their semantics depend on how the frontend honors the
> > fields.
> > 
> > This type of dependency is concerned with the architecture of the package
> > being installed, and the architectures of its dependencies.
> > 
> >       \  M-A |
> >   Dep  \     | no          same        foreign     allowed
> >   -----------+-----------------------------------------------
> >   pkg        | <dep-arch>  <dep-arch>  any         <dep-arch>
> >   pkg:<arch> | <dep-arch>  <dep-arch>  <dep-arch>  <dep-arch>
> >   pkg:any    | <dep-arch>  <dep-arch>  <dep-arch>  any
> 
> Why is a pkg:<arch> dependency on a m-a:foreign package only satisfied by
> <dep-arch>? The m-a:foreign package (as described above) "satisfies the
> dependencies of a package of a different architecture from itself." If it does
> that, then why does foo:i386 not satisfy a dependency on foo:amd64? If
> foo:i386 cannot satisfy that dependency (and that's why the other package
> explicitly stated foo:amd64) then it shouldn't be m-a:foreign.

I'll try to rework the tables and recheck these, and will come back
to you on these.

> > The pkg:any dependency only being satisfied with M-A:allowed was added in
> > part so that packages could not start declaring wildcard relationships
> > without cooperation and agreement from the packages providing such
> > interfaces, because the semantics of these interfaces might not be clear to
> > external parties.
> > 
> > [ TODO: Document that pkg:any is only satisfied for non M-A:allowed with
> >   Conflicts/Breaks/Replaces fields. ]
> 
> There should probably be two tables then? It also confused me that the pkg:any
> row has these <dep-arch> entries instead of saying "disallowed".

Will also come back to you on this after a potential table rework.

> > ### Build-time satisfiability
> > 
> > The other applies
> 
> The other what?

I've reworded this now.

> > while satisfying build-time dependencies while using the
> > fields Build-Depends, Build-Conflicts, Build-Depends-Arch,
> > Build-Conflicts-Arch, Build-Depends-Indep, Build-Conflicts-Indep. These are
> > concerned with source packages, so we do not have any architecture information
> > from that.
> > 
> > In this mode of satisfiability, a new concept to take into account is the
> > distinction between build, host and target architectures, which are the only
> > architectures we will have knowledge of.
> 
> This concept is not really new as it was mentioned above.

I think this was intended to mean new not within the document section,
but from previous practice/understanding. But I can see how this might
seem more confusing than helpful, will think how to improve the
wording here.

> > 
> >       \  M-A |
> >   Dep  \     | no            same          foreign             allowed
> >   -----------+----------------------------------------------------------
> >   pkg        | <host-arch>   <host-arch>   any (<build-arch>)  <host-arch>
> >   pkg:<arch> | <host-arch>   <host-arch>   any (<build-arch>)  <host-arch>
> >   pkg:any    | disallowed    disallowed    disallowed          any (<build-arch>)
> >   pkg:native | <build-arch>  <build-arch>  disallowed          <build-arch>
> > 
> >   pkg:target | N/A ...
> > 
> > With «any (<type-arch>)» meaning that while any architecture would do, the
> > preferred one is <type-arch>.
> > 
> > The build-time satisfiability includes disallowed relationships because
> > these help detect nonsensical relationships. This difference compared
> > with the run-time behavior is because it tends to be easier to modify
> > the source once you have it around.
> > 
> > The pkg:any with anything that is not M-A:allowed relationship is disallowed
> > because the requested relationship is not getting respected.
> > 
> > The pkg:native with M-A:foreign relationship is disallowed because that
> > indicates that one or both markings are in error. Either the interface is
> > arch-dependent and thus can be requested to be pkg:native, or it is
> > arch-independent and the target can be provided as foreign.
> 
> That's the same argument for pkg:native to m-a:foreign as I made above for
> pkg:any to m-a:foreign.

Will also come back to you on this after a potential table rework.
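
For reference while reworking, the "disallowed" cells of the build-time
table as currently drafted boil down to two rules. The sketch below only
encodes those two rules from the quoted text; the function name
build_dep_disallowed is an assumption for illustration, and the result
may well change with the table rework.

```python
def build_dep_disallowed(qualifier, multi_arch):
    """Return True for the combinations the draft table marks 'disallowed'.

    qualifier is None, 'any', 'native' or an architecture name;
    multi_arch is one of 'no', 'same', 'foreign', 'allowed'.
    """
    if qualifier == "any":
        # :any is only honored when the provider opted in with 'allowed'.
        return multi_arch != "allowed"
    if qualifier == "native":
        # :native on an M-A:foreign package means a marking is in error.
        return multi_arch == "foreign"
    # Unqualified and pkg:<arch> dependencies are never disallowed.
    return False
```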

> > [ TODO: Document discrepancies and their rationale for difference in
> >   satisfiability for pkg:any, and for not honoring the distinction between
> >   Build-Depends and Build-Conflicts like with run-time deps. ]
> > 
> > Reference counted files
> > -----------------------
> > 
> > File reference counting is an operation that dpkg performs for
> > Multi-Arch:same packages, so that files that would otherwise conflict,
> > can be shared between different architecture instances and do not need
> > to be split into common packages.
> > 
> > A ref-counted file is one that is owned by multiple arch-instances of
> > a Multi-Arch:same package. The current requirements are:
> > 
> >  * Multi-Arch:same packages can only be configured if all of their instances
> >    are unpacked at their exact same binary version.
> >  * All ref-counted files need to match on their md5sums.
> > 
> > Maintainer scripts can fetch the package ref-counter from the environment
> > variable DPKG_MAINTSCRIPT_PACKAGE_REFCOUNT.
> > 
> > ### Rationale
> > 
> > * Requires fewer package splits, and thus less package metadata and less
> >   maintainer work.
> > * Can avoid disk duplication, as the contents for the same package files
> >   get shared between different instances.
> > 
> > ### Problems
> > 
> > Even though file ref-counting has some nice properties to avoid work for
> > maintainers, it is really broken by design as it also has some very bad
> > properties.
> > Some are even in principle unfixable. In addition, backpedaling on that
> > decision would imply quite some work now. Given the requirements above:
> > 
> >  * It cannot guarantee that the generated files will be bit identical if
> >    they have not been generated with the same build-dependencies as the
> >    other instances.
> >  * It also introduced the requirement that packages need to be installed
> >    in version lock-step, which complicates upgrades, and makes packages
> >    uninstallable when one of the instances is not yet available.
> >  * Makes the maintainer script semantics more complicated.
> >  * Unmatched binNMUs make packages not co-installable, due to version-skew.
> >  * binNMUs in general are by default not co-installable, due to differing
> >    changelog entries.
> >  * Only the last package instance can check that it matches the md5sums
> >    of the already installed ref-counted files, which means differing files
> >    might not get detected.
> >  * Essential packages, which must work even when only unpacked, might not
> >    work at all if one of its Pre-Depends is a M-A:same shared library that
> >    has an unpacked shared file from another instance from a different binary
> >    version.
> > 
> > The currently implemented and proposed workarounds to some of these problems
> > have been a series of ad-hoc hacks:
> > 
> >  * Split the binNMU changelog entry into a different file, automatically
> >    only for packages using debhelper.
> >  * Hunt down all packages that contain differences depending on the
> >    architecture, and try to make them reproducible, but this might just
> >    shadow files that might end up changing depending on the program
> >    generating them.
> >  * (postponed) Switch the binary version coherence check for all instances
> >    to be source version based. This mixes up the source and binary
> >    versionspaces, and makes it akin to a magic check.
> > 
> > Ideally:
> > 
> >  * To avoid a flag day we could add a new Multi-Arch field value, with
> >    similar semantics as «same» but implying no ref-counting.
> >  * Split ref-counted files into their own common packages.
> >  * Move at least changelog files into the .deb control area, and consequently
> >    to the dpkg db.
> >    - This would also allow to transparently compress and deduplicate those
> >      files, w/o needing to do flaky directory to symlink dances back and forth.
> >  * At some point in the future, when not needed at all, disable ref-counting
> >    completely, or via a --force flag? (Breaks compatibility and might not be
> >    possible at all, ever.)
> 
> Is it really helpful to have this "rant" about the problems of refcounting in
> the multiarch spec? This sounds more suited for a page on wiki.d.o.

That was my concern too when I went over this on my last review and did the
informal call for comments. It seems to get in the way of understanding
the current implementation. I guess I added it in part to try to collect
as much Multiarch knowledge into a single place, and the potential
problems seemed relevant. Perhaps I should just leave some of the
problems, and move the workarounds and potential alternative solutions
to the wiki? Perhaps try to trim the problems too, for things that might
affect maintainers. Will see how to handle this for the next round.

> > Cross-grading
> > -------------
> > 
> > This can refer to either a package or the system.
> > 
> > For the former, it means switching a package's architecture by installing
> > a different instance over an already installed one. This only works for
> > non Multi-Arch:same packages, as those would just get an additional instance
> > installed instead.
> > 
> > For the latter, this is the act of changing the native architecture. This
> > is currently performed by installing a dpkg instance of the new architecture
> > we want to switch to, with all the required dependencies.
> > 
> > Command-line interfaces
> > -----------------------
> > 
> > On output, only packages with Multi-Arch:foreign with a non-native
> > architecture or with Multi-Arch:same fields will ever get arch-qualified.
> > 
> > For input, any command that accepts a package name, can always be passed an
> > arch-qualified package name (pkgname or pkgname:arch). Arch-qualifying should
> > in general always be a safe operation. Any command that accepts patterns will
> > accept arch-qualified patterns too («<pkgname>:*» or «*:<archname>»), and
> > an arch-unqualified pattern will default to an implicit «:*» arch-qualifier.
> > Any command that requires a specific package name will require arch-qualified
> > package name when there are multiple instances currently installed, to
> > disambiguate them.
> 
> What about arch:all packages? It seems I'm allowed to arch-qualify them too.

I'll queue this with the other "all" improvements.
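
As a rough illustration of the input rules quoted above, here is a small
Python sketch. It is not how dpkg matches patterns; using fnmatch for
the glob semantics and the names match_instances and needs_qualifier are
assumptions made for the example.

```python
from fnmatch import fnmatchcase


def match_instances(pattern, installed):
    """Return the installed 'pkgname:arch' instances matching a pattern."""
    if ":" not in pattern:
        # An arch-unqualified pattern gets an implicit ':*' arch-qualifier.
        pattern += ":*"
    return [inst for inst in installed if fnmatchcase(inst, pattern)]


def needs_qualifier(pkgname, installed):
    """A specific package name is ambiguous with several installed instances."""
    return len(match_instances(pkgname, installed)) > 1


# Hypothetical set of installed package instances.
installed = ["libfoo:amd64", "libfoo:i386", "bar:amd64"]
```

Here needs_qualifier("libfoo", installed) is true, so a command that
requires a specific package would have to be given libfoo:amd64 or
libfoo:i386, while "bar" stays unambiguous.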

> > ### Problems
> > 
> > There is a divergence of the CLI interface between dpkg and apt.
> > 
> > ### Rationale
> > 
> > * Backwards compatibility, a system with no enabled multiarch, no multiarch
> >   enabled packages and no foreign packages installed should behave in the
> >   same exact way (no arch-qualifiers printed etc.).
> > * Following from the previous, callers that expected a single entry on output,
> >   should not suddenly get multiple when specifying a single package name,
> >   that's why those require specific arch-qualified package names.
> > * The immediate output should be usable even after the system has been
> >   cross-graded, so it should be resistant to native-arch switch.
> > 
> > Out of scope
> > ------------
> > 
> > The following are implementation and/or distribution specific, and as the
> > spec should ideally be distribution-neutral it should not encode packaging
> > policy. Perhaps it should still be expanded as an implementation or examples
> > sub-section, and marked as such.
> > 
> > * TODO: Describe compiler and dpkg-shlibdeps search paths.
> > 
> > * TODO: Packaging changes required to make a package multi-arch compliant;
> >   lib, lib-dev, tool, etc.
> > 
> > Unresolved problems
> > -------------------
> > 
> > * Interpreter problem.
> > 
> >   https://wiki.debian.org/Multiarch/InterpreterProposal
> >   https://lists.debian.org/debian-perl/2012/12/msg00000.html
> > 
> > * Co-installable packages for executables.
> > 
> >   One possible solution to this might be to use alternatives with priorities
> >   determined dynamically at installation time.
> > 
> > * Runnable architecture attribute.
> > 
> >   Sometimes we need to know whether an architecture is runnable or not,
> >   as this is relevant when deciding what to install into the system, and
> >   even though this is of no concern to dpkg directly, it is for high-level
> >   frontends and the user.
> > 
> > * Partial architectures.
> > 
> >   https://wiki.debian.org/Teams/Dpkg/Spec/FreestandingArches
> > 
> > * Arch:all packages that can only be built in a specific arch.
> > 
> >   https://wiki.debian.org/Teams/Dpkg/Spec/FreestandingArches
> > 
> > * binNMU version skew.
> > 
> >   See the “Reference counted files” section.

> Can we have a better distinction between the "package" as part of a dependency
> and the actual package that gets installed? If I write:
> 
> Depends: awk
> 
> Calling awk a "package" would be wrong. There is no such package. The string
> "awk" is a dependency and not a package. The dependency gets satisfied by the
> provider of the dependency which then is a package. This gets especially
> confusing in the tables where you write "pkg:any" and without a bit of
> concentration it's hard to remember whether you mean a dependency annotated
> with :any or an arch:any package.
> 
> In dose3 we use the term vpkg for the terms in a dependency field. Does dpkg or
> debian policy have a similar terminology that is not "package"?

I guess this is biased in a dpkg-centric way, where all "packages"
found, be they real instances from the status file or names from any
dependency field, are considered packages. Whether a package is
virtual (pure or not) depends only on what "packages" are listed in
Provides fields. dpkg internally distinguishes packages as being
"informative" or not.

But I will try to see how to make this clearer.
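
To make the distinction raised above concrete, here is a small Python
sketch that keeps "terms in a dependency field" separate from "package
instances that can satisfy them". Everything in it is hypothetical and
only for illustration: the DepTerm and Package types, the simplified
regular expression, and the providers() helper are not dpkg or dose3
code, and version constraints, architecture restrictions and
Multi-Arch semantics are deliberately ignored.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DepTerm:
    """One term of a dependency field, e.g. 'libc6:any (>= 2.3)'."""
    name: str
    arch: Optional[str]
    constraint: Optional[str]


@dataclass(frozen=True)
class Package:
    """A real package instance, e.g. one stanza of the status file."""
    name: str
    arch: str
    provides: Tuple[str, ...] = ()


# Simplified term syntax: name, optional ':arch', optional '(constraint)'.
_TERM_RE = re.compile(
    r"^\s*([a-z0-9][a-z0-9+.-]*)(?::([a-z0-9-]+))?\s*(?:\(([^)]*)\))?\s*$"
)


def parse_dependency_field(value):
    """Parse 'a, b:any (>= 1) | c' into a list of alternative groups."""
    groups = []
    for group in value.split(","):
        terms = []
        for alternative in group.split("|"):
            match = _TERM_RE.match(alternative)
            if match is None:
                raise ValueError(f"cannot parse term {alternative!r}")
            terms.append(DepTerm(match.group(1), match.group(2), match.group(3)))
        groups.append(terms)
    return groups


def providers(term, packages):
    """Return the package instances whose name or Provides match the term."""
    return [
        pkg for pkg in packages
        if term.name == pkg.name or term.name in pkg.provides
    ]
```

In this model "Depends: awk" yields a DepTerm named awk, and only
providers() turns it into packages such as mawk or gawk; a ":any" on a
DepTerm is a dependency qualifier, while the arch of a Package is its
actual architecture.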

I'm attaching the incremental quick revision I did on top of the
current branch.

Thanks,
Guillem
From 70720d1f1414e9c319b89fd83b6ea2cd0f407120 Mon Sep 17 00:00:00 2001
From: Guillem Jover <guillem@debian.org>
Date: Fri, 18 Nov 2022 00:03:02 +0100
Subject: [PATCH] fixup! doc: Write down the multiarch specification

---
 doc/spec/multiarch.txt | 71 ++++++++++++++++++++++++------------------
 1 file changed, 41 insertions(+), 30 deletions(-)

diff --git a/doc/spec/multiarch.txt b/doc/spec/multiarch.txt
index 6fa7ec8c5..4c7ccbc6d 100644
--- a/doc/spec/multiarch.txt
+++ b/doc/spec/multiarch.txt
@@ -3,13 +3,13 @@ Multiarch Specification
 
 Status: implemented, stable
 
-This specification is considered to be the canonical reference for multiarch,
-but in case of discrepancies between this and the current implementation in
-dpkg, the latter should be considered the expected behavior, unless it can
-be argued that it is suboptimal and it can be easily changed.
-
-Those discrepancies might come about because this document was rewritten
-from scratch after the fact.
+This specification is considered to be the canonical reference for multiarch
+in dpkg, but in case of discrepancies between this specification and the
+current implementation in dpkg, the latter should be considered the expected
+behavior, unless it can be argued that it is suboptimal and it can be easily
+changed. Any such discrepancy might come about because this document was
+rewritten from scratch after the fact. Particular vendors might have
+additional policies and restrictions on top of this specification.
 
 [
   TODO: Check whether anything is still missing and worth adding from:
@@ -22,8 +22,14 @@ Background
 ----------
 
 Make it possible to install packages for different architectures, with
-support from the package manager. Make it possible to cross-build packages
-for different architectures easily.
+support from the package manager. This allows, among other things:
+
+ - Support running 32-bit applications on 64-bit platforms that support this
+   by installing 32-bit shared libraries.
+ - Support cross build dependency resolution, installing build architecture
+   and host architecture version of packages as required.
+ - Use completely foreign architecture binaries through CPU emulation.
+ - Cross-grading a system from one architecture to another.
 
 There has been at least three previous ways to handle these needs. All of
 which were rather unsatisfactory:
@@ -32,7 +38,7 @@ which were rather unsatisfactory:
 
    This made it possible to install foreign packages, but was of very limited
    use, as the dependency relationships related to the architecture was
-   nonexistent, and did not allow to express most of the most complex
+   nonexistent, and did not allow to express most of the more complex
    relationships.
 
  * Using the multilib layout.
@@ -73,10 +79,10 @@ with multiarch. We have the following different types:
 
  * <native>: Is the one the package manager (dpkg) has been built for, this
    architecture can change by way of cross-grading dpkg itself.
-   «dpkg --print-architecture»
+   Get with «dpkg --print-architecture».
 
  * <foreign>: This is a non-<native> architecture.
-   «dpkg --print-foreign-architectures»
+   Get with «dpkg --print-foreign-architectures».
 
  * package architecture: The architecture of a package, which can be entirely
    different to the <native> architecture. From within maintainer scripts
@@ -86,18 +92,21 @@ with multiarch. We have the following different types:
  * dependency architecture: The architecture of the package in a dependency.
    Described in § "Dependency architecture inference".
 
- * <build-arch>: The architecture the package is built on, which should
-   match <native>. Relevant when building packages.
-   «dpkg-architecture -qDEB_BUILD_ARCH»
+ * <build-arch>: The architecture the source package is built on, which
+   should match <native>. Relevant when building packages.
+   Get with «dpkg-architecture -qDEB_BUILD_ARCH».
 
- * <host-arch>: The architecture the packages is built for, determined
+ * <host-arch>: The architecture the source packages is built for, determined
    explicitly from user input, or from the architecture the compiler
    generates code for. Relevant when building packages.
-   «dpkg-architecture -qDEB_HOST_HOST»
+   Get with «dpkg-architecture -qDEB_HOST_HOST».
 
  * <target-arch>: The architecture the compiler being built will build for,
-   determined explicitly from user input, or otherwise <host-arch>.
-   «dpkg-architecture -qDEB_TARGET_HOST»
+   the architecture an emulator or virtual machine will execute code for,
+   but more generally the architecture for architecture-specific inputs and
+   outputs for the programs being built, determined explicitly from user
+   input, or otherwise <host-arch>.
+   Get with «dpkg-architecture -qDEB_TARGET_HOST».
 
 Multiarch Tuples
 ----------------
@@ -134,7 +143,8 @@ used to be located at «<libdir>/libfoo.so.10», it would now be located at
 
 For pathnames that provide the same content independently of the architecture
 used to build and use them, the same pathname can still be used, as the
-package manager will refcount them, as long as their digests match.
+package manager will refcount them, as long as these filesystem objects are
+identical (currently that means only their contents).
 
 ### Rationale
 
@@ -158,14 +168,15 @@ made available simply by the current hardware architecture, using an emulator
 can make an interface runnable.
 
 When talking about interfaces, that refers to both passive (mostly files
-and their pathnames) and active ones (shared libraries, programs, etc.).
+and their pathnames) and active ones (shared libraries, plugins, programs,
+and other executable code).
 
 For passive ones, the pathnames should not be arch-qualified, because then
 locating them requires arch-specific knowledge. File formats should either
 be arch-independent, or should make it possible to describe within all
-possible different encodings, such as endianness, bitness, etc. But the
-generation should select a single set of encoding and always generate the
-same output.
+possible different encodings, such as endianness, bitness, alignment, and
+any other property that can vary per architecture. But the file generation
+should select a single set of encoding and always generate the same output.
 
 Within active ones, there are two main sub-types, runnable and linkable.
 The common examples for these are programs (binaries or scripts) that one
@@ -194,9 +205,9 @@ The permitted values are:
     of the field.
 
     The interfaces provided by this package are unknown. This means the
-    package has either not been yet made multiarch aware, or in some rare
-    situations when none of the other values currently fit, and has been
-    marked explicitly as having been evaluated.
+    package has either not been yet made multiarch aware, or none of the
+    values currently fit, and has been marked explicitly as having been
+    evaluated.
 
   * “same“
 
@@ -275,7 +286,7 @@ given the package architecture.
 
 ### Run-time satisfiability
 
-The first is the usual run-time dependency resolution when installing
+The first part is the usual run-time dependency resolution when installing
 packages on the system for their normal use, while using Pre-Depends,
 Depends, Conflicts, Breaks, Replaces, Provides. This also applies to
 Recommends, Suggests and Enhances, but as those are not strict
@@ -303,8 +314,8 @@ parties.
 
 ### Build-time satisfiability
 
-The other applies while satisfying build-time dependencies while using the
-fields Build-Depends, Build-Conflicts, Build-Depends-Arch,
+The other part applies while satisfying build-time dependencies while using
+the fields Build-Depends, Build-Conflicts, Build-Depends-Arch,
 Build-Conflicts-Arch, Build-Depends-Indep, Build-Conflicts-Indep. These are
 concerned with source packages, so we do not have any architecture information
 from that.
-- 
2.38.1

