
Re: merged-/usr transition: debconf or not?



Russ Allbery <rra@debian.org> writes:

> Well, bootstrapping a new Debian system involves running a tool that
> bootstraps a new Debian system.  I think you're constraining the problem
> too much.

> It's a nice property that everything on the system comes straight from a
> Debian package, but it's not a strict requirement, nor is it currently
> generally the case (maintainer scripts do things, for instance).  Those
> symlinks are very special from a dpkg perspective; dpkg needs to refuse
> to mess with them even when upgrading a package that provides them,
> which would mean some irritating special-casing I suspect.  It's not
> clear to me that shipping them as tar members of a package is the right
> way to go, as opposed to creating them separately as part of early
> system configuration.

Having slept on this, let me make this problem concrete.

I'm going to make the following assumptions:

* We have some mechanism to put dpkg into what I've been calling
  merged-/usr mode.  In this mode, it pre-filters all input paths from
  whatever source (including arguments to dpkg-divert,
  update-alternatives, etc.)  and canonicalizes them in a very specific
  way: the directories that become symlinks in merged /usr are replaced in
  each path with their canonical paths.  So /bin/ls becomes /usr/bin/ls,
  /lib64/ld-linux-x86-64.so.2 becomes /usr/lib64/ld-linux-x86-64.so.2, and
  so forth.

* When bootstrapping a new Debian system, we want to put dpkg into
  merged-/usr mode as early as possible.

* If dpkg is in merged-/usr mode, the first thing it does is check the
  file system on which it's operating and ensure that the expected
  symlinks already exist.  If they do not, it aborts because operating on
  that file system is unsafe.  (Leaving aside for the moment whether there
  should be some --force option, etc.)
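As a rough illustration of the first and third assumptions, here is a
minimal Python sketch of the path filter and the start-up check.  It is
not dpkg code: the names canonicalize, missing_symlinks, and MERGED_DIRS
are hypothetical, the directory list is a shortened amd64-flavored
example (the real list is architecture-dependent), and accepting both
relative and absolute link targets is an assumption.

```python
import os
import posixpath

# Illustrative subset only; the real list is architecture-dependent.
MERGED_DIRS = ("bin", "sbin", "lib", "lib64")


def canonicalize(path):
    """Rewrite paths under a merged directory to their /usr location."""
    path = posixpath.normpath(path)
    parts = path.split("/")
    # For an absolute path parts[0] is "" and parts[1] is the top-level name.
    if len(parts) >= 2 and parts[0] == "" and parts[1] in MERGED_DIRS:
        return "/usr" + path
    return path


def missing_symlinks(root):
    """List merged directories under root that are not the expected symlinks."""
    missing = []
    for name in MERGED_DIRS:
        link = os.path.join(root, name)
        expected = (f"usr/{name}", f"/usr/{name}")
        if not (os.path.islink(link) and os.readlink(link) in expected):
            missing.append(name)
    return missing
```

With these definitions, canonicalize("/bin/ls") returns "/usr/bin/ls",
and a tool following the third assumption would abort whenever
missing_symlinks(root) is non-empty.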

This produces a bootstrapping problem: all dynamic binaries on, say, an
amd64 system refer to /lib64/ld-linux-x86-64.so.2 [1].  Therefore, before
the first binary is run from within the context of the newly-installed
system, either that path must exist as-is (which we don't want because we
want to create a merged-/usr system where it belongs in /usr/lib64) or the
symlink from /lib64 to /usr/lib64 must already exist.

I think it's fairly obvious that we don't want a long-term design in which
the libc6 package has to continue to ship /lib64/ld-linux-x86-64.so.2, we
originally unpack that file in that path during bootstrap, and then
something has to come along later and move it to /usr/lib64 and create a
symlink.  This has numerous drawbacks: weird special cases people have to
remember, not being able to reconcile the contents of packages with their
canonical paths in the long run, a window where we have to do file system
surgery atomically, etc.  Instead, we want to live in a world in which
libc6 can ship /usr/lib64/ld-linux-x86-64.so.2, dpkg sees it as shipping
that path, but everything works fine during bootstrap because the /lib64
to /usr/lib64 symlink is already established before we have to execute
binaries in the new environment.  (Obviously it may be some time before we
actually change the contents of the libc6 package; that's fine, the point
of adding a path filter to dpkg is that we can take our time on that.)

So, assuming we have a libc6 package that contains either
/usr/lib64/ld-linux-x86-64.so.2 or /lib64/ld-linux-x86-64.so.2, how do we
bootstrap this system?

I can see a few approaches:

* Put the required symlinks into base-files.  This preserves the nice
  property that every file on the system comes from and is registered as
  belonging to a Debian package (which we don't fully meet but which we
  always aspire to).  I haven't studied all the various tools for
  bootstrapping a Debian system, but I presume that unpacking base-files
  is the first thing that any of them do.  This will therefore lay down
  the symlinks from the start and it won't matter under which path the
  libc6 package ships the files if it's extracted over the resulting
  file system with ar+tar.

  The drawback here is that dpkg is going to rewrite all paths like /lib64
  to /usr/lib64, which would naively *also* apply to the base-files
  package when it looks at that package, but that can't be allowed because
  now we're back to the situation where dpkg's state database is
  inconsistent with the file system and dpkg thinks that base-files
  contains some nonsensical /usr/lib64 to /usr/lib64 symlink.

  I think in this approach there would need to be some special-case code
  directly in dpkg that recognizes the usrmerge symlinks (and only
  specifically those symlinks) as special and preserves them as-is in the
  installed package database.  (I think it's probably better to
  special-case the specific symlink than to special-case base-files the
  package.)  We will then need rules that base-files must always contain
  those symlinks and they can't move between packages, etc. (or the
  system is likely to break horribly), but "base-files cannot do weird
  things" is probably already a constraint that we have.

  Another big drawback of this approach is that now we have to handle
  upgrades of the base-files package from older Debian installs very
  carefully.  base-files should *not* be doing the usrmerge transition, so
  somehow we have to arrange for that transition to happen before
  base-files is upgraded.  This may be doable, but feels rather messy;
  it's setting off my complexity radar.

* Create a new essential package that contains these symlinks and that
  needs to be unpacked before any binaries are executed in the target file
  system.  This has many of the advantages and drawbacks of the approach
  of putting the symlinks in base-files, but may make it easier to handle
  the upgrade problem.  Ideally an upgrade would then involve installing
  usrmerge, letting it run, and then installing this new essential package
  so that it takes over ownership of those symlinks.

  This still feels kind of complex and messy to me, but maybe less so.

* Create the symlinks directly in the bootstrapping script.  In other
  words, after unpacking base-files, the bootstrapping script would
  directly create the required symlinks in the target file system, before
  unpacking any other package.

  This has the obvious drawback of moving things outside the packaging
  system and creating a new special case that every bootstrapping script
  needs to be aware of (and I know we have at least four or five that
  would need modifications).  It has the advantage that the usrmerge
  symlinks are now not in the dpkg database and thus not subject to
  rewriting, and therefore won't need to be special-cased.  However, that
  comes with the obvious disadvantage that they're not in the dpkg
  database, so for instance dpkg -S /lib wouldn't find that symlink unless
  it was added as some sort of dpkg-query special case (which doesn't seem
  like a great idea).

  Another advantage of this approach is that it closely mimics what's
  already happening now with the usrmerge package, and for which we
  therefore have a lot of existing experience.
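For the third approach, the step a bootstrapping script would need is
small.  The following Python sketch is only an illustration under stated
assumptions: create_usrmerge_symlinks and MERGED_DIRS are hypothetical
names, the directory list is a shortened amd64-flavored example, and the
use of relative link targets (so the links resolve the same way inside
and outside a chroot) is a choice made for this sketch rather than
something specified above.

```python
import os

# Illustrative subset only; the real list is architecture-dependent.
MERGED_DIRS = ("bin", "sbin", "lib", "lib64")


def create_usrmerge_symlinks(root):
    """Create /<dir> -> usr/<dir> symlinks in the target file system."""
    for name in MERGED_DIRS:
        target = f"usr/{name}"
        link = os.path.join(root, name)
        # The symlink target has to exist before anything is unpacked into it.
        os.makedirs(os.path.join(root, "usr", name), exist_ok=True)
        if os.path.islink(link):
            if os.readlink(link) != target:
                raise RuntimeError(f"{link} points somewhere unexpected")
            continue  # already in place, nothing to do
        if os.path.isdir(link):
            # Only an empty directory is replaced; os.rmdir raises OSError
            # on a populated one, which this sketch treats as a fatal error.
            os.rmdir(link)
        os.symlink(target, link)
```

Once this has run, anything extracted to a path under /lib64 in the
target lands in usr/lib64, regardless of which of the two paths the
package ships.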

Presumably Fedora had the same problem with their bootstrapping and the
RPM database.  Does anyone know how they handled it?

I realized when writing this that I've not previously dived into the
details of the bootstrapping process and therefore don't know how the
hand-off between packages manually installed with ar+tar to using dpkg in
the target file system is handled.  Are all the essential packages
re-installed with dpkg again so that dpkg will have touched every package
including base-files?  Or does something write out the dpkg database files
in /var/lib/dpkg/info for the manually-installed packages so that dpkg is
already aware of them the first time it runs?

If it's the latter, we're going to either need the same path rewriting in
the process that writes /var/lib/dpkg/info, or we're going to need an
explicit dpkg bootstrapping step the first time it runs, since some of
those packages are going to ship files in /lib, etc., in their ar+tar
representation but we want the database to show all of those files as
installed in /usr/lib, etc. (with the exception of the /lib symlink as
discussed above).  That presumably means running the dpkg
convert-to-merged-/usr code during bootstrap to convert the database.
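A sketch of what that rewriting might look like for one package's file
list, including the exception for the symlinks themselves.  This is an
illustration only, not the conversion code referred to above: the
function names and MERGED_DIRS are hypothetical, the directory list is a
shortened amd64-flavored example, and the file list is modeled simply as
a list of absolute paths, one per entry.

```python
import posixpath

# Illustrative subset only; the real list is architecture-dependent.
MERGED_DIRS = ("bin", "sbin", "lib", "lib64")


def canonicalize_entry(path):
    """Canonicalize one file-list entry, keeping the top-level symlinks."""
    path = posixpath.normpath(path)
    parts = path.split("/")
    if len(parts) >= 2 and parts[0] == "" and parts[1] in MERGED_DIRS:
        if len(parts) == 2:
            # The entry is /lib64 itself: record the symlink as-is instead
            # of turning it into a nonsensical /usr/lib64 entry.
            return path
        return "/usr" + path
    return path


def rewrite_file_list(entries):
    """Rewrite a package file list, dropping duplicates created by merging."""
    seen = set()
    result = []
    for entry in entries:
        canonical = canonicalize_entry(entry)
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result
```

The duplicate removal covers the case where a list names both /bin/ls
and /usr/bin/ls, which collapse to a single entry after rewriting.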

If it's the former, presumably we can just put dpkg into merged-/usr mode
before the first time it runs and then everything will be correct from the
start, but we do have to figure out what piece of the bootstrapping
process installs the necessary dpkg configuration saying to operate in
merged-/usr mode.  For the first two options above, putting it in
base-files or in the new package that holds the usrmerge symlinks seems
like an obvious choice (which would argue for making the merged-/usr dpkg
configuration file a separate file, so that those packages can provide it
without stomping on dpkg configuration that should be owned by the dpkg
package), but again we have to carefully handle upgrades.  For the third
approach, it probably makes sense for the bootstrapping script to drop the
configuration file in directly (and then, in the long run, use the
oft-proposed but not-yet-implemented mechanism for packages to register
their configuration files with dpkg that aren't shipped with the package
so that the configuration file eventually ends up in the dpkg database as
owned by dpkg).

[1] amd64 used as an example; every architecture has some version of this
    same problem.  I'm also using the dynamic loader as an example because
    I think it makes the problem most obvious and concrete, but I suspect
    there are other critical paths of this type that pose similar issues.
    Note that the exact list of paths that have to be symlinked is
    architecture-dependent (see directories_to_merge in convert-usrmerge).

-- 
Russ Allbery (rra@debian.org)              <https://www.eyrie.org/~eagle/>

