
Re: multiarch support and dpkg 2.0 design document



On Sun, May 14, 2006 at 05:19:29PM +0100, Scott James Remnant wrote:
> 
> The PDF is the first time I've ever sat down and wrote, in one document,
> what I've been thinking about for the last couple of years.  While I
> think it's pretty neat, I'm hoping others will be able to find holes or
> problems with it -- or improvements they can make.

This is very exciting.  From your introduction in dpkg2.pdf, it sounds
like you have come to the same conclusions we have about the
first-generation package managers (dpkg and RPM): they served us well,
but it's time to move to something new.  Or maybe it was too many
years of hearing how "APT (sic) is better than RPM."

Erik Troan (who together with Marc Ewing created RPM), Michael
K. Johnson, and I set out to redesign a next generation software
manager in February 2004.  We took the lessons learned from our
experience at Red Hat developing many distributions using RPM, new
philosophies from groups like Gentoo on build flexibility, and fresh
ideas on building a distributed system.  Many of the concepts you have
in the design paper are things we have implemented.

This is a good sign: two projects from different backgrounds and
approaches seem to be converging on good ideas.

Our work turned into Conary - a distributed software configuration
management system.  There's some information at http://wiki.conary.com/.
Some (somewhat dense) reading material can be found at
http://wiki.conary.com/DocumentsAndPresentations and
http://wiki.conary.com/ConaryPresentations.  The OLS papers are
probably the most helpful.

Enough background about Conary; let me share some thoughts from
reading dpkg2.pdf.
__

On Source and Binary Formats: definitely decouple the build mechanism
from the package manager itself.  You should be able to use files in
debian/, .spec files, .ebuild files, etc. to drive the process of
taking sources and turning them into binaries.  Conary takes this
approach as well, though no backends other than our own .recipe format
have been implemented.  While splitting source building from package
management is important, we think that managing sources and binaries
in a unified system is important.  With Conary, you always have the
sources available, they're versioned using the same version tree as
the binaries, and you can always reproduce any particular binary from
the sources since they're managed the same way.

We did not use an existing tar or cpio-like archive for our on-disk
format.  Conary manages the system by applying changesets to it.
Since these changesets are relative to what's already installed on the
system, existing archive formats don't make sense.  We do use SHA1 to
check the integrity of the file contents and OpenPGP signatures on the
metadata.
__

On Atomic Operation:  I'm interested to know how you're
planning to do one atomic operation on the filesystem to move a
package from being staged to being installed.  As you stage new file
contents to disk, possibly writing them along side existing files, you
have to rename() each one individually.  There's no way to do this in
one atomic operation (that I know of) without help from the OS.
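To make the limitation concrete, here's a toy sketch (Python, purely
illustrative) of the stage-then-rename pattern.  Each rename() is
atomic for a single path, but nothing makes the loop as a whole atomic:

```python
import os

def commit_staged(staged):
    """Move staged files into their final locations.

    'staged' maps temporary paths to final paths.  On POSIX, each
    os.rename() atomically replaces one path (within a filesystem),
    but a crash mid-loop leaves some files new and some old -- there
    is no portable call that renames a whole set of paths atomically,
    which is why OS help (or a journal/rollback scheme) is needed for
    truly transactional installs.
    """
    for tmp_path, final_path in staged.items():
        os.rename(tmp_path, final_path)
```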

In Conary, we take any changeset that is applied to the system and
reverse it.  We store the reversed changeset as a "rollback".
Reverting an operation simply applies the rollback changeset.
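A toy model of that inversion (Python; this is NOT Conary's actual
changeset format, just the idea): a changeset is a list of operations,
and the rollback swaps old/new state and reverses add/remove, in
reverse order:

```python
# Hypothetical changeset entries: (op, path, old_state, new_state).
INVERSE = {"add": "remove", "remove": "add", "modify": "modify"}

def invert(changeset):
    # Reverse the order as well, so dependent operations undo cleanly.
    return [(INVERSE[op], path, new, old)
            for (op, path, old, new) in reversed(changeset)]
```

Note that inverting twice gets you back the original changeset, which
is what makes "revert = apply the rollback" work.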
__

On Focus on Installed Packages: I see no need to keep a record
of available packages in the package manager.  I do see a need to keep
at least a log of removed packages on the system.
__

On Unpacking: The approach you've outlined is very similar to
how things work in Conary.  We don't currently support registering
non-packaged files' metadata.
__

On Filters: This is a neat idea.  We don't have anything exactly like
it.  I think that the usefulness will depend on what metadata is made
available to the filter.
__

On Classes: Though the mechanism you've described is somewhat
different than what we've implemented, Conary implements this.  We
call them "tags".  Files are tagged to be of a certain class, or type.
For example, we have an "initscript" tag.  The system has a "tag
handler" that knows what to do with a file that is an "initscript".
All the actions that are needed to register the initscript with the
system are stored in one place - NOT in every package.
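A minimal sketch of that dispatch (Python; the tag name is real, the
handler body and the chkconfig invocation are just illustrative):

```python
# One handler per tag, registered in ONE place -- packages only tag
# their files; they carry no registration logic of their own.
HANDLERS = {}

def handler(tag):
    def register(fn):
        HANDLERS[tag] = fn
        return fn
    return register

@handler("initscript")
def initscript_handler(files, actions):
    # Illustrative: record the commands needed to register each script.
    for f in files:
        actions.append("chkconfig --add " + f)

def run_handlers(tagged_files):
    """tagged_files maps a tag name to the list of files carrying it."""
    actions = []
    for tag, files in tagged_files.items():
        HANDLERS[tag](files, actions)
    return actions
```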
__

On Removing: Sound concepts here.  Our backups are stored in the
rollback files, fwiw.
__

On Hooks: We talked about having something like this, but we have not
found a need.  Tag handlers have been sufficient thus far.
__

On Fundamentals: It sounds like the "variant" in your design is the
"flavor" in Conary.  Every installable object in the Conary system is
identified by its name, version, and flavor.  The flavor says how the
object was built.  For example, since a package like lynx can be built
with or without support for SSL, you might have one "lynx" version
"2.8.5" flavor "with ssl" and one "lynx" version "2.8.5" flavor
"without ssl".  I recommend against making the variant optional, since
(name, version, variant) is essentially a primary key in the system.
If there is no relevant variant information, let a blank variant
explicitly express that.
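A tiny sketch of why a blank-but-present variant keeps the key uniform
(Python; names and flavor strings are illustrative):

```python
# (name, version, flavor) is the primary key.  A package with no
# variant information gets an explicit "" flavor rather than a
# shorter key, so every object in the store is addressed the same way.
repo = {}

def add(name, version, flavor=""):
    repo[(name, version, flavor)] = True

add("lynx", "2.8.5", "ssl")
add("lynx", "2.8.5", "!ssl")
add("hello", "1.0")   # flavor is explicitly "", never absent
```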
__

On Architecture: I think that relying solely on the dependency
mechanism for architecture handling could be a mistake.  When
tools are trying to filter packages down to ones that are suited for a
particular target, you don't want to have to sift through dependencies
to determine the fitness.  In Conary we include the architecture
information as part of the flavor.  So, extending the example above,
you have "lynx" version "2.8.5" flavor "is: x86 ssl" (where "is"
stands for "instruction set").  We also have (now using a more
Conary-specific notation) "lynx=2.8.5[is: x86_64 ssl]".  This is
critical to narrow the forest of available packages to ones that will
be suited for the target.

Optimizations are represented by instruction set "flags".  For
example, if you have a mplayer binary that utilizes sse2 instructions,
the flavor on the binary is "is: x86(sse2)".  This says that the
binary requires sse2 support to operate properly.  If mplayer can
_optionally_ use sse2 if it detects it, the flavor on the binary is
"is: x86(~sse2)".  This allows the score for a package that
dynamically supports sse2 to be higher on a sse2-capable system.
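A toy version of that scoring (Python; not Conary's real algorithm,
just the rule described above -- hard flags must be present, "~" flags
are a bonus when the system has them):

```python
def score(binary_flags, system_flags):
    """Score a binary's instruction-set flags against a system.

    binary_flags: e.g. {"sse2"} (required) or {"~sse2"} (optional).
    Returns None if a hard requirement is unmet, otherwise a score
    that rewards optional flags the system can actually use.
    """
    s = 0
    for flag in binary_flags:
        if flag.startswith("~"):          # optional: bonus if present
            if flag[1:] in system_flags:
                s += 1
        elif flag in system_flags:        # required and satisfied
            s += 1
        else:                             # required and missing
            return None
    return s
```

On an sse2-capable system, a "~sse2" build scores as well as an "sse2"
build; on a system without sse2, the "~sse2" build still installs while
the hard "sse2" build is ruled out.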

We also use dependencies to ensure that a package will run correctly
on the system.  All ELF binaries have an ABI in them.  We record the
ABI as a dependency for the file.  For example, on x86 the dependency
is "abi: ELF32(SysV x86)".  On x86_64 it's "abi: ELF64(SysV x86_64)".
Virtualization technology that can run "abi: ELF32(SysV x86)" binaries
simply provides "abi: ELF32(SysV x86)".
__

On Dependencies: Please, no more dependency types.  It makes
calculating solutions for dependency closure extremely hard.  I think
that the focus should be on getting dependencies right.  Additional
information (Enhances, Suggests) should be part of metadata, so that
frontends with more complex solution algorithms can use them.

In Conary, provides and requires that are architecture specific are
explicitly so.  That is, if something provides "libc.so.6" on a 32-bit
system, the Provide is: "soname: ELF32/libc.so.6(GCC_3.0 GLIBC_2.0
GLIBC_2.1 GLIBC_2.1.1 GLIBC_2.1.2 GLIBC_2.1.3 GLIBC_2.2 GLIBC_2.2.1
GLIBC_2.2.2 GLIBC_2.2.3 GLIBC_2.2.4 GLIBC_2.2.6 GLIBC_2.3 GLIBC_2.3.2
GLIBC_2.3.3 GLIBC_2.3.4 GLIBC_PRIVATE SysV x86)" (note the support for
ABI versioning).  If something requires 32-bit libc.so.6, it may be
something like: "soname: ELF32/libc.so.6(GLIBC_2.0 GLIBC_2.1 GLIBC_2.2
GLIBC_2.3 SysV x86)", which says that we need the ELF32, SysV x86 ABI,
libc.so.6 with ABI versions GLIBC_2.0 GLIBC_2.1 GLIBC_2.2 GLIBC_2.3.

If a dependency is not architecture specific, a 32-bit package that
provides a dependency can solve the requirement in a 64-bit package.
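The satisfaction test behind those examples boils down to a subset
check on the flags.  A rough sketch (Python; the string layout mimics
the notation above but the parser is purely illustrative):

```python
def parse_dep(dep):
    # "soname: ELF32/libc.so.6(GLIBC_2.0 SysV x86)"
    #   -> ("soname: ELF32/libc.so.6", {"GLIBC_2.0", "SysV", "x86"})
    head, _, rest = dep.partition("(")
    flags = set(rest.rstrip(")").split()) if rest else set()
    return head.strip(), flags

def satisfies(provide, require):
    p_name, p_flags = parse_dep(provide)
    r_name, r_flags = parse_dep(require)
    # The names must match exactly, and every required flag (ABI
    # version, SysV, x86, ...) must appear among the provided flags.
    return p_name == r_name and r_flags <= p_flags
```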
__

On Features: We call this "capabilities".  If you require a
thread-safe version of the sqlite library, you might have the
threadsafe version of sqlite provide "sqlite(threadsafe)".  The
package that requires it would do the same.
__

On Configuration File Merging: Conary does this by saving the pristine
file in the Conary database.  Changesets that change config files
contain a diff to apply.  A three-way merge is used to preserve
changes.  In fact, all aspects of a file are preserved when doing an
update.  For example, if a security-paranoid sysadmin wants to turn
off setuid root on /bin/ping, all (s)he needs to do is "chmod u-s
/bin/ping".  This change is also considered a local modification that
is merged in with changes contained in a changeset.
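The decision logic at the heart of a three-way merge looks like this
(Python sketch at whole-file granularity; a real implementation, as
described above, applies diffs and merges hunk by hunk):

```python
def merge(pristine, local, new):
    """Three-way merge decision given the pristine (as-shipped)
    version, the local (possibly edited) version, and the new
    (packaged update) version."""
    if local == pristine:   # no local edits: take the packaged update
        return new
    if new == pristine:     # package didn't touch it: keep local edits
        return local
    if local == new:        # both made the same change
        return new
    return None             # genuine conflict: needs a per-hunk merge
```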
__

On Multi-Arch: You note that your solution requires packages to be
modified so that they do not contain common files.  We've solved this
problem through policy that runs at package creation time.  One
policy we have breaks packages down into "components".  For example,
the glibc package is made up of the "glibc:runtime", "glibc:lib",
"glibc:devel", "glibc:devellib", "glibc:doc", and "glibc:locale"
components.  These components are created automatically by the policy.
When you're running on a 64-bit system and you want to be able to run
a 32-bit program, you only need glibc:lib.  Policy makes sure that
non-architecture-specific files do not land in paths like /usr/lib
(where they would conflict if you had them in both a 32-bit and a
64-bit version of glibc:lib).  When building on a target that does not
use /lib,
/usr/lib, etc, policy automatically moves files installed in the wrong
path to the right one (from /lib to /lib64, for example).
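As a rough illustration of that kind of policy (Python; the path rules
here are hypothetical and far less complete than real Conary policy):

```python
def component_for(path):
    """Classify a file into a component by its install path.

    Purely illustrative path-based rules, loosely modeled on the
    component split described above.
    """
    if "/share/doc/" in path or "/man/" in path:
        return "doc"
    if "/locale/" in path:
        return "locale"
    if "/include/" in path or path.endswith(".a"):
        return "devel"
    if "/lib/" in path and ".so." in path:
        return "lib"
    return "runtime"
```

The point is that packagers never hand-maintain these splits; the
policy derives them automatically at package creation time.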

Installing x86 binaries on a system like PPC64 (which already has
32-bit PPC libraries in /lib and 32-bit PPC binaries in /usr/bin) is
something that Conary does not handle automatically (yet).
__

Overall, I'm very impressed with what you've come up with.  However,
I'd like to see a few more things tackled.  One problem that plagued
RPM was dealing with distribution upgrades.  As we removed one package
in favor of another (xinetd replacing inetd, for example), we relied
on Obsoletes to do the right thing.  But sometimes there wasn't
anything replacing a particular package; it simply was no longer
needed in the repository.  This is one (of many) problems that
factored into including group support in Conary.  Updating from one
major release of a Conary-managed distribution to another is a simple
matter of updating the group that defines what's in version 1 of the
distribution to the group that defines what's in version 2.  We've
even successfully migrated a running system from one Conary-managed
distribution (rPath Linux) to another (Foresight Linux) with only
minor issues (things I think we've since fixed).

Second, I'm very interested in utilizing the package manager to help
address Debian derivatives.  If all the packages (both binaries and
sources) in Debian were managed in a repository system, then Ubuntu
could very simply add any additional patches they want on a
distributed branch of Debian in their own repository.  Re-basing
Ubuntu on a new version of Debian would be a branch merge operation in
the Ubuntu repository.  This is working extremely well in Conary.  We
maintain rPath Linux in our conary.rpath.com repository.  Foresight
Linux (which concentrates on bleeding edge GNOME and desktop
technology integration) automatically inherits all the work we do on
packages that they don't modify.  GNOME packages are on a branch in
their own repository.

There's an opportunity to radically change the way that distributions
are put together, much in the same way that BitKeeper and GIT
revolutionized kernel development.  As soon as people were able to
create a remote distributed repository for doing kernel work and
easily merge the changes from one repository to another, kernel
development accelerated enormously.  It's time to apply the same
methods to the entire distribution so that people can work on building
complete, integrated systems and share the common work between
projects.

If anyone has questions about the technical details of Conary, please
feel free to email me.  I'm looking forward to what comes out of the
dpkg 2.0 discussion.

Cheers,

Matt
-- 
Matt Wilson
Founding Engineer
rPath, Inc.
msw@rpath.com


