
Re: Dpkg architectures



On Sun, Feb 13, 2000 at 12:17:25PM +0200, Juho Östman wrote:
> What do you think about that idea of changing how dpkg uses architectures? 

I have thought a lot about it, and I am in contact with the people developing
the next-generation package format.
 
> The architectures like i386, mips, alpha would be the top level 
> architectures. They would have sub-classes like linux-i386 and hurd-i386.
> A higher level architecture would be compatible with systems of its
> sub-classes.
> In addition there could be architectures like linux-all which means that the 
> package is common for all Linux systems. The specifier all would match any
> system whatever.

This is the easiest way, but it does not express all the fine distinctions that
are possible and needed for binary compatibility. It might be the way we treat
it after the package pool idea has been implemented (our ftp team is working on
it, last I heard).
 
> I don't know much about the real implementation of the architecture system
> so there
> could be inconsistencies here.
> How difficult it is to implement a system like this?

You would need to mess with dinstall and the package tools. Not too
difficult, but not trivial either.

> Are there
> better solutions to classify packages of different systems?

Yes, see the attached file.

> What are the package pools I have read about on this mailing list?

They will probably lead to the solution you describe above :)

Thanks,
Marcus


-- 
`Rhubarb is no Egyptian god.' Debian http://www.debian.org Check Key server 
Marcus Brinkmann              GNU    http://www.gnu.org    for public PGP Key 
Marcus.Brinkmann@ruhr-uni-bochum.de,     marcus@gnu.org    PGP Key ID 36E7CD09
http://homepage.ruhr-uni-bochum.de/Marcus.Brinkmann/       brinkmd@debian.org
Architecture Handling in Distributed Software Collections
=========================================================

Copyright 1999, 2000 Marcus Brinkmann <brinkmd@debian.org>
 Distribution of verbatim copies is allowed without restriction.
 If you want to modify this, contact me and I will put it under a better
 license. I am not unwilling, just lazy.

The way I think about architecture treatment stems from my experiences with
the Debian package tools and bootstrapping the Debian GNU/Hurd system.

The concept presented below has the following core ideas, which I think are
important enough to summarize in a short list:

* Package installation is orthogonal to distribution creation.
* Package installation requires only dependency verification.
* Distribution creation requires dependency verification and scoring.


Package Installation
====================

A package manager (the part of the packaging system that installs packages
on a host machine) should only be concerned with fulfilling dependencies. It
should determine if a package is installable, and if it is, it should
perform the required action.

For this, (probably virtual) packages representing ABI capabilities work best.
You can provide virtual architecture packages for any ABI feature: which
processor is required, the object format (a.out, elf), whether a /proc
filesystem is provided and to which standard it conforms, whether linux
syscalls are available, whether mach interfaces are provided, and so on.
IMO, dependencies should be as weak as possible. If a Pentium-optimized
package also runs on a 486, the dependency should be "i486" (or even "i386"),
not "pentium".

Examples:
. "grub" only depends on "i386", regardless of the operating system
  running.
. Most linux binaries will depend only on a certain cpu (apart from any
  libraries), the elf object format, and certain libraries determined by
  the soname, and will not make use of syscalls or the proc filesystem
  directly. Those will run on the Hurd without recompilation.
. A perl script evaluating the linux proc fs (version 2.2) can depend
  on perl and the procfs virtual package of version 2.2.

That was the "depends on" side. There is also the "provides" side. A typical
Pentium Linux box will provide "i386", "i486", "i586", "linux-syscalls (= 2.2)",
"elf" and "proc-fs", for example. Note that you can have version numbers here,
too. Which packages are provided is a feature of the "distribution".
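
To make the installability check concrete, here is a minimal sketch in Python;
the package names, dependencies and provided virtual packages are invented for
illustration and do not reflect any existing tool:

    # Virtual packages the host provides; some carry a version.
    provided = {
        "i386": None, "i486": None, "i586": None,
        "elf": None,
        "linux-syscalls": "2.2",
        "proc-fs": "2.2",
    }

    # Each package declares dependencies on virtual packages, optionally versioned.
    grub_deps = {"i386": None}
    procinfo_deps = {"perl": None, "proc-fs": "2.2"}

    def installable(deps, provided):
        """A package is installable if every dependency is provided
        (with a matching version, where one is required)."""
        for name, version in deps.items():
            if name not in provided:
                return False
            if version is not None and provided[name] != version:
                return False
        return True

    print(installable(grub_deps, provided))      # True
    print(installable(procinfo_deps, provided))  # False: "perl" is not provided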

Distributions
=============

The distribution is really a concept that should not be built directly into
the package installer, because it requires some overview of the system. The
reason is that a distribution needs to be a consistent set of packages.

Here config.guess may come in. There is a "default" mapping from
canonicalized host type strings to virtual packages, for example:

i586-gnu-linux   -->   "i386", "i486", "i586", "linux"
i486-gnu         -->   "i386", "i486", "gnu"

Note that concepts like "linux-syscalls", "elf" and "procfs" would be
provided by the linux kernel package itself. (Or by emulator packages on the
Hurd, for example).
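
As a sketch, such a default mapping could be as simple as a table keyed on the
canonicalized host type string (the entries follow the examples above, the
rest is invented):

    # Default mapping from canonicalized host type strings to virtual packages.
    DEFAULT_PROVIDES = {
        "i586-gnu-linux": ["i386", "i486", "i586", "linux"],
        "i486-gnu":       ["i386", "i486", "gnu"],
    }

    def base_provides(host_type):
        # Concepts like "linux-syscalls", "elf" and "procfs" are *not* listed
        # here; they come from the kernel (or emulator) packages themselves.
        return set(DEFAULT_PROVIDES.get(host_type, []))

    print(base_provides("i486-gnu"))   # {'i386', 'i486', 'gnu'}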

It is not hard to write a utility that can create distributions on the
fly if all available packages are collected in a pool. You just have to map
your host system type to a set of provided virtual packages, and start
adding packages whose dependencies can be fulfilled. Of course, you will
gain further possibilities through additional "Provides:" in the packages you
added, so you can add more and more packages from your pool to your
distribution. Using some standard algorithms from graph theory, you should
be able to take conflicting packages etc. into account, to get a maximal
distribution, i.e., a distribution which consists only of packages that are
installable using packages from this distribution, and which is not missing
any package from the pool with this property.
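
A rough sketch of this fixed-point construction, ignoring conflicts and
versions (the pool contents are made up for illustration):

    def build_distribution(pool, base_provides):
        """pool maps a package name to (dependencies, extra provides)."""
        provided = set(base_provides)
        dist = set()
        changed = True
        while changed:
            changed = False
            for name, (deps, provides) in pool.items():
                if name not in dist and deps <= provided:
                    # All dependencies are satisfied: add the package and
                    # everything it provides, then iterate again.
                    dist.add(name)
                    provided |= provides | {name}
                    changed = True
        return dist

    pool = {
        "libc": ({"i386", "elf"}, {"libc.so.6"}),
        "grub": ({"i386"}, set()),
        "perl": ({"i386", "elf", "libc.so.6"}, set()),
    }
    print(build_distribution(pool, {"i386", "i486", "i586", "elf", "linux"}))
    # -> {'libc', 'grub', 'perl'} (in some order)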

Hints or Scoring
================

Hints can be complex and I haven't thought them completely through. But here is
a fairly complete analysis of the cases that have to be treated differently
when considering scoring:

Usually, we would recompile a package for another architecture only if it
is not yet available for that architecture, e.g., if it is not in that
architecture's distribution. Furthermore, recompiling for another
architecture would often result in a binary package that is not useful on
any of the existing architectures, so there is no conflict.

Example:
 Package foo is available for all i386 arches, because it only
 depends on the virtual package "i386".

Because we want to support powerpc, we recompile it for this architecture.
The resulting binary depends on powerpc only. Currently, no platform can run
i386 and powerpc binaries at the same time. Therefore, all distributions we
create with the procedure above will contain only one of these two packages.

If two binary versions of the same package happen to match the same
distribution, two cases are possible:

1) "Scoring": Both packages have different dependencies ("i386" vs.
   "powerpc") or do not carry hints.
2) "Hints": Both packages have the same dependencies ("i386" native and
   "i386" with pentium optimization [but without pentium specific
   instructions]). Then we need hints to decide.

I won't go into detail about how cases 1) and 2) could be treated, only this much:

For case 1), it would be sufficient to order the available virtual dependency
packages that should be considered in a priority list. Or, more abstractly:
you have a function "int score(list of dependencies)" which calculates a
value for each package, and the highest value wins. An equal value would mean
both packages are equivalent and it does not matter which one is picked. The
function can be very simple and only catch the cases that are known to
occur.
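
A minimal sketch of such a score function for the i586 distribution; the
priority values are invented:

    # Higher value = more specific, and therefore preferred, on an i586 system.
    PRIORITY = {"i386": 1, "i486": 2, "i586": 3}

    def score(dependencies):
        """Score a package by its most specific architecture dependency."""
        return max((PRIORITY.get(d, 0) for d in dependencies), default=0)

    print(score({"i386", "elf"}))          # 1
    print(score({"i386", "i486", "elf"}))  # 2, so the i486 build wins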

For case 2), where both packages have the same dependencies but are not equal,
they have to tell us more about themselves in "hints" metadata. This could be
the config.guess string of the compiled architecture, for example, or some
preferred target architecture, or something similar. The scoring would then be
a function of the dependencies and the hints metadata. This can actually
be a very simple function (like: hint "i586" is better than "i486", which is
better than "i386", for the i586 distribution).
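
For case 2), the score could become a pair in which the hint only breaks ties
between otherwise equal dependencies; again, the hint values and priorities
are purely illustrative:

    # Hints rank the target the package was actually compiled for.
    HINT_PRIORITY = {"i386": 1, "i486": 2, "i586": 3}

    def score_with_hint(dep_score, hint):
        # The dependency score still dominates; the hint only breaks a tie.
        return (dep_score, HINT_PRIORITY.get(hint, 0))

    # Two builds that both depend only on "i386" (equal dep_score), one of
    # them compiled with pentium optimization (hint "i586"):
    print(score_with_hint(1, "i386") < score_with_hint(1, "i586"))  # True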

The _important_ thing is that scoring is part of distribution creation,
not part of package installation. I should always be allowed to install a
package with a bad score, even if its score is worse than that of the default
package it would replace. For example, I could fetch such a package from the
package pool to work around a bug in the optimization or something similar.


Complexity
==========

I think my concept is rather complete; at least it is not obvious to me which
weird combination of architecture dependencies is not covered by it. But is it
actually too complex? Is it overkill?

I think not, for two reasons:

1. The main complexity is in the distribution creation, which is completely
hidden from the user. It will happen at the main repository of the software
distribution. Users will continue to get an i386 Linux distribution, or a
powerpc distribution of the Hurd; in short, there will be distributions
created for the config.guess system types, as we do now. What we gain in return
is a VERY simple package installation tool which is no longer concerned with
scoring or the like, but is merely a dependency checker.

2. The most difficult part seems to be the scoring. But this is only
complicated if you want to implement a general catch-all solution.
I think the number of cases where scoring actually happens is very small,
and they can be covered by some simple rules.

The rest is really only organization of the packages that are uploaded and
installed in the distribution.

Determining if a package needs to be recompiled
===============================================

Consider the following situation: System B emulates system A completely, but
has additional features, which software C can use.

If C is compiled for A, the resulting binary can run on B, but when C is
recompiled for B, it will make use of the special features provided by
system B.

In reality, most packages adapt themselves at run time or conflict outright, so
the case above should occur rather seldom. In this case, a feature list can
be added to the source package, containing a hint for the builder that the
package will offer further features if compiled for system B.

I have not worked out the details for this, mostly because this case occurs
very rarely, and any agreed-upon feature tag that offers the above
distinction will do.
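
As a hedged sketch of how a builder might use such a feature list (all names
here are invented): compare what the source package could gain on the target
system with what the existing binary already uses.

    # Feature list of a hypothetical source package: virtual packages it can
    # take advantage of when they are present at build time.
    optional_features = {"linux-syscalls", "proc-fs"}

    def gains_from_rebuild(optional_features, built_with, target_provides):
        """Return the extra features a rebuild on the target would pick up."""
        return (optional_features & target_provides) - built_with

    # The binary was built on system A (no proc-fs); system B emulates A but
    # also provides proc-fs, so a rebuild would gain something.
    print(gains_from_rebuild(optional_features,
                             built_with={"linux-syscalls"},
                             target_provides={"linux-syscalls", "proc-fs"}))
    # {'proc-fs'}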
