APT2, APT2, where are you?
================================
This e-mail gives an overview of my recent thoughts and activities
on APT2. Yes, I know, it's progressing very slowly.
Repositories
------------
While working on APT2, I looked at three repository formats, each for a
different package manager: Debian, RPM, Slackware. From what I have
seen, I can conclude that there are many similarities in the data.
First of all, each of these repositories contains some kind of
meta-index, which is optionally signed. For Debian, this is Release (or
the new InRelease files); for RPM it is repomd.xml; and for Slackware
one can use CHECKSUMS.md5.
Secondly, all repositories provide some kind of package indexes. For
Debian, these are the Packages files; for RPM it is primary.xml; and
for Slackware it is PACKAGES.TXT. In the common dists/ and pool/
structure, Debian package indexes are split by architecture and into
multiple components. The package manager can thus restrict the needed
files to the current architectures and the requested components. RPM
package indexes are shared by all architectures, and there is no
concept of components. For Slackware repositories, we can assume that
there is only one architecture.
The package indexes provide us with a list of packages, their versions,
dependencies, provided features and a human-readable description. For
Slackware, dependency support is completely optional and not even
supported by the distribution itself (by design). RPM package indexes
also provide file lists for certain files in the packages (e.g.
configuration files).
Thirdly, there are source indexes. For Debian, they are like the
package indexes, just not architecture-specific. SRPM distributions are
exactly like RPM; we just have to deal with the source packages at
extraction time. Slackware is harder: there are CHECKSUMS.md5 files in
the sources directory, and we may be able to calculate the location of
a source using them and the PACKAGES.TXT file.
Fourthly, there are file lists. Debian has per-architecture
Contents-*.gz files, RPM has filelists.xml, and Slackware has MANIFEST
files in the subdirectories.
One last word on Slackware: It might make sense to apply Debian's
components/section approach to Slackware as well, treating each
directory (extras, slackware) as one section.
All in all, we have four elements:
- Metaindex
- Package indexes
- Source indexes
- File indexes
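
To make the comparison above easier to scan, here is a small lookup
table in Python. It is only a sketch: the names INDEX_FILES and
index_file are made up for this illustration and are not part of APT2.
The file names are the ones mentioned above, plus Debian's Sources
index, which the text does not name explicitly.

```python
# Illustrative only: the dictionary and function names are made up here.
INDEX_FILES = {
    "debian": {
        "metaindex": "Release",        # or InRelease
        "packages": "Packages",
        "sources": "Sources",          # Debian's source index file
        "files": "Contents-*.gz",
    },
    "rpm": {
        "metaindex": "repomd.xml",
        "packages": "primary.xml",
        "sources": "primary.xml",      # SRPM repositories look like RPM ones
        "files": "filelists.xml",
    },
    "slackware": {
        "metaindex": "CHECKSUMS.md5",
        "packages": "PACKAGES.TXT",
        "sources": "CHECKSUMS.md5",    # the copy in the sources directory
        "files": "MANIFEST",
    },
}


def index_file(repo_format, element):
    """Return the file that plays the given role in a repository format."""
    try:
        return INDEX_FILES[repo_format][element]
    except KeyError:
        raise ValueError(
            f"unknown format or element: {repo_format}/{element}"
        ) from None
```

A frontend could then ask for index_file("rpm", "metaindex") without
knowing anything else about the repository format.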
Resolving dependencies, etc.
----------------------------
When resolving dependencies, we need three pieces of information:
(a) the list of installed packages, as reported by the low-level
    package manager.
(b) the available packages, as reported by the repositories.
(c) the request, i.e. which packages to install or remove.
From this information, we can create a changeset. A changeset includes
all actions which have to be done to satisfy the request. It can thus
be expressed using the same data structures as a request, i.e. a tuple
(package, version, type of comparison, action). There is no need to
modify any kind of cache like in APT; instead we simply carry out the
actions and reload the list of installed packages afterwards.
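
A rough sketch of this idea in Python follows. The type Item and the
function carry_out are hypothetical names, and the "low-level package
manager" is faked by a plain set of (package, version) pairs; the point
is only that a request and a changeset share one data structure.

```python
from typing import NamedTuple


class Item(NamedTuple):
    """One entry of a request or a changeset (hypothetical type)."""
    package: str
    version: str
    comparison: str   # e.g. "=", ">=", or "" if unconstrained
    action: str       # "install" or "remove"


def carry_out(installed, changeset):
    """Apply a changeset to a fake installed set and return the new set.

    In the real design the low-level package manager would do the work
    and the installed list would simply be reloaded afterwards.
    """
    result = set(installed)
    for item in changeset:
        key = (item.package, item.version)
        if item.action == "install":
            result.add(key)
        elif item.action == "remove":
            result.discard(key)
        else:
            raise ValueError(f"unknown action: {item.action}")
    return result


# The request is vague, the changeset is exact, but both use Item.
request = [Item("foo", "1.0", ">=", "install")]
changeset = [
    Item("libbar", "2.1", "=", "install"),
    Item("foo", "1.2", "=", "install"),
]
```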
APT treats a package as a set of versions. This kind of handling has
problems when it comes to things like multi-arch and multiple versions
installed at the same time (e.g. in RPM). Thus we treat each version as
a single package, and can allow multiple versions or packages from
multiple architectures to be installed at the same time.
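
As a sketch of what "each version is a package" could look like, the
following Python snippet identifies a package by name, version and
architecture together. The class Package, the helper versions_of and
the sample data are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """A single installable unit: one version on one architecture."""
    name: str
    version: str
    architecture: str


# Two architectures of one library and two versions of one kernel can
# coexist, because each of them is a distinct Package value.
installed = {
    Package("libc6", "2.11", "amd64"),
    Package("libc6", "2.11", "i386"),
    Package("kernel", "2.6.32", "amd64"),
    Package("kernel", "2.6.33", "amd64"),
}


def versions_of(packages, name):
    """List (version, architecture) pairs installed under one name."""
    return sorted(
        (p.version, p.architecture) for p in packages if p.name == name
    )
```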
External dependency solvers will be supported. I had an e-mail
conversation with zack on this topic, and he asked me whether this
would be possible. I'm just waiting for his final proposal on this
topic. This should probably be coordinated with cupt as well, so we can
share external solvers. The exchange format could be CUDF[0].
Caching
-------
The mmap()'able binary cache in APT has become a problem with growing
repository sizes because it is practically impossible to resize it on
the fly. APT gained support for using mremap(), but this does not work
in practice. You also have to work around all the pointers, converting
them to offsets relative to the beginning of the file when storing
and to positions in memory when using them. That's why I don't plan
to use such a cache format.
Instead, the cache format should be a subset of the Debian package
information files that includes only the basic information needed to
resolve dependencies. This solution is still several times faster than
using no such cache at all.
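
To illustrate, here is a minimal Python sketch that reduces one stanza
of a Packages file to a few fields. The list RESOLVER_FIELDS is my
assumption about what a resolver needs, not a decided format, and the
function names are hypothetical. The parser is deliberately simplified
and folds continuation lines into a single line.

```python
# Assumed set of fields; the real cache format is not decided yet.
RESOLVER_FIELDS = (
    "Package", "Version", "Architecture",
    "Depends", "Pre-Depends", "Conflicts", "Provides",
)


def parse_stanza(text):
    """Parse one 'Field: value' stanza into a dict (simplified)."""
    fields = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            # Continuation line: fold it into the previous field.
            if last is not None:
                fields[last] += " " + line.strip()
            continue
        name, _, value = line.partition(":")
        last = name.strip()
        fields[last] = value.strip()
    return fields


def reduce_stanza(text):
    """Keep only the fields needed for dependency resolution."""
    fields = parse_stanza(text)
    return "".join(
        f"{name}: {fields[name]}\n"
        for name in RESOLVER_FIELDS if name in fields
    )


SAMPLE = """Package: hello
Version: 2.4-3
Architecture: amd64
Depends: libc6 (>= 2.7)
Description: The classic greeting
 The GNU hello program produces a familiar, friendly greeting.
"""
```

Calling reduce_stanza(SAMPLE) drops the description and keeps the four
remaining fields, so the reduced file stays parseable by the same tag
file parser.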
Handling file acquisition
--------------------------
File acquisition could be handled by multiple worker threads, where the
workers are written as shared libraries and loaded using GModule. The
question here is how we should handle graphical platforms (asynchronous
methods?) and how to know which module supports which protocol. For the
latter problem, there are two ways:
(a) identify the protocol using the filename, e.g. libhttp.so
(b) identify the protocol using information contained in the
    library, and have an optional priority, e.g.:

        struct protocol {
            string protocol;
            int priority;
        }
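
Option (b) could be used roughly as in the following Python sketch. The
class WorkerRegistry, the module names and the rule "highest priority
wins" are assumptions made for the example; nothing of this exists in
APT2 yet, and real modules would be loaded through GModule rather than
registered by name.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Protocol:
    """Mirrors the struct above: what a module says about itself."""
    protocol: str
    priority: int = 0


class WorkerRegistry:
    """Maps protocol names to the modules that claim to support them."""

    def __init__(self):
        self._workers = {}  # protocol name -> list of (Protocol, module)

    def register(self, module, info):
        self._workers.setdefault(info.protocol, []).append((info, module))

    def lookup(self, uri):
        """Return the module with the highest priority for this URI."""
        scheme = urlsplit(uri).scheme
        candidates = self._workers.get(scheme)
        if not candidates:
            raise LookupError(f"no worker for protocol {scheme!r}")
        info, module = max(candidates, key=lambda c: c[0].priority)
        return module


registry = WorkerRegistry()
registry.register("libhttp.so", Protocol("http", priority=10))
registry.register("libcurl.so", Protocol("http", priority=50))
registry.register("libfile.so", Protocol("file"))
```

With these registrations, an http:// URI would be handed to libcurl.so,
which is something approach (a) cannot express because the filename
allows only one module per protocol.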
Handling platform-specific stuff
--------------------------------
We can use modules to provide implementations of abstract classes which
provide common functionalities like updating repository data. But we
might also want to allow developers to write platform-specific
programs. How should this be done if there is no access to those
classes?
Alternatively, we could link in the platform-specific parts and only
allow one platform in one installation. This has the advantage of
providing the specific parts, but the disadvantage that you cannot use
APT2 on distribution X to create a chroot of distribution Y which uses
a different package manager.
Another possible option would be to export the specific parts into
libraries named e.g. libapt-debian and libapt-rpm and install them to
/usr/lib.
Target platforms for APT2
-------------------------
The primary target platform of APT2 is Debian and distributions derived
from it. Once this platform works, support for others may be added.
Communicating messages and errors
---------------------------------
The libapt library will use the logging facilities provided by GLib to
output information. Errors will be handled by GError where useful;
otherwise they will be sent to the log at level CRITICAL and the
function returns false/null.
Applications have to set up the display of the logging domain "apt",
e.g. by printing it on the screen. It's their task to format the
messages in an appropriate manner. There may be a library "libapt-gtk"
providing widgets for graphical applications.
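
The division of work between library and application can be sketched
with Python's standard logging module, used here only as an analogy for
the GLib log domain "apt". The function read_index and the class
ListHandler are hypothetical; the real library would use GLib's logging
and GError instead.

```python
import logging

# Stands in for the GLib log domain "apt".
log = logging.getLogger("apt")


def read_index(path):
    """Library side: log at CRITICAL and return None on failure."""
    try:
        with open(path, encoding="utf-8") as index:
            return index.read()
    except OSError as err:
        log.critical("cannot read %s: %s", path, err)
        return None


class ListHandler(logging.Handler):
    """Application side: decide how messages are formatted and shown."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"[{record.levelname}] {record.getMessage()}")


handler = ListHandler()
log.addHandler(handler)
```

A graphical frontend would attach a handler that feeds a widget instead
of collecting the messages in a list.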
An exception to this rule could be made for the acquire subsystem,
which could store the error message in a field of the item.
Bindings to other languages
---------------------------
The recommended language for development of APT2 applications is Vala.
I will also support C (of course, it's done automatically by valac) and
Python. Other languages may be supported using GObject-introspection,
e.g. JavaScript.
Licensing
---------
APT2 could be licensed under the terms of the GNU Lesser General Public
License, version 2.1 or (at your option) any later version. This
license is widely used in the GNOME world for libraries and since we
are using a lot of technologies coming from there, this seems to be a
good choice.
Another option is the Apache license 2.0, but its incompatibility with
version 2 of the GPL is not very helpful.
Progress and ToDo
-----------------
APT2 is moving very slowly. At the moment, its status is:
WORKING
- Parser for /etc/apt/apt.conf and other configuration files
- Parser for RFC 822-style tag files (although it will be rewritten)
- Single-threaded file acquisition using GIO and libsoup, no
  support for authentication
PROGRESS
- Parser for /etc/apt/sources.list and similar files
- Repository handling, e.g. apt-get update can be done using my
local branch.
TODO
- Multi-threaded file acquisition using modules.
- Progress reporting for file acquisition.
- Support for PDiffs.
- Taking care of integration with the GLib mainloop.
- PackageCache, SourceCache, FileCache
- Dependency solvers.
- Final license decision
GOALS for 0.0.1:
- COMMAND: apt-get install
- COMMAND: apt-get source --download-only
- COMMAND: apt-get update
- External dependency solvers, possibly using the CUDF format. This
was a feature request by zack.
LONG TERM GOALS:
- Replace APT and aptitude on the command-line, and APT's libraries
- Replace the D-Bus server provided by aptdaemon
- Replace synaptic/gnome-app-install/software-center, or port
software-center over from apt.
- Get a native built-in SAT resolver.
Links
-----
[0] http://upsilon.cc/~zack/research/publications/mooml-iwoce-2009.pdf