
APT2, APT2, where are you?
================================

This e-mail gives an overview of my recent thoughts and activities
on APT2. Yes, I know, it's progressing very slowly.


Repositories
------------
While working on APT2, I looked at three repository formats, each for a 
different package manager: Debian, RPM, Slackware. From what I have 
seen, I can conclude that there are many similarities in the data.

First of all, each of these repositories contains some kind of
meta-index, which is optionally signed. For Debian, this is Release (or
the new InRelease file); for RPM it is repomd.xml; and for Slackware
one can use CHECKSUMS.md5.

Secondly, all repositories provide some kind of package indexes. For 
Debian, these are the Packages files; for RPM it is primary.xml; and 
for Slackware it is PACKAGES.TXT. In the common dists/ and pool/
structure, Debian package indexes are split by architecture and into 
multiple components. The package manager can thus restrict the needed 
files to the current architectures and the requested components. RPM 
package indexes are shared by all architectures, and there is no 
concept of components. For Slackware repositories, we can assume that
there is only one architecture.

The package indexes provide us with a list of packages, their versions,
dependencies, provided features and a human-readable description. For
Slackware, dependency support is completely optional and not even
supported by the distribution itself (by design). RPM package indexes
also provide file lists for certain files in the packages (e.g.
configuration files).

Thirdly, there are source indexes. For Debian, they are like the
package indexes, just not architecture-specific. SRPM repositories are
exactly like RPM ones; we just have to deal with the source packages at
extraction time. Slackware is harder: there is a CHECKSUMS.md5 file in
the sources directory, and we may be able to calculate the location of
a source using it and the PACKAGES.TXT file.

Fourthly, there are file lists. Debian has per-architecture
Contents-*.gz files, RPM has filelists.xml, and Slackware has MANIFEST
files in the subdirectories.

One last word on Slackware: It might make sense to use this 
components/section approach of Debian and apply it to Slackware as 
well, treating each directory (extras, slackware) as one section.

All in all, we have four elements:
    Metaindex
    Package indexes
    Source indexes
    File indexes
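
The mapping above can be written down as a small table. This is only a
sketch in Python for illustration, not APT2 code: the class name
RepositoryLayout and the LAYOUTS dictionary are made up here, "Sources"
is the usual name of the Debian source index, and the RPM and Slackware
source entries simply reuse the files mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepositoryLayout:
    """File names of the four index kinds for one repository format."""
    metaindex: str
    package_index: str
    source_index: str
    file_index: str


# Hypothetical table; the file names are the ones discussed in the text.
LAYOUTS = {
    "debian": RepositoryLayout(
        "Release", "Packages", "Sources", "Contents-*.gz"),
    # SRPM repositories use the same index files as binary RPM ones.
    "rpm": RepositoryLayout(
        "repomd.xml", "primary.xml", "primary.xml", "filelists.xml"),
    # Slackware sources have to be located via CHECKSUMS.md5.
    "slackware": RepositoryLayout(
        "CHECKSUMS.md5", "PACKAGES.TXT", "CHECKSUMS.md5", "MANIFEST"),
}


def index_names(repository_format):
    """Return (metaindex, package, source, file) index names."""
    layout = LAYOUTS[repository_format]
    return (layout.metaindex, layout.package_index,
            layout.source_index, layout.file_index)
```

A front end could then treat all three formats through the same four
slots, e.g. index_names("rpm") to find out which files to fetch.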

Resolving dependencies, etc.
----------------------------

When resolving dependencies, we need 3 pieces of information:
    (a) the list of installed packages; reported by the low-level
        package manager.
    (b) the available packages, as reported by the repositories.
    (c) the request, i.e. which packages to install or remove.

From this information, we can create a changeset. A changeset includes
all actions which have to be done to satisfy the request. It can thus
be expressed using the same data structures as a request, i.e. a tuple
(package, version, type of comparison, action). There is no need to
modify any kind of cache like in APT; instead we simply carry out the
actions and reload the list of installed packages afterwards.
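
To make the request/changeset idea concrete, here is a toy sketch in
Python. It is not the planned solver: the names Action and
compute_changeset are invented for this example, version constraints
and conflicts are ignored, and "available" is assumed to be a plain
dictionary mapping a package name to (version, list of dependencies).

```python
from typing import NamedTuple, Optional


class Action(NamedTuple):
    """One entry of a request or a changeset."""
    package: str
    version: Optional[str]
    comparison: Optional[str]   # e.g. "=", ">=", or None for "any"
    action: str                 # "install" or "remove"


def compute_changeset(installed, available, request):
    """Expand a request into the full list of actions (toy version).

    installed: dict name -> installed version
    available: dict name -> (version, [dependency names])
    request:   list of Action
    """
    changeset = []
    seen = set()

    def install(name):
        if name in seen or name in installed:
            return                      # nothing to do
        seen.add(name)
        version, depends = available[name]
        for dependency in depends:      # dependencies come first
            install(dependency)
        changeset.append(Action(name, version, "=", "install"))

    for item in request:
        if item.action == "install":
            install(item.package)
        elif item.action == "remove" and item.package in installed:
            changeset.append(
                Action(item.package, installed[item.package], "=", "remove"))
    return changeset
```

The point is only that the result has the same shape as the input:
a list of (package, version, comparison, action) tuples that can be
carried out one after another.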

APT treats a package as a set of versions. This kind of handling has
problems when it comes to things like multi-arch and multiple versions
installed at the same time (e.g. in RPM). Thus we treat each version as
a separate package, and can allow multiple versions or packages from
multiple architectures to be installed at the same time.
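
A minimal sketch of what "each version is a package" could look like,
again in Python and with made-up names (Package, InstalledSet): the
identity of a package is the whole (name, version, architecture)
triple, so several versions or architectures can coexist in one set.

```python
from typing import NamedTuple


class Package(NamedTuple):
    """A package is identified by name, version and architecture."""
    name: str
    version: str
    architecture: str


class InstalledSet:
    """Set of installed packages; one entry per installed version."""

    def __init__(self):
        self._packages = set()

    def add(self, package):
        self._packages.add(package)

    def remove(self, package):
        self._packages.discard(package)

    def versions_of(self, name):
        """All installed packages with this name, in sorted order."""
        return sorted(p for p in self._packages if p.name == name)


installed = InstalledSet()
installed.add(Package("kernel", "2.6.32", "amd64"))
installed.add(Package("kernel", "2.6.33", "amd64"))   # second version
installed.add(Package("libfoo", "1.0", "amd64"))
installed.add(Package("libfoo", "1.0", "i386"))       # second architecture
```

Nothing in the data structure needs to know about multi-arch or
parallel installation; whether two entries may really be installed
together stays a question for the solver and the low-level package
manager.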

External dependency solvers will be supported. I had an e-mail 
conversation with zack on this topic, and he asked me whether this 
would be possible. I'm just waiting for his final proposal on this 
topic. This should probably be coordinated with cupt as well, so we can 
share external solvers. The exchange format could be CUDF[0].

Caching
------- 
The mmap()'able binary cache in APT has become a problem with growing
repository sizes because it is practically impossible to resize it on
the fly. APT gained support for using mremap(), but this does not work
in practice. You also have to convert every pointer: to an offset
relative to the beginning of the file when storing it, and back to a
position in memory when using it. That's why I don't plan to use such a
cache format.

Instead, the cache format should be a subset of the Debian package 
information files, which only includes basic information needed to 
resolve dependencies. This solution is still multiple times faster than 
using no such cache at all.
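
As an illustration of such a reduced cache, the following Python sketch
strips a Packages stanza down to the fields a resolver needs. The field
list in RESOLVER_FIELDS is my assumption for this example, not a
decided format, and the parser is deliberately simplistic.

```python
# Assumed set of fields needed for dependency resolution.
RESOLVER_FIELDS = ("Package", "Version", "Architecture", "Depends",
                   "Pre-Depends", "Conflicts", "Breaks", "Provides",
                   "Replaces")


def parse_stanza(text):
    """Parse one 'Field: value' stanza into a dict (toy parser)."""
    fields = {}
    key = None
    for line in text.splitlines():
        if line[:1] in (" ", "\t"):
            # Continuation line: belongs to the previous field.
            if key is not None:
                fields[key] += "\n" + line
            continue
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def reduce_stanza(text):
    """Return the stanza with only the resolver-relevant fields."""
    fields = parse_stanza(text)
    kept = [f"{name}: {fields[name]}"
            for name in RESOLVER_FIELDS if name in fields]
    return "\n".join(kept) + "\n"
```

Long fields such as Description make up a large part of a Packages
file, so dropping them should shrink the cache considerably while
keeping it a plain text file that the normal parser can read.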

Handling file acquisition
--------------------------
File acquisition could be handled by multiple worker threads, where the
workers are written as shared libraries and loaded using GModule. The
open questions are how to handle graphical platforms (asynchronous
methods?) and how to know which module supports which protocol. For the
latter problem, there are two ways:

    (a) identify the protocol using the filename, e.g. libhttp.so
    (b) identify the protocol using information contained in the
        library, with an optional priority, e.g.:

            struct protocol {
                string protocol;
                int priority;
            }
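
Option (b) could work roughly like the following Python sketch. It only
models the lookup, not GModule loading; the Registry class and the
fetch callables are hypothetical, and "higher priority wins" is an
assumption, since the meaning of the priority field is not decided.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlsplit


@dataclass
class Protocol:
    """What a worker module would announce about itself."""
    protocol: str                 # URI scheme, e.g. "http"
    priority: int                 # assumption: higher value wins
    fetch: Callable[[str], str]   # hypothetical entry point


class Registry:
    """Maps URI schemes to the registered workers."""

    def __init__(self):
        self._by_protocol = {}

    def register(self, worker):
        self._by_protocol.setdefault(worker.protocol, []).append(worker)

    def lookup(self, uri):
        """Return the best worker for this URI, or None."""
        candidates = self._by_protocol.get(urlsplit(uri).scheme)
        if not candidates:
            return None
        return max(candidates, key=lambda worker: worker.priority)


registry = Registry()
registry.register(Protocol("http", 0, lambda uri: "basic:" + uri))
registry.register(Protocol("http", 10, lambda uri: "fancy:" + uri))
```

Compared to option (a), this allows two modules to provide the same
protocol and lets one module announce several protocols, at the cost of
having to load every module once to ask it.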


Handling platform-specific stuff
--------------------------------
We can use modules to provide implementations of abstract classes which
provide common functionality like updating repository data. But we
might also want to allow developers to write platform-specific
programs. How should this be done if there is no access to those
classes?

Alternatively, we could link in the platform-specific parts and only
allow one platform in one installation. This has the advantage of
providing the specific parts, but the disadvantage that you cannot use
APT2 on distribution X to create a chroot of distribution Y which uses
a different package manager.

Another possible option would be to export the specific parts into 
libraries named e.g. libapt-debian and libapt-rpm and install them to 
/usr/lib.

Target platforms for APT2
-------------------------
The primary target platform of APT2 is Debian and distributions derived 
from it. Once this platform works, support for others may be added.

Communicating messages and errors
---------------------------------
The libapt library will use the logging facilities provided by GLib to
output information. Errors will be handled by GError where useful;
otherwise they will be sent to the log at level CRITICAL and the
function will return false/null.

Applications have to set up the display of the logging domain "apt",
e.g. by printing it on the screen. It's their task to format the
messages in an appropriate manner. There may be a library "libapt-gtk" 
providing widgets for graphical applications.
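
The division of labour can be sketched with Python's standard logging
module standing in for GLib's log domains (this is an analogy only, not
the GLib API). The function load_sources_list and the ListHandler class
are invented for the example.

```python
import logging

# Library side: everything is logged to the "apt" domain.
log = logging.getLogger("apt")


def load_sources_list(path):
    """Return the lines of a file, or None after logging a CRITICAL."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read().splitlines()
    except OSError as error:
        log.critical("cannot read %s: %s", path, error)
        return None


# Application side: decide how messages from "apt" are presented.
class ListHandler(logging.Handler):
    """Collects formatted messages, e.g. for display in a dialog."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(
            f"{record.levelname}: {record.getMessage()}")


handler = ListHandler()
logging.getLogger("apt").addHandler(handler)
```

The library never prints anything itself; a command-line tool would
attach a handler that writes to the terminal, a graphical one a handler
that feeds a widget.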

An exception to this rule could be made for the acquire subsystem,
which could store the error message in a field of the item.

Bindings to other languages
---------------------------
The recommended language for development of APT2 applications is Vala. 
I will also support C (of course, it's done automatically by valac) and 
Python. Other languages may be supported using GObject-introspection, 
e.g. JavaScript.

Licensing
---------
APT2 could be licensed under the terms of the GNU Lesser General Public 
License, version 2.1 or (at your option) any later version. This 
license is widely used in the GNOME world for libraries and since we 
are using a lot of technologies coming from there, this seems to be a 
good choice.

Another option is the Apache License 2.0, but its incompatibility with
version 2 of the GPL is not very helpful.

Progress and ToDo
-----------------

APT2 is moving very slowly. At the moment, the status is:

WORKING
    - Parser for /etc/apt/apt.conf and other configuration files
    - Parser for RFC 822-style tag files (although it will be rewritten)
    - Single-threaded file acquisition using GIO and libsoup, no
      support for authentication
IN PROGRESS
    - Parser for /etc/apt/sources.list and similar files
    - Repository handling, e.g. apt-get update can be done using my 
      local branch.
TODO
    - Multi-threaded file acquisition using modules.
    - Progress reporting for file acquisition.
    - Support for PDiffs.
    - Taking care of integration with the GLib mainloop.
    - PackageCache, SourceCache, FileCache
    - Dependency solvers.
    - Final license decision

GOALS for 0.0.1:
    - COMMAND: apt-get install
    - COMMAND: apt-get source --download-only
    - COMMAND: apt-get update
    - External dependency solvers, possibly using the CUDF format. This 
      was a feature request by zack.

LONG TERM GOALS:
    - Replace APT and aptitude on the command-line, and APT's libraries
    - Replace the D-Bus server provided by aptdaemon
    - Replace synaptic/gnome-app-install/software-center, or port
      software-center over from apt.
    - Get a native built-in SAT resolver.

Links
-----
[0] http://upsilon.cc/~zack/research/publications/mooml-iwoce-2009.pdf
