APT2, APT2, where are you?
==========================

This e-mail gives an overview of my recent thoughts and activities on APT2. Yes, I know, it's progressing very slowly.

Repositories
------------

While working on APT2, I looked at three repository formats, each for a different package manager: Debian, RPM and Slackware. From what I have seen, I can conclude that the data they contain is very similar.

First of all, each of these repositories contains some kind of meta-index, which is optionally signed. For Debian, this is the Release file (or the new InRelease file); for RPM, it is repomd.xml; and for Slackware, one can use CHECKSUMS.md5.

Secondly, all repositories provide some kind of package index. For Debian, these are the Packages files; for RPM, it is primary.xml; and for Slackware, it is PACKAGES.TXT.

In the common dists/ and pool/ structure, Debian package indexes are split by architecture and into multiple components. The package manager can thus restrict the files it needs to the current architectures and the requested components. RPM package indexes are shared by all architectures, and there is no concept of components. For Slackware repositories, we can assume that there is only one architecture.

The package indexes provide us with a list of packages, their versions, dependencies, provided features and a human-readable description. For Slackware, dependency information is completely optional and not even supported by the distribution itself (by design). RPM package indexes also provide file lists for certain files in the packages (e.g. configuration files).

Thirdly, there are source indexes. For Debian, they are like the Packages files, just not architecture-specific. SRPM repositories are exactly like RPM ones; we just have to deal with the source packages at extraction time. Slackware is harder: there is a CHECKSUMS.md5 file in the source directory, and we may be able to calculate the location of a source using it together with the PACKAGES.TXT file.
Fourthly, there are file lists. Debian has per-architecture Contents-*.gz files, RPM has filelists.xml and Slackware has MANIFEST files in the subdirectories.

One last word on Slackware: it might make sense to take Debian's component/section approach and apply it to Slackware as well, treating each directory (extra, slackware) as one section.

All in all, we have four elements:

- Meta-index
- Package indexes
- Source indexes
- File indexes

Resolving dependencies, etc.
----------------------------

When resolving dependencies, we need three pieces of information:

(a) the list of installed packages, as reported by the low-level package manager;
(b) the available packages, as reported by the repositories;
(c) the request, i.e. which packages to install or remove.

From this information, we can create a changeset. A changeset includes all actions which have to be carried out to satisfy the request. It can thus be expressed using the same data structure as a request, i.e. a tuple (package, version, type of comparison, action). There is no need to modify any kind of cache like in APT; instead, we simply carry out the actions and reload the list of installed packages afterwards.

APT treats a package as a set of versions. This kind of handling causes problems with things like multi-arch and multiple versions installed at the same time (e.g. in RPM). Thus we treat each version as a single package, and can allow multiple versions, or packages from multiple architectures, to be installed at the same time.

External dependency solvers will be supported. I had an e-mail conversation with zack on this topic, in which he asked me whether this would be possible; I'm just waiting for his final proposal. This should probably be coordinated with cupt as well, so we can share external solvers. The exchange format could be CUDF [0].
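The request/changeset model from "Resolving dependencies" can be illustrated with a small Python sketch. All names here are hypothetical. Each concrete version is its own `Package` (name, version, architecture), so several can coexist in the installed set, and requests and changesets share one `Action` tuple (package, version, comparison, action). Version comparison is deliberately simplified to dotted numbers, and `plan()` ignores dependencies entirely; it only shows where an internal or external (e.g. CUDF-based) solver would plug in.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Package:
    """One concrete version; several may be installed side by side."""
    name: str
    version: str
    arch: str

@dataclass(frozen=True)
class Action:
    """Request and changeset entry: (package, version, comparison, action)."""
    package: str
    version: Optional[str]     # None together with comparison=None: any version
    comparison: Optional[str]  # "=", ">=", "<=", ">>", "<<" or None
    action: str                # "install" or "remove"

def version_key(version):
    # Simplification: dotted numeric versions only. Real Debian or RPM
    # version comparison is considerably more involved.
    return tuple(int(part) for part in version.split("."))

OPS = {
    "=": lambda a, b: a == b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    ">>": lambda a, b: a > b,
    "<<": lambda a, b: a < b,
}

def matches(pkg, item):
    if pkg.name != item.package:
        return False
    if item.comparison is None:
        return True
    return OPS[item.comparison](version_key(pkg.version), version_key(item.version))

def plan(installed, available, request):
    """Turn a request into a changeset of exact ("=") actions.

    Dependencies are ignored: a real solver would add further actions here.
    """
    changeset = []
    for item in request:
        pool = available if item.action == "install" else installed
        candidates = [p for p in pool if matches(p, item)]
        if not candidates:
            raise LookupError(f"cannot satisfy {item}")
        if item.action == "install":
            # Pick the newest matching version.
            chosen = [max(candidates, key=lambda p: version_key(p.version))]
        else:
            # Remove every matching installed version.
            chosen = sorted(candidates, key=lambda p: version_key(p.version))
        changeset += [Action(p.name, p.version, "=", item.action) for p in chosen]
    return changeset
```

For example, requesting `Action("foo", "1.5", ">=", "install")` and `Action("foo", None, None, "remove")` with foo 1.0 installed and foo 2.0 available yields an install of foo 2.0 and a removal of foo 1.0. After carrying these out, the installed set is simply re-read instead of being patched in memory.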
Caching
-------

The mmap()'able binary cache in APT has become a problem with growing repository sizes, because it is practically impossible to resize it on the fly. APT gained support for using mremap(), but this does not work in practice. You also have to work around all the pointers, converting them to offsets relative to the beginning of the file when storing and back to memory addresses when using them. That's why I don't plan to use such a cache format.

Instead, the cache format should be a subset of the Debian package information files, containing only the basic information needed to resolve dependencies. This solution is still several times faster than using no cache at all.

Handling file acquisition
-------------------------

File acquisition could be handled by multiple worker threads, where workers are written as shared libraries and loaded using GModule. The open questions here are how we should handle graphical platforms (asynchronous methods?) and how to know which module supports which protocol. For the latter problem, there are two ways:

(a) identify the protocol using the file name, e.g. libhttp.so;
(b) identify the protocol using information contained in the library, with an optional priority, e.g.:

    struct protocol {
        string protocol;
        int priority;
    }

Handling platform-specific stuff
--------------------------------

We can use modules to provide implementations of abstract classes which provide common functionality like updating repository data. But we might also want to allow developers to write platform-specific programs; how should this be done if they have no access to those classes?

Alternatively, we could link in the platform-specific parts and only allow one platform per installation. This has the advantage of providing access to the specific parts, but the disadvantage that you cannot use APT2 on distribution X to create a chroot of distribution Y which uses a different package manager.
Another possible option would be to split the specific parts out into separate libraries, named e.g. libapt-debian and libapt-rpm, and install them to /usr/lib.

Target platforms for APT2
-------------------------

The primary target platform of APT2 is Debian and the distributions derived from it. Once this platform works, support for others may be added.

Communicating messages and errors
---------------------------------

The libapt library will use the logging facilities provided by GLib to output information. Errors will be handled by GError where useful; otherwise, they will be sent to the log at level CRITICAL and the function will return false/null.

Applications have to set up the display of the logging domain "apt", e.g. by printing it on the screen. It's their task to format the messages in an appropriate manner. There may be a library "libapt-gtk" providing widgets for graphical applications.

An exception to this rule could be made for the acquire subsystem, which could store the error message in a field of the item.

Bindings to other languages
---------------------------

The recommended language for developing APT2 applications is Vala. I will also support C (of course; it's done automatically by valac) and Python. Other languages, e.g. JavaScript, may be supported using GObject-introspection.

Licensing
---------

APT2 could be licensed under the terms of the GNU Lesser General Public License, version 2.1 or (at your option) any later version. This license is widely used for libraries in the GNOME world, and since we are using a lot of technologies coming from there, it seems to be a good choice. Another option is the Apache License 2.0, but its incompatibility with version 2 of the GPL is not very helpful.
Progress and ToDo
-----------------

APT2 is moving very slowly. At the moment, it has:

WORKING
- Parser for /etc/apt/apt.conf and other configuration files
- Parser for RFC 822-style tag files (although it will be rewritten)
- Single-threaded file acquisition using GIO and libsoup, no support for authentication

IN PROGRESS
- Parser for /etc/apt/sources.list and similar files
- Repository handling; e.g. apt-get update can be done using my local branch

TODO
- Multi-threaded file acquisition using modules
- Progress reporting for file acquisition
- Support for PDiffs
- Integration with the GLib main loop
- PackageCache, SourceCache, FileCache
- Dependency solvers
- Final license decision

GOALS for 0.0.1:
- COMMAND: apt-get install
- COMMAND: apt-get source --download-only
- COMMAND: apt-get update
- External dependency solvers, possibly using the CUDF format. This was a feature request by zack.

LONG TERM GOALS:
- Replace APT and aptitude on the command line, and APT's libraries
- Replace the D-Bus server provided by aptdaemon
- Replace synaptic/gnome-app-install/software-center, or port software-center over from APT
- Get a native built-in SAT resolver

Links
-----

[0] http://upsilon.cc/~zack/research/publications/mooml-iwoce-2009.pdf