dpkg musings
Here is a dump of thoughts about dpkg as a basis for the Open Source
World Conference in Malaga recently.
Dpkg has been stagnating for a while now, which is a bad thing. There are a
few things that I would like to have done by the end of this year:
* logging support
* better conffile handling
* forcing all metadata to be valid utf-8 and doing the proper conversions
when displaying it
* have a prototype for a new dpkg-dev based on dpatch2
* N-arch setup / architecture & OS revisions
* scripting support, possibly via swig
* unittest framework
* versioned provides (fink wants this badly)
A problem with dpkg development is that it attracts a lot of people who
insist on a specific feature or even a specific implementation of a feature
when it doesn't make sense. There is no good development community which
hashes out ideas in a structured manner. A system like Python's PEPs would be
very nice to have. Since we have dpkg.org (along with a minimal website at
the moment) it would be nice to see if we can get that rolling. It would
also make it easier to put somebody in charge of a proposal (let's call
it a DEP) once the design has been agreed on. Maybe switching to arch
instead of CVS would make it easier to manage that as well. Perhaps we
should create a separate list for just design discussions; right now
debian-dpkg is too flooded with questions and bug reports, and I don't
think everyone who wants to discuss dpkg development is interested in
seeing all bug reports as well. Along with that I would also like to
take dpkg a bit more out of Debian: right now in my opinion there is too
much focus on doing things specifically for Debian instead of more
generally.
Adam and I have also been thinking about the coding style used in dpkg. At
this moment it is what we sometimes call Ian-C: high code density (two-space
indents for example), few comments. It's still better than Ian-perl but
not ideal to work with. We'll definitely want to reindent everything,
but we are also thinking of a more drastic step: starting to add C++
infrastructure. Since C and C++ mix very well that should be easy to do.
So expected coding style changes are:
* allow C++ in the dpkg codebase
* reindent code to use normal tab-indents and a few other changes (we
already decided on specific indent-settings, I need to dig those up again)
* document all new code using doxygen or an alternative system (I heard
KDE also has a nice system)
On specific features:
* logging support
Since we're thinking of going C++ it makes sense to move everything to
log4cpp (the C++ version of log4j, whose design Python's standard
logging module now follows as well) and add loggers for actions taken.
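To make "loggers for actions taken" a bit more concrete, here is a minimal sketch. It uses Python's standard logging module rather than log4cpp, simply because the logger/handler/formatter model is the same. The logger name "dpkg.action" and the message layout are assumptions made up for this illustration, not an agreed format.

```python
import io
import logging


def make_action_logger(stream):
    """Return a logger for package actions that writes to `stream`."""
    logger = logging.getLogger("dpkg.action")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False       # keep action records out of the root logger
    logger.handlers.clear()        # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


buf = io.StringIO()
log = make_action_logger(buf)
log.info("unpack %s %s", "bash", "2.05b-8")
log.info("configure %s %s", "bash", "2.05b-8")
print(buf.getvalue(), end="")
```

In a real setup the stream would be a log file, and separate child loggers per action type would allow them to be filtered independently.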
* better conffile handling
I already have this pretty much all implemented based on the proposal
on www.dpkg.org. It needs a better 3-way diff function (it calls
diff3 in a bad manner right now) and needs interpreter bindings
so it can replace ucf, but the basis is pretty much there.
* forcing all metadata to be valid utf-8 and doing the proper conversions
when displaying it
should be simple enough. Requires some perl hacking since dpkg-dev is
currently written in perl and I have no real desire to figure out
how perl does codecs.
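The two halves of this (validate on input, convert on display) could look roughly like the sketch below. It is written in Python instead of perl, and the function names validate_field and for_display are invented for the example.

```python
def validate_field(raw: bytes) -> str:
    """Decode a metadata field, rejecting anything that is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"metadata is not valid UTF-8: {exc}") from exc


def for_display(text: str, encoding: str) -> bytes:
    """Convert validated text to the display encoding.

    Characters the target encoding cannot represent are replaced with '?'.
    """
    return text.encode(encoding, errors="replace")


name = validate_field("Måns".encode("utf-8"))
print(for_display(name, "latin-1"))
print(for_display(name, "ascii"))
```

Replacing unrepresentable characters is only one possible policy for display; transliteration would be another.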
* have a prototype for a new dpkg-dev
dpkg-dev as it is now has several problems:
- lousy design
- written in Ian-perl
- does not support multiple patches or multiple sources
I started on a rewrite in python a while ago but learned enough python
since then to realize that it should be redone again. A large part of
this is writing basic bits: code to take a package apart, parse
metadata, etc. This can eventually be tied in with adding interpreter
support in the main dpkg codebase; until then we can use a python
implementation. I have validating parsers for this which work fine
except for the fact they are really slow. So we'll want to validate
data on input and not completely verify everything when parsing.
multiple-patch & source support is becoming essential, as witnessed by
the rise of tools like dbs and dpatch. I would like to have a simple
system that does as little as possible based on these criteria:
- no support for architecture-specific patches. Patches should be clean
and work for everything so they can easily be merged upstream as well.
- when (un)packing we have to support:
+ unpacking a source (tar.gz/tar.bz2/tar) in a specified location
(possibly in a subtree of another source)
+ apply patches, possibly reversed, possibly with a specified number
of patch components stripped
+ unpacking has to be guaranteed safe, so never call scripts inside
a source when unpacking. This is essential if you want to audit
a source since you can't trust what is in it.
+ since we can now have an arbitrary number of files for a package
we can no longer distribute it as separate .orig.tar.gz, .diff.gz and
.dsc but we will have to put it in real package format. This
should be either files in an ar archive like the deb format, or
(my preference) a tar file.
dpatch2 already has a group of people working on it and provides
most of the features we are going to need so it makes sense to
base a new dpkg-dev around a subset of it. I talked with a few
dpatch2 people and they seem to be enthusiastic about doing this.
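The unpacking criteria above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not what dpatch2 or dpkg-dev does: the helper names are invented, links and device nodes are refused outright (stricter than a real tool could afford), and the patch command is only constructed, never run. Nothing from inside the source is ever executed.

```python
import io
import os
import tarfile


def check_member(member: tarfile.TarInfo, dest: str) -> None:
    """Raise ValueError if extracting `member` could write outside `dest`."""
    if member.issym() or member.islnk() or member.isdev():
        raise ValueError(f"refusing link or device: {member.name}")
    root = os.path.realpath(dest)
    target = os.path.realpath(os.path.join(root, member.name))
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"path escapes destination: {member.name}")


def safe_unpack(tar: tarfile.TarFile, dest: str) -> list:
    """Check every member first, then extract; return the member names."""
    members = tar.getmembers()
    for member in members:
        check_member(member, dest)
    # Newer Python versions offer extraction filters; use one when available.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    tar.extractall(dest, members=members, **kwargs)
    return [member.name for member in members]


def patch_command(patch_file, directory, strip=1, reverse=False):
    """Build (but do not run) a patch(1) invocation."""
    cmd = ["patch", f"-p{strip}", "-d", directory, "-i", patch_file]
    if reverse:
        cmd.append("-R")
    return cmd


print(patch_command("patches/01-fix.diff", "src", strip=0, reverse=True))
```

`dest` can be a subdirectory of an already unpacked source, which covers the "subtree of another source" case.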
* N-arch setup / architecture & OS revisions
I would like to have bi-arch be a simple form of N-arch setup; I don't
feel it makes sense to limit us to two options. Ideally it would be nice
to move both architecture and OS/kernel ABI into normal package
dependencies. This allows you to run ia32 packages on different
architectures simply by installing a plex86 package that Provides:
arch-ia32 for example. The package system could pull in architecture
emulators and kernel emulators (like iBCS2) on demand. This should
also work for biarch systems.
The problem is that we have two kinds of dependencies: arch-specific
and arch-independent. For example a dependency on bash is arch-independent
since we really don't care if our bash is running 32-bit, 64-bit or
on an emulated processor: as long as it can run scripts it'll suffice.
But for a library this is different: you want a library that uses
the same architecture and kernel ABI as your application.
My current thinking of how to implement this is to add some kind of
constraints to dependencies so you get something like
Depends: bash, libc (>= 2.3) [i386, linux]
which dpkg already partially supports (Adam has been working on that).
The things between [..] then change from being architectures to
selection criteria: 'only accept libc if it satisfies these additional
constraints'.
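As a rough illustration of the constraint idea, the sketch below parses the value of such a Depends field and checks a candidate against the bracketed criteria. This is not how dpkg parses dependencies: the names Dependency, parse_depends and accepts are invented here, alternatives with | are not handled, and a candidate is modelled as a plain set of tags such as {"i386", "linux"}.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Dependency(NamedTuple):
    name: str
    relation: Optional[str]
    version: Optional[str]
    constraints: Tuple[str, ...]


_DEP_RE = re.compile(
    r"^(?P<name>[a-z0-9][a-z0-9+.-]*)"
    r"(?:\s*\(\s*(?P<rel><<|<=|=|>=|>>)\s*(?P<ver>[^()\s]+)\s*\))?"
    r"(?:\s*\[(?P<cons>[^\[\]]*)\])?$"
)


def _split_top_level(field: str) -> List[str]:
    """Split on commas that are not inside (...) or [...]."""
    parts, current, depth = [], [], 0
    for ch in field:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_depends(field: str) -> List[Dependency]:
    deps = []
    for item in _split_top_level(field):
        match = _DEP_RE.match(item.strip())
        if match is None:
            raise ValueError(f"cannot parse dependency: {item.strip()!r}")
        cons = tuple(c for c in re.split(r"[,\s]+", match["cons"] or "") if c)
        deps.append(Dependency(match["name"], match["rel"], match["ver"], cons))
    return deps


def accepts(dep: Dependency, candidate_tags: set) -> bool:
    """True if the candidate satisfies every additional constraint."""
    return set(dep.constraints) <= set(candidate_tags)


deps = parse_depends("bash, libc (>= 2.3) [i386, linux]")
print(deps)
```

With this model bash carries no constraints, so any bash is accepted, while libc is only accepted from a candidate tagged with both i386 and linux.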
A problem with N-arch support is paths. rpm apparently does some horrible
path rewriting where paths are changed during unpack depending on what
architecture you unpack on. I strongly feel that is something we should
not do since it means a package can no longer rely on where its contents
will be on disk. This seems to be an FHS/LSB/policy problem: for libraries
it is essential to prevent collisions so we will require something like
/lib/<arch>-<abi>/ . To some degree we already have this (OpenSSL
uses it for example). Having abi in there is useful since we are going
to want to use that on FreeBSD and similar systems which can run both
FreeBSD and Linux applications. I suspect this will be the hardest part
to implement since it means both lots of coordination with other
distributions and lots of interesting moving around of things. And we'll
run into fun things as well (like a 64-bit application accessing utmp,
which is also written by a 32-bit daemon in a different format).
* scripting support, possibly via swig
should be obvious.
* unittest framework
I strongly believe in unit testing, but I always get stuck trying to
create a good framework, probably for lack of experience in using
them. It would be good to have a group of people who can work on this
and make sure that all bugs we encounter (and have encountered) will
be covered. This is partially something we can share with the RPM
maintainers; we already agreed to share version comparison tests if
we write them (rpm is changing its version comparison routines to be
more like dpkg's, so this will make sense).
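A shared version comparison test could be as simple as a table of cases that any implementation is run against. The sketch below pairs such a table with a simplified comparison routine written for this example. It is an approximation of dpkg-style ordering (epoch, then upstream version, then revision, with digit runs compared numerically and letters sorting before other characters), not dpkg's actual code, and the cases are illustrative rather than an agreed test suite.

```python
import re


def _split(version):
    """Split 'epoch:upstream-revision' into its three parts."""
    epoch, colon, rest = version.partition(":")
    if not colon:
        epoch, rest = "0", version
    upstream, dash, revision = rest.rpartition("-")
    if not dash:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def _key(part):
    """Turn a version part into (non-digit order, number) chunks."""
    key = []
    for text, digits in re.findall(r"(\D*)(\d*)", part):
        # Letters sort before other characters; digit runs compare as numbers.
        order = [ord(c) if c.isalpha() else ord(c) + 256 for c in text]
        key.append((order, int(digits or "0")))
    while key and key[-1] == ([], 0):
        key.pop()  # drop empty trailing chunks (re.findall yields one at the end)
    return key


def compare(a, b):
    """Return -1, 0 or 1, in the style of a C comparison function."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    ka, kb = (ea, _key(ua), _key(ra)), (eb, _key(ub), _key(rb))
    return (ka > kb) - (ka < kb)


# The table is the part that could be shared between implementations.
CASES = [
    ("1.0", "<", "1.1"),
    ("1.9", "<", "1.10"),
    ("1.0", "=", "1.00"),
    ("2.3", "<", "2.3.1"),
    ("1.0-1", "<", "1.0-2"),
    ("1:0.9", ">", "2.0"),
    ("1.0a", "<", "1.0+"),
]


def failures(compare_func, cases=CASES):
    """Return the cases for which `compare_func` gives the wrong answer."""
    expected = {"<": -1, "=": 0, ">": 1}
    return [c for c in cases if compare_func(c[0], c[2]) != expected[c[1]]]


print(failures(compare))
```

Keeping the cases as plain data means the same table can drive tests for a C, perl or Python implementation, and every bug found in the field can be added as one more row.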
* versioned provides
Ben Collins actually implemented this ages ago, but we pulled it out
since we could not manage to make dselect handle it properly. We
should really reintroduce this and deal with dselect. It should be
trivial to add to apt as well.
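For illustration, resolving a dependency against versioned provides might look like the sketch below. The data layout and the providers function are invented for this example, and versions are treated as dotted integers instead of full dpkg versions. It also assumes that an unversioned Provides cannot satisfy a versioned dependency.

```python
import operator

_OPS = {"<<": operator.lt, "<=": operator.le, "=": operator.eq,
        ">=": operator.ge, ">>": operator.gt}


def _vtuple(version):
    # Simplification: versions are dotted integers, not full dpkg versions.
    return tuple(int(p) for p in version.split("."))


def providers(dep, packages):
    """Return names of packages that satisfy dep = (name, relation, version)."""
    name, relation, version = dep
    found = []
    for pkg in packages:
        # A package offers its own name and version plus everything it Provides.
        offers = [(pkg["name"], pkg["version"])] + list(pkg.get("provides", []))
        for offered_name, offered_version in offers:
            if offered_name != name:
                continue
            if relation is None:
                ok = True
            elif offered_version is None:
                ok = False  # assumption: unversioned Provides never matches
            else:
                ok = _OPS[relation](_vtuple(offered_version), _vtuple(version))
            if ok:
                found.append(pkg["name"])
                break
    return found


packages = [
    {"name": "exim", "version": "4.30",
     "provides": [("mail-transport-agent", None)]},
    {"name": "postfix", "version": "2.0",
     "provides": [("mail-transport-agent", "2.0")]},
]
print(providers(("mail-transport-agent", ">=", "1.5"), packages))
print(providers(("mail-transport-agent", None, None), packages))
```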
--
Wichert Akkerman <wichert@wiggy.net> It is simple to make things.
http://www.wiggy.net/ It is hard to make things simple.