
dpkg musings



Here is a dump of thoughts about dpkg, written up as a basis for the Open
Source World Conference in Malaga recently.

Dpkg has been stagnating for a while now, which is a bad thing. There are a
few things that I would like to have done by the end of this year:

* logging support
* better conffile handling
* forcing all metadata to be valid utf-8 and doing the proper conversions
  when displaying it
* have a prototype for a new dpkg-dev based on dpatch2
* N-arch setup / architecture & OS revisions
* scripting support, possibly via swig
* unittest framework
* versioned provides (fink wants this badly)

A problem with dpkg development is that it attracts a lot of people who
insist on a specific feature or even a specific implementation of a feature
when it doesn't make sense. There is no good development community which
hashes out ideas in a structured manner. A system like Python's PEPs would be
very nice to have. Since we have dpkg.org (along with a minimal website at
the moment) it would be nice to see if we can get that rolling. It would
also make it easier to put somebody in charge of a proposal (let's call
it a DEP) once the design has been agreed on. Maybe switching to arch
instead of CVS would make it easier to manage that as well.  Perhaps we
should create a separate list for just design discussions; right now
debian-dpkg is too flooded with questions and bug reports, and I don't
think everyone who wants to discuss dpkg development is interested in
seeing all bug reports as well. Along with that I would also like to
take dpkg a bit more out of Debian: right now in my opinion there is too
much focus on doing things specifically for Debian instead of more
generally.

Adam and I have also been thinking about the coding style used in dpkg. At
this moment it is what we sometimes call Ian-C: high code density (two-space
indents for example), few comments. It's still better than Ian-perl but
not ideal to work with. We'll definitely want to reindent everything,
but we are also thinking of a more drastic step: starting to add C++
infrastructure. Since C and C++ mix very well that should be easy to do.
So expected coding style changes are:

* allow C++ in the dpkg codebase
* reindent code to use normal tab-indents and a few other changes (we
  already decided on specific indent-settings, I need to dig those up again)
* document all new code using doxygen or an alternative system (I heard
  KDE also has a nice system)


On specific features:

* logging support

  Since we're thinking of going C++ it makes sense to move everything to
  log4cpp (the C++ port of log4j, the design that Python's standard
  logging module is now modelled on as well) and add loggers for actions
  taken.
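
  To make "loggers for actions taken" a bit more concrete, here is a rough
  sketch using Python's logging module rather than log4cpp, since both
  follow the same log4j-style model of named loggers and handlers. The
  logger name "dpkg.actions", the message layout and the helper names are
  assumptions for illustration only, not an agreed design.

```python
import io
import logging


def make_action_logger(stream):
    """Return a logger that writes package actions to `stream`."""
    # "dpkg.actions" is a hypothetical name in a log4j-style hierarchy.
    logger = logging.getLogger("dpkg.actions")
    logger.setLevel(logging.INFO)
    logger.propagate = False      # keep output out of the root logger
    logger.handlers.clear()       # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def log_action(logger, action, package, version):
    """Record one action (unpack, configure, remove, ...) on a package."""
    logger.info("%s %s %s", action, package, version)


buffer = io.StringIO()
log = make_action_logger(buffer)
log_action(log, "unpack", "bash", "2.05b-8")
log_action(log, "configure", "bash", "2.05b-8")
print(buffer.getvalue(), end="")
```

  A real setup would log to a file and add timestamps; they are left out
  here to keep the output predictable.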

* better conffile handling

  I already have this pretty much all implemented based on the proposal
  on www.dpkg.org. It needs a better 3-way diff function (it calls
  diff3 in a bad manner right now) and needs interpreter bindings
  so it can replace ucf, but the basis is pretty much there.
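
  For readers who have not looked at conffile handling before, the
  decision that comes before any 3-way merge can be sketched as below.
  This is the generic three-hash comparison, not the proposal on
  www.dpkg.org; the function names and the "keep"/"install"/"merge"
  labels are made up for this sketch.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hash of a file's contents; only hashes need to be remembered."""
    return hashlib.sha256(data).hexdigest()


def conffile_action(old_shipped: bytes, new_shipped: bytes,
                    on_disk: bytes) -> str:
    """Decide what to do with one conffile during an upgrade.

    Returns "keep" (leave the file on disk alone), "install" (replace it
    with the new shipped version) or "merge" (both sides changed, so a
    3-way merge or a prompt is needed).
    """
    old, new, cur = digest(old_shipped), digest(new_shipped), digest(on_disk)
    if cur == new or old == new:
        # Either the admin already has the new contents, or the package
        # did not change the file: nothing to do.
        return "keep"
    if cur == old:
        # The admin never touched the file, the package changed it.
        return "install"
    # Both the admin and the package changed the file.
    return "merge"
```

  Only the "merge" case needs the better diff3 handling mentioned above.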

* forcing all metadata to be valid utf-8 and doing the proper conversions
  when displaying it

  should be simple enough. Requires some perl hacking since dpkg-dev is
  currently written in perl and I have no real desire to figure out
  how perl does codecs.
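
  In Python terms (dpkg-dev itself is perl, so this only illustrates the
  idea) the two halves are: reject anything that is not valid UTF-8 on
  input, and convert to the user's encoding only at display time. The
  function names are hypothetical.

```python
def validate_metadata(raw: bytes) -> str:
    """Decode package metadata, insisting that it is valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"metadata is not valid UTF-8 at byte {exc.start}") from exc


def for_display(text: str, encoding: str) -> bytes:
    """Convert validated metadata to the terminal's encoding.

    Characters the target encoding cannot represent become '?' instead
    of causing an error, so display never fails.
    """
    return text.encode(encoding, errors="replace")
```

  For example, for_display("José", "ascii") gives b"Jos?", while the
  Latin-1 bytes b"Jos\xe9" are rejected by validate_metadata.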

* have a prototype for a new dpkg-dev

  dpkg-dev as it is now has several problems:
  - lousy design
  - written in Ian-perl
  - does not support multiple patches or multiple sources

  I started on a rewrite in python a while ago but have learned enough
  python since then to realize that it should be redone again. A large
  part of this is writing basic bits: code to take a package apart, parse
  metadata, etc. This can eventually be tied in with adding interpreter
  support in the main dpkg codebase; until then we can use a python
  implementation. I have validating parsers for this which work fine
  except for the fact they are really slow. So we'll want to validate
  data on input and not completely verify everything when parsing.
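
  As an illustration of the "basic bits", here is a small validating
  parser for a single control-file paragraph ("Field: value" lines with
  indented continuation lines). It is a sketch written for this mail, not
  the parsers mentioned above; the field-name pattern is deliberately
  stricter than what real control files allow.

```python
import re

# Assumed, simplified field-name syntax: letters, digits and hyphens.
FIELD_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9-]*):[ \t]*(.*)$")


def parse_paragraph(text: str) -> dict:
    """Parse one control paragraph into a dict, validating as we go."""
    fields = {}
    current = None
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            raise ValueError(f"line {lineno}: blank line inside paragraph")
        if line[0] in " \t":
            # Continuation of the previous field's value.
            if current is None:
                raise ValueError(
                    f"line {lineno}: continuation before any field")
            fields[current] += "\n" + line.strip()
            continue
        match = FIELD_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: not a 'Field: value' line")
        current, value = match.group(1), match.group(2).rstrip()
        if current in fields:
            raise ValueError(f"line {lineno}: duplicate field {current!r}")
        fields[current] = value
    return fields
```

  Validation happens once, while reading; later code can then use the
  resulting dict without checking it again.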

  multiple-patch & source support is becoming essential, as witnessed by
  the rise of tools like dbs and dpatch. I would like to have a simple
  system that does as little as possible based on these criteria:

  - no support for architecture-specific patches. Patches should be clean
    and work for everything so they can easily be merged upstream as well.
  - when (un)packing we have to support:
    + unpacking a source (tar.gz/tar.bz2/tar) in a specified location
      (possibly in a subtree of another source)
    + apply patches, possibly reversed, possibly with a specified number
      of patch components stripped
    + unpacking has to be guaranteed safe, so never call scripts inside
      a source when unpacking. This is essential if you want to audit
      a source since you can't trust what is in it.
    + since we can now have an arbitrary number of files for a package
      we can no longer distribute it as separate .orig.tar.gz, .diff.gz and
      .dsc but we will have to put it in a real package format. This
      should be either files in an ar archive like the deb format, or
      (my preference) a tar file.
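
  The "guaranteed safe" unpacking requirement above can be sketched with
  Python's tarfile module: extract into a given directory, refuse any
  member that would land outside it, and never create links or device
  nodes or run anything from the source. The function name and the exact
  policy (regular files and directories only, modes not preserved) are
  assumptions made for this sketch; patch handling is not covered.

```python
import io
import os
import tarfile


def safe_unpack(archive, dest):
    """Unpack a tar stream (plain, gz or bz2) below `dest`.

    `archive` is a seekable binary file object. Returns the list of
    member names that were unpacked; raises ValueError on anything
    suspicious.
    """
    dest = os.path.realpath(dest)
    unpacked = []
    with tarfile.open(fileobj=archive, mode="r:*") as tar:
        for member in tar:
            target = os.path.realpath(os.path.join(dest, member.name))
            # Rejects absolute names and "../" tricks.
            if os.path.commonpath([dest, target]) != dest:
                raise ValueError(
                    f"{member.name!r} escapes the unpack directory")
            if member.isdir():
                os.makedirs(target, exist_ok=True)
            elif member.isreg():
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with tar.extractfile(member) as src, \
                        open(target, "wb") as out:
                    out.write(src.read())
            else:
                # Symlinks, hard links, devices, fifos: not supported here.
                raise ValueError(
                    f"{member.name!r}: unsupported member type")
            unpacked.append(member.name)
    return unpacked
```

  Unpacking a second source "in a subtree of another source" is then just
  a matter of passing a directory inside the first tree as dest.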

  dpatch2 already has a group of people working on it and provides
  most of the features we are going to need so it makes sense to
  base a new dpkg-dev around a subset of it. I talked with a few
  dpatch2 people and they seem to be enthusiastic about doing this.

* N-arch setup / architecture & OS revisions

  I would like to have bi-arch be a simple form of N-arch setup; I don't
  feel it makes sense to limit us to two options. Ideally it would be nice
  to move both architecture and OS/kernel ABI into normal package
  dependencies. This allows you to run ia32 packages on different
  architectures simply by installing a plex86 package that Provides:
  arch-ia32 for example. The package system could pull in architecture
  emulators and kernel emulators (like iBCS2) on demand. This should
  also work for biarch systems.

  The problem is that we have two kinds of dependencies: arch-specific
  and arch-independent. For example a dependency on bash is arch-independent
  since we really don't care if our bash is running 32bit, 64bit or
  on an emulated processor: as long as it can run scripts it'll suffice.
  But for a library this is different: you want a library that uses
  the same architecture and kernel ABI as your application.

  My current thinking of how to implement this is to add some kind of
  constraints to dependencies so you get something like

      Depends: bash, libc (>= 2.3) [i386, linux]

  which dpkg already partially supports (Adam has been working on that).
  The things between [..] then change from being architectures to
  selection criteria: 'only accept libc if it satisfies these additional
  constraints'.
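
  A sketch of how such a Depends line could be parsed and checked, to
  show the difference between a constrained and an unconstrained
  dependency. The syntax follows the example line above; the Dependency
  type, the function names and the idea of describing a candidate package
  by a set of attributes such as {"i386", "linux"} are assumptions for
  illustration.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Dependency(NamedTuple):
    package: str
    relation: Optional[str]
    version: Optional[str]
    constraints: Tuple[str, ...]


DEP_RE = re.compile(
    r"^\s*(?P<package>[a-z0-9][a-z0-9+.-]*)"
    r"(?:\s*\(\s*(?P<relation><<|<=|=|>=|>>)\s*(?P<version>[^\s)]+)\s*\))?"
    r"(?:\s*\[(?P<constraints>[^\]]*)\])?\s*$"
)


def parse_depends(value: str):
    """Parse the value of a Depends field into Dependency tuples."""
    deps = []
    # Split on commas that are not inside a [...] constraint list.
    for item in re.split(r",(?![^\[]*\])", value):
        match = DEP_RE.match(item)
        if match is None:
            raise ValueError(f"cannot parse dependency {item.strip()!r}")
        raw = match.group("constraints") or ""
        constraints = tuple(c for c in re.split(r"[,\s]+", raw.strip()) if c)
        deps.append(Dependency(match.group("package"),
                               match.group("relation"),
                               match.group("version"),
                               constraints))
    return deps


def meets_constraints(dep: Dependency, attributes: set) -> bool:
    """True if a candidate with these attributes satisfies the [..] part.

    Version checking is left out; a dependency without constraints
    (like bash above) accepts a candidate of any architecture or ABI.
    """
    return all(c in attributes for c in dep.constraints)
```

  With the example line, bash is accepted whatever it runs on, while libc
  is only accepted from a candidate that is both i386 and linux.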

  A problem with N-arch support is paths. rpm apparently does some horrible
  path rewriting where paths are changed during unpack depending on what
  architecture you unpack on. I strongly feel that is something we should
  not do since it means a package can no longer rely on where its contents
  will be on disk. This seems to be an FHS/LSB/policy problem: for libraries
  it is essential to prevent collisions so we will require something like
  /lib/<arch>-<abi>/ . To some degree we already have this (OpenSSL
  uses it for example). Having abi in there is useful since we are going
  to want to use that on FreeBSD and similar systems which can run both
  FreeBSD and Linux applications. I suspect this will be the hardest part
  to implement since it means both lots of coordination with other
  distributions and lots of interesting moving around of things. And we'll
  run into fun things as well (like a 64-bit application reading utmp, which
  is also written by a 32-bit daemon using a different format).

* scripting support, possibly via swig

  should be obvious.

* unittest framework

  I strongly believe in unit testing, but I always get stuck trying to
  create a good framework, probably for lack of experience in using
  them. It would be good to have a group of people who can work on this
  and make sure that all bugs we encounter (and have encountered) will
  be covered. This is partially something we can share with the RPM
  maintainers; we already agreed to share version comparison tests if
  we write any (rpm is changing its version comparison routines to be
  more like dpkg so this will make sense).
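
  As a minimal sketch of what shared, table-driven version comparison
  tests could look like, here is an example using Python's unittest. The
  compare_versions function is a hypothetical stand-in that follows the
  ordering rules in Debian policy (epoch, then upstream version, then
  revision, with '~' sorting before everything); it is not dpkg's actual
  C code, and the test table is only illustrative.

```python
import re
import unittest


def _order(ch):
    """Sort key for a non-digit character: '~', then letters, then others."""
    if ch == "~":
        return -1
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def _compare_fragment(a, b):
    """Compare two upstream-version or revision strings: -1, 0 or 1."""
    while a or b:
        # Non-digit runs compare per character; a missing character
        # counts as 0, so it sorts after '~' but before anything else.
        text_a = re.match(r"[^0-9]*", a).group()
        text_b = re.match(r"[^0-9]*", b).group()
        for i in range(max(len(text_a), len(text_b))):
            oa = _order(text_a[i]) if i < len(text_a) else 0
            ob = _order(text_b[i]) if i < len(text_b) else 0
            if oa != ob:
                return -1 if oa < ob else 1
        a, b = a[len(text_a):], b[len(text_b):]
        # Digit runs compare numerically; an empty run counts as 0.
        num_a = re.match(r"[0-9]*", a).group()
        num_b = re.match(r"[0-9]*", b).group()
        if int(num_a or 0) != int(num_b or 0):
            return -1 if int(num_a or 0) < int(num_b or 0) else 1
        a, b = a[len(num_a):], b[len(num_b):]
    return 0


def _split(version):
    """Split 'epoch:upstream-revision' into (epoch, upstream, revision)."""
    epoch, rest = version.split(":", 1) if ":" in version else ("0", version)
    upstream, revision = rest.rsplit("-", 1) if "-" in rest else (rest, "")
    return int(epoch), upstream, revision


def compare_versions(a, b):
    (ea, ua, ra), (eb, ub, rb) = _split(a), _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_fragment(ua, ub) or _compare_fragment(ra, rb)


class VersionCompareTest(unittest.TestCase):
    CASES = [
        ("1.0", "1.1", -1),
        ("1.10", "1.9", 1),
        ("1.0~rc1", "1.0", -1),
        ("1:0.9", "2.0", 1),
        ("1.0-1", "1.0-2", -1),
        ("1.0", "1.0-0", 0),
    ]

    def test_table(self):
        for a, b, expected in self.CASES:
            with self.subTest(a=a, b=b):
                self.assertEqual(compare_versions(a, b), expected)
                self.assertEqual(compare_versions(b, a), -expected)
```

  The point of the table is that it is plain data: the same cases could
  be exported and run against both the dpkg and the rpm routines. The
  tests can be run with "python -m unittest".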

* versioned provides

  Ben Collins actually implemented this ages ago, but we pulled it out
  since we could not manage to make dselect handle it properly. We
  should really reintroduce this and deal with dselect. It should be
  trivial to add to apt as well. 
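
  To show what versioned provides change in dependency checking, here is
  a small sketch. It is not Ben's implementation: the Package type, the
  package names and the dotted-integer version handling are all
  simplifications made up for this example.

```python
from typing import NamedTuple, Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Very simplified version parsing: '1.2' -> (1, 2)."""
    return tuple(int(part) for part in text.split("."))


class Package(NamedTuple):
    name: str
    version: Version
    # Each entry is (virtual package name, provided version or None).
    provides: Tuple[Tuple[str, Optional[Version]], ...] = ()


def satisfies(pkg: Package, wanted: str,
              minimum: Optional[Version] = None) -> bool:
    """True if `pkg` satisfies 'wanted (>= minimum)', directly or via
    one of its provides."""
    if pkg.name == wanted:
        return minimum is None or pkg.version >= minimum
    for name, version in pkg.provides:
        if name != wanted:
            continue
        if minimum is None:
            return True
        # An unversioned provide cannot satisfy a versioned dependency.
        if version is not None and version >= minimum:
            return True
    return False
```

  With plain provides, a package providing "libfoo" could never satisfy
  "libfoo (>= 1.2)"; with a provided version of 1.2 or later it can.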

-- 
Wichert Akkerman <wichert@wiggy.net>    It is simple to make things.
http://www.wiggy.net/                   It is hard to make things simple.

----- End forwarded message -----



