
Re: dpkg musings

On Mon, 1 Mar 2004, Wichert Akkerman wrote:

> Here is a dump of thoughts about dpkg as a basis for the open source
> world conference in Malaga recently.
> Dpkg has been stagnating for a while now, which is a bad thing. There are a
> few things that I would like to have done by the end of this year:
> * logging support
> * better conffile handling
> * forcing all metadata to be valid utf-8 and doing the proper conversions
>   when displaying it
> * have a prototype for a new dpkg-dev based on dpatch2

You mean dpkg-source, not dpkg-dev here.

And if dpatch2 is anything like dpatch, then yick.

> Along with that I would also like to
> take dpkg a bit more out of Debian: right now in my opinion there is too
> much focus on doing things specifically for Debian instead of more
> generally.

Yes, we've both agreed on this for a long time.  debian-policy is for Debian,
but dpkg is not.  dpkg needs to support much more than policy allows.

> Adam and I have also been thinking about the coding style used in dpkg. At
> this moment it is what we sometimes call Ian-C: high code density (two-space
> indents for example), little comments. It's still better than Ian-perl but
> not ideal to work with. We'll definitely want to reindent everything,
> but we are also thinking of a more drastic step and start adding C++
> infrastructure. Since C and C++ mix very well that should be easy to do.
> So expected coding style changes are:
> * allow C++ in the dpkg codebase

This is not for the styling that c++ has, but for the actual features c++
gives us.  Exceptions, and objects, etc.

> * reindent code to use normal tab-indents and a few other changes (we
>   already decided on specific indent-settings, I need to dig those up again)

I've got these handy, in an indent.pro file.  However, indent fucks up block
comments, and comment blocks that exist to the right of code.

> * forcing all metadata to be valid utf-8 and doing the proper conversions
>   when displaying it
>   should be simple enough. Requires some perl hacking since dpkg-dev is
>   currently written in perl and I have no real desire to figure out
>   how perl does codecs.

perl 5.8 makes this simple.  Use streams, and attach filters.
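
For comparison, the same idea (decode strictly on input, convert only when
displaying) looks roughly like this in Python.  This is a minimal sketch and
not dpkg code; the function names decode_metadata and encode_for_display are
made up for illustration.

```python
def decode_metadata(raw: bytes) -> str:
    """Return a field value as text, rejecting anything that is not valid UTF-8."""
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"metadata is not valid UTF-8: {exc}") from exc


def encode_for_display(text: str, terminal_encoding: str) -> bytes:
    """Convert to the user's encoding; unrepresentable characters become '?'."""
    return text.encode(terminal_encoding, errors="replace")
```

Validation happens once, when the data enters the system, so the display side
never has to guess what encoding it was handed.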

> * have a prototype for a new dpkg-dev
>   dpkg-dev as it is now has several problems:
>   - lousy design
>   - written in Ian-perl
>   - does not support multiple patches or multiple sources

You mean dpkg-source, not dpkg-dev.

>   I started on a rewrite in python a while ago but learned enough python
>   since then to realize that it should be redone again. A large part of
>   this is writing basic bits: code to take a package apart, parse
>   metadata, etc. This can eventually be tied in with adding interpreter
>   support in the main dpkg codebase; until then we can use a python
>   implementation. I have validating parsers for this which work fine
>   except for the fact they are really slow. So we'll want to validate
>   data on input and not completely verify everything when parsing.

I really don't want to use python for dpkg-dev.  Makes porting to new arches
more difficult.

>   multiple-patch & source support is becoming essential, as witnessed by
>   the rise of tools like dbs and dpatch. I would like to have a simple
>   system that does as little as possible based on these criteria:
>   - no support for architecture-specific patches. Patches should be clean
>     and work for everything so they can easily be merged upstream as well.
>   - when (un)packing we have to support:
>     + unpacking a source (tar.gz/tar.bz2/tar) in a specified location
>       (possibly in a subtree of another source)


>     + apply patches, possibly reversed, possibly with a specified number
>       of patch components stripped
>     + unpacking has to be guaranteed safe, so never call scripts inside
>       a source when unpacking. This is essential if you want to audit
>       a source since you can't trust what is in it.
>     + since we can now have an arbitrary number of files for a package
>       we can no longer distribute it as separate .orig.tar.gz, .diff.gz and
>       .dsc but we will have to put it in real package format. This
>       should be either files in an ar archive like the deb format, or
>       (my preference) a tar file.

renaming/moving/copying/touching dirs/files.

We need to support spaces and newlines in files as well.

patch-dependencies, which would be used for ordering.

I'm thinking .orig.{tar.gz,tar.bz,zip}, .dsc, diff.{ar,tar}.gz (undecided on
the format for this one).
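
To make the patch-dependencies idea concrete: if each patch lists the patches
that must be applied before it, the apply order is a topological sort.  A
minimal sketch in Python (3.9 or later, for the standard graphlib module); the
order_patches name and the patch file names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def order_patches(depends_on):
    """Return patch names in an order that respects their dependencies.

    depends_on maps each patch name to the set of patches that must be
    applied before it.  Raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical patch set; note the file name containing a space.
patches = {
    "03-docs.diff": {"01-base.diff", "02 fix build.diff"},
    "02 fix build.diff": {"01-base.diff"},
    "01-base.diff": set(),
}
```

Because names are plain dictionary keys here, spaces (or newlines) in file
names need no special treatment, which is one argument for keeping the
ordering data structured instead of encoding it in a whitespace-separated
list.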

> * N-arch setup / architecture & OS revisions
>   I would like to have bi-arch be a simple form of N-arch setup, I don't
>   feel it makes sense to limit us to two options. Ideally it would be nice
>   to move both architecture and OS/kernel ABI into normal package
>   dependencies. This allows you to run ia32 packages on different
>   architectures simply by installing a plex86 package that Provides:
>   arch-ia32 for example. The package system could pull in architecture
>   emulators and kernel emulators (like ibcs2) on demand. This should
>   also work for biarch systems.
>   The problem is that we have two kinds of dependencies: arch-specific
>   and arch-independent. For example a dependency on bash is arch-independent
>   since we really don't care if our bash is running 32bit, 64bit or
>   on an emulated processor: as long as it can run scripts it'll suffice.
>   But for a library this is different: you want a library that uses
>   the same architecture and kernel ABI as your application.
>   My current thinking of how to implement this is to add some kind of
>   constraints to dependencies so you get something like
>       Depends: bash, libc (>= 2.3) [i386, linux]
>   which dpkg already partially supports (Adam has been working on that).
>   The things between [..] then change from being architectures to
>   selection criteria: 'only accept libc if it satisfies these additional
>   constraints'.

Current dpkg-gencontrol will parse [] in debian/control on normal dependency
fields, and depending on the target arch being built for, remove atoms that
don't make sense.

Supporting this at runtime requires more complex code in dpkg.  Jason
Gunthorpe seemed resistant to this.
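
The build-time filtering described above can be sketched as follows.  This is
an illustration in Python, not the dpkg-gencontrol code: filter_depends is a
made-up name, alternatives with | are not handled, and the [] list is treated
as a plain architecture list (optionally negated with !), not as the runtime
selection criteria proposed in the quoted text.

```python
import re

# Split on commas that are not inside a [...] list.
_TOP_LEVEL_COMMA = re.compile(r",\s*(?![^\[]*\])")
# An atom is "name (version)" optionally followed by "[arch list]".
_ATOM = re.compile(r"(.*?)\s*(?:\[(.*)\])?")


def _arch_matches(target, arch_list):
    positive = [a for a in arch_list if not a.startswith("!")]
    negative = [a[1:] for a in arch_list if a.startswith("!")]
    if target in negative:
        return False
    return not positive or target in positive


def filter_depends(field, target_arch):
    """Drop atoms whose [] list excludes target_arch and strip the [] lists."""
    kept = []
    for atom in _TOP_LEVEL_COMMA.split(field):
        body, arches = _ATOM.fullmatch(atom.strip()).groups()
        if arches is not None:
            arch_list = [a for a in re.split(r"[,\s]+", arches) if a]
            if not _arch_matches(target_arch, arch_list):
                continue
        kept.append(body)
    return ", ".join(kept)
```

For example, filtering
"bash, libc6 (>= 2.3) [i386 powerpc], libc6.1 (>= 2.3) [alpha ia64]" for i386
keeps bash and libc6 and drops libc6.1.  Doing the same at install time would
mean carrying the [] data into the binary package and evaluating it inside
dpkg, which is where the extra complexity comes from.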

>   A problem with N-arch support is paths. rpm apparently does some horrible
>   path rewriting where paths are changed during unpack depending on what
>   architecture you unpack on. I strongly feel that is something we should
>   not do since it means a package can no longer rely on where its contents
>   will be on disk. This seems a FHS/LSB/policy problem: for libraries
>   it is essential to prevent collisions so we will require something like
>   /lib/<arch>-<abi>/ . To some degree we already have this (OpenSSL
>   uses it for example). Having abi in there is useful since we are going
>   to want to use that on FreeBSD and similar systems which can run both
>   FreeBSD and Linux applications. I suspect this will be the hardest part
>   to implement since it both means lots of coordination with other
>   distributions and lots of interesting moving around of things. And we'll
>   run into fun things as well (like 64 bit application accessing utmp which
>   is also written by a 32bit daemon writing in a different format).

Matt Taggart and I had a solution for that (except for the utmp idea, and we
didn't discuss -dev packages).

All arch-specific files or code would be in arch-specific dirs.  Ie,
/usr/${arch-id}/{lib,include}.  debs would then be made, such that they *only*
contained these files; they wouldn't even contain the normal
/usr/share/doc/$pkg.  Instead, they would depend on an -all package, that
provided that.

A sub-proposal of this would be to not include -all packages in
Packages-$arch.gz, but place them in Packages-all.gz.

> * scripting support, possibly via swig
>   should be obvious.


> * unittest framework
>   I strongly believe in unit testing, but I always get stuck trying to
>   create a good framework, probably for lack of experience in using
>   them. It would be good to have a group of people who can work on this
>   and make sure that all bugs we encounter (and have encountered) will
>   be covered. This is partially something we can share with the RPM
>   maintainers; we already agreed to share version comparison tests if
>   we write one (rpm is changing its version comparison routines to be
>   more like dpkg so this will make sense).

I've really liked junit.  But even knowing junit, when I do some complex
project, it's always hard to get the initial bits written.
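
As a starting point for the shared version comparison tests, a table of
(a, b, expected) cases is probably the easiest thing to exchange with the RPM
side.  Below is a sketch in Python.  The comparison follows the ordering rules
as written in the current Debian Policy Manual (including ~, which may not
match what dpkg implemented at the time of this mail); compare_versions, CASES
and failing_cases are illustrative names, not an existing API.

```python
import re

_RUN = re.compile(r"([^0-9]*)([0-9]*)(.*)", re.S)


def _order(ch):
    """Sort weight of one character inside a non-digit run."""
    if ch == "~":
        return -1           # sorts before everything, even the end of the run
    if ch.isalpha():
        return ord(ch)      # letters sort before other characters
    return ord(ch) + 256


def _cmp_fragment(a, b):
    """Compare two upstream-version or revision strings: -1, 0 or 1."""
    while a or b:
        a_text, a_num, a = _RUN.match(a).groups()
        b_text, b_num, b = _RUN.match(b).groups()
        for i in range(max(len(a_text), len(b_text))):
            wa = _order(a_text[i]) if i < len(a_text) else 0
            wb = _order(b_text[i]) if i < len(b_text) else 0
            if wa != wb:
                return -1 if wa < wb else 1
        na, nb = int(a_num or 0), int(b_num or 0)
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version):
    """Split into (epoch, upstream, revision); missing parts default to 0/''."""
    epoch, colon, rest = version.partition(":")
    if not colon:
        epoch, rest = "0", version
    upstream, dash, revision = rest.rpartition("-")
    if not dash:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _cmp_fragment(ua, ub) or _cmp_fragment(ra, rb)


# Shared test table: (a, b, expected result of compare_versions(a, b)).
CASES = [
    ("1.0-1", "1.0-2", -1),
    ("1.10", "1.9", 1),
    ("1:0.9", "2.0", 1),
    ("1.0~rc1", "1.0", -1),
    ("1.0", "1.0-0", 0),
    ("2.0a", "2.0+b", -1),
]


def failing_cases():
    return [c for c in CASES if compare_versions(c[0], c[1]) != c[2]]
```

Keeping the cases as plain data means every bug report about version ordering
turns into one more line in the table, whichever test framework ends up
running it.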

> * versioned provides
>   Ben Collins actually implemented this ages ago, but we pulled it out
>   since we could not manage to make dselect handle it properly. We
>   should really reintroduce this and deal with dselect. It should be
>   trivial to add to apt as well.

Jason has said that apt could support it with a very few minor tweaks.
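
For reference, the matching rule that versioned provides implies can be
sketched like this in Python.  It is only an illustration: satisfied_by and
the package names are hypothetical, versions are compared as dotted integers
instead of with the real dpkg algorithm, and the rule that an unversioned
Provides cannot satisfy a versioned dependency is taken from Debian policy.

```python
import operator

_OPS = {"<<": operator.lt, "<=": operator.le, "=": operator.eq,
        ">=": operator.ge, ">>": operator.gt}


def _key(version):
    # Simplification: dotted integers only, not full dpkg version ordering.
    return tuple(int(part) for part in version.split("."))


def _version_ok(candidate, op, wanted):
    if op is None:
        return True     # unversioned dependency: any candidate will do
    if candidate is None:
        return False    # unversioned Provides cannot satisfy a versioned dep
    return _OPS[op](_key(candidate), _key(wanted))


def satisfied_by(dep, packages):
    """Return the names of packages that satisfy dep = (name, op, version)."""
    name, op, wanted = dep
    hits = []
    for pkg in packages:
        if pkg["name"] == name and _version_ok(pkg["version"], op, wanted):
            hits.append(pkg["name"])
            continue
        for prov_name, prov_version in pkg.get("provides", []):
            if prov_name == name and _version_ok(prov_version, op, wanted):
                hits.append(pkg["name"])
                break
    return hits


# Hypothetical packages: one unversioned and one versioned Provides.
packages = [
    {"name": "mta-a", "version": "3.36",
     "provides": [("mail-transport-agent", None)]},
    {"name": "mta-b", "version": "2.0",
     "provides": [("mail-transport-agent", "2.0")]},
]
```

With these packages, a dependency on mail-transport-agent (>= 1.5) is
satisfied only by mta-b, while an unversioned dependency on
mail-transport-agent is satisfied by both.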

Also, all this code should be made available through a shared library, to
facilitate code reuse, etc.  I have a lot of this done already.  It's got
control file parsing, a generic hashtable (for file and pkg db, etc), wrappers
around all base kernel and libc calls to convert error returns into
exceptions, and block memory handling.  The control file parsing and hashtable
have unit tests.

It's missing archive handling (cpio/ar/tar).  I was going to base this on
altar, but it's very much OO in nature, and the api it currently has is rather
plain.  I was working on a rewrite of it in java, which I was then going to
base a c++ api on.
