
Re: Automatic Debug Packages Creation and Handling



OK, after our conversation on IRC, here's a more detailed proposal. Sorry for
not giving more details before.

The project is about automatically producing debugging symbols for everything in
the archive, without developers needing to add -dbg packages everywhere. Those
packages are rarely used, yet right now we mirror them everywhere.

This thread expresses the need for such a service very well:
http://lists.debian.org/debian-devel/2009/03/msg00228.html

Also, if we start using build IDs (the --build-id option to ld), we could serve
the symbols directly, unpacked, through a mount or something similar. A user
would then (virtually) have all the symbols automatically, and they would be
downloaded on demand.

Build IDs work as follows: when an object is linked, ld can put a note in the
binary (the .note.gnu.build-id section) containing a checksum that identifies
that binary. Tools can use it (gdb already understands it) to look for
debugging symbols in a path unique to that binary. Right now we put symbols in
/usr/lib/debug/$path. With build IDs, the symbols would go in
/usr/lib/debug/.build-id/ab/cdef1234.debug, where abcdef1234 is the hash of the
binary, which gdb (or another tool) reads from the Build ID note in the binary.
This makes it possible to ship debugging symbols for several versions of the
same binary or library, and the correct one is picked up.

A very useful consequence is that you could mount our share and automatically
get debugging symbols for everything. We could also integrate tools that catch
crashes and produce backtraces, like bug-buddy or drkonqi, so that they mount
the share to get symbols for everything and produce useful backtraces.
Fedora[1] has used --build-id by default since 2007 or so. I'm not sure about
SuSE, but they have a page about it [2]. Martin Pitt told me at UDS that he
wanted to do something about it too, but he hasn't had the time yet. I've asked
the GCC maintainers to pass --build-id by default [3].

[1] http://fedoraproject.org/wiki/Releases/FeatureBuildId
[2] http://en.opensuse.org/Build_ID
[3] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=535237
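To make the build ID lookup concrete, here is a minimal Python sketch. It
assumes the raw bytes of the binary's note section are already at hand
(little-endian, as on i386/amd64) and uses the standard ELF note layout: namesz,
descsz and type words, followed by the name and descriptor, each padded to 4
bytes. It finds the GNU build ID note (type 3, name "GNU") and maps it to the
/usr/lib/debug/.build-id path a debugger would search. The function names are
made up for illustration.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type written by ld --build-id
DEBUG_ROOT = "/usr/lib/debug/.build-id"


def _align4(n):
    """Round n up to the 4-byte boundary used for note name/desc padding."""
    return (n + 3) & ~3


def build_id_from_notes(data, byteorder="<"):
    """Walk a buffer of ELF notes; return the GNU build ID as hex, or None."""
    offset = 0
    while offset + 12 <= len(data):
        namesz, descsz, ntype = struct.unpack_from(byteorder + "III", data, offset)
        offset += 12
        name = data[offset:offset + namesz]
        offset += _align4(namesz)
        desc = data[offset:offset + descsz]
        offset += _align4(descsz)
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None


def debug_path(build_id):
    """First two hex digits become a directory, the rest the file name."""
    return "%s/%s/%s.debug" % (DEBUG_ROOT, build_id[:2], build_id[2:])
```

With the example ID from above, debug_path("abcdef1234") yields
/usr/lib/debug/.build-id/ab/cdef1234.debug. Real build IDs are usually longer
(typically 40 hex digits), but the mapping is the same.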

So, back to automatically producing debug packages. The main idea is that
developers don't need to add them manually. We could keep adding -dbg packages
everywhere instead, but that doesn't scale well. We have two options: build the
.ddebs (Ubuntu's name for automatically generated debug packages) only on the
buildds, or build them everywhere. Pros and cons of each:

* Build ddebs only on the buildds:
  - We need to divert tools to produce the ddebs
  - Builds aren't reproducible
  + Uploads would be smaller

* Build ddebs everywhere
  + Builds are reproducible
  + We modify the tools to build ddebs everywhere, with fewer (or no) hacks
  + No need to wait for source-only uploads (or for binaries to be thrown away)

I'm very much in favour of building ddebs everywhere.

There's also the question whether to build one .ddeb per source package or one
per binary package.

* One ddeb per binary package:
  + Easier to express dependencies: ddeb Depends: binary (= ${binary:Version})
  + Smaller download size if you only want symbols for one package and not all
    that are produced from the same source.
  - Too many ddeb packages?

* One ddeb per source package:
  - The opposite to the above :)
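For the per-binary option, the dependency is easy to generate. A hypothetical
sketch: given the already-resolved binary version (what dpkg-gencontrol
substitutes for ${binary:Version}), emit the control paragraph of the matching
debug package. The "-ddeb" name suffix and the Description wording are
assumptions, not a settled naming scheme.

```python
def ddeb_control(binary, version, arch):
    """Return the control paragraph of a hypothetical per-binary debug package."""
    fields = [
        ("Package", binary + "-ddeb"),  # hypothetical naming scheme
        ("Architecture", arch),
        ("Version", version),  # same version as the stripped binary
        # Pin the symbols to exactly the binary they were split from.
        ("Depends", "%s (= %s)" % (binary, version)),
        ("Description", "debugging symbols for " + binary),
    ]
    return "".join("%s: %s\n" % field for field in fields)
```

The strict "=" relation is what makes per-binary ddebs attractive: the
package manager can never pair symbols with a different build of the binary.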

I currently have a working implementation that builds one .ddeb per source
package for packages using debhelper (directly or indirectly, e.g. through CDBS
and debhelper.mk). It produces a .ddeb if the source package contains at least
one architecture-dependent package for the current architecture, and only if it
doesn't build a -dbg package. As soon as you remove your -dbg package, a .ddeb
starts being built. The .ddeb defaults can also be overridden, if needed, by
listing it in debian/control, adding .install files, etc. (this wouldn't be the
typical case).
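The rule above can be modelled in a few lines of Python. This is not the actual
debhelper-based implementation, only a simplified sketch that reads
debian/control and decides whether a .ddeb would be produced. Assumptions: a
package counts as architecture-dependent if its Architecture field lists "any"
or the exact host architecture (wildcards such as linux-any are ignored here),
and a -dbg package is recognised purely by its name suffix. The function names
are hypothetical.

```python
import re


def control_paragraphs(text):
    """Split debian/control into paragraphs of {lowercased field: value}."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields, key = {}, None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and key:
                fields[key] += "\n" + line.strip()  # continuation line
            elif ":" in line:
                name, _, value = line.partition(":")
                key = name.strip().lower()
                fields[key] = value.strip()
        if fields:
            paragraphs.append(fields)
    return paragraphs


def wants_ddeb(control_text, host_arch):
    """True if a .ddeb would be built for host_arch under the rule above."""
    # Binary paragraphs have a Package field; the Source paragraph does not.
    binaries = [p for p in control_paragraphs(control_text) if "package" in p]
    arch_dependent = any(
        {"any", host_arch} & set(p.get("architecture", "").split())
        for p in binaries
    )
    has_dbg = any(p["package"].endswith("-dbg") for p in binaries)
    return arch_dependent and not has_dbg
```

For a source package with libfoo1 (Architecture: any) and foo-doc
(Architecture: all), wants_ddeb returns True on i386; adding a libfoo1-dbg
paragraph makes it return False.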

I've thought about packages that are built 'manually' (without debhelper), but
I don't think they should pose a big problem, since their maintainers may want
to build .ddebs manually too. It should still be possible for those maintainers
to automate it somehow, although that would be pretty specific to how each
package is built. Other suggestions, like patching or diverting strip, objcopy,
or other lower-level tools, are not good IMHO. So by working at a higher level,
like debhelper, we get simplicity at the cost of not covering 100% of the
archive. That's a reasonable price to pay, unless we can find something else
that everybody uses and that is reasonable.

If we agree that the .ddebs should be built everywhere, whether we go with one
ddeb per source or per binary package, they would be uploaded together with the
other packages and included in the .changes file (right now, in my
implementation, they don't appear in the Binary and Description fields, only in
Files and Checksums-*; I think I'll need to patch dpkg-genchanges). Once the
packages are uploaded, the archive side takes over. Also, if we build .ddebs
everywhere, everything should be transparent to the buildds, requiring no (or
few) changes to the build infrastructure.
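For illustration, a sketch of the entries a .ddeb contributes to the Files and
Checksums-* fields of the .changes file, as described above. The section and
priority values ("debug", "extra") are assumptions, and the function name is
made up; the real values would come from whatever dpkg-genchanges ends up doing.

```python
import hashlib


def changes_entries(filename, data, section="debug", priority="extra"):
    """Return the lines a .ddeb would add to Files and Checksums-* in .changes."""
    size = len(data)
    return {
        # Files: md5sum size section priority filename
        "Files": " %s %d %s %s %s" % (
            hashlib.md5(data).hexdigest(), size, section, priority, filename),
        # Checksums-*: digest size filename
        "Checksums-Sha1": " %s %d %s" % (
            hashlib.sha1(data).hexdigest(), size, filename),
        "Checksums-Sha256": " %s %d %s" % (
            hashlib.sha256(data).hexdigest(), size, filename),
    }
```

For an actual file, pass os.path.basename(path) as the name and the bytes read
from the file opened in "rb" mode as the data.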

We need to decide what to do with the .ddebs once they are uploaded. There are
mainly two options:

1. Send the .ddebs to a different service, which hosts the ddeb archive.
2. Add a new component (like main, contrib and non-free) for the .ddebs, which
   isn't mirrored by default. This sounds better to me because it's easier to
   maintain, and because the source code would be distributed from the same
   host. But IANAL, so I'm not sure what our requirements are :)

If we go with 1, the share that serves unpacked debugging symbols could be
created directly from that service. With 2 it couldn't (I guess), but a new
service could be created that is synchronized with the main archive and serves
the symbols. In that case I'm not sure what problems we would face regarding
the legal requirement to provide the source code. FWIW, debug.debian.net
doesn't provide source packages (neither does ddebs.ubuntu.com).

I don't have real numbers, but judging from ddebs.ubuntu.com, and if my
calculations are right, they need 12GB for karmic/i386 main+universe (roughly
equivalent to Debian's main).[4] If that's correct, we would need something
like 12GB * ${numberofarchs} * ${numberofreleases} = 12 * 14? * 3 = 504GB.
That's an upper bound, since some architectures are not complete and many
packages have the same version across releases. I'll try to get a better
approximation.
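The estimate above, spelled out. The 14 architectures and 3 releases are the
rough guesses from the text, the helper name is made up, and the result is only
the worst case, for the reasons just given.

```python
def ddeb_archive_upper_bound(gb_per_arch_release, n_archs, n_releases):
    """Worst case: every architecture and release carries a distinct full set."""
    return gb_per_arch_release * n_archs * n_releases


# Bytes summed from the karmic/i386 main+universe ddeb Packages files (see [4]).
measured_bytes = 12230433192
gb_per_arch_release = round(measured_bytes / 10**9)  # decimal GB, rounds to 12

print(ddeb_archive_upper_bound(gb_per_arch_release, 14, 3))  # 504
```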

Sorry for not being very specific about the dak code. I haven't looked into it
yet, but I'll do that and start preparing patches once we know which approach
to take. I'll need reviews et al. once I dive into it, but I guess that won't
be a big problem :)

I hope this is detailed enough to start a discussion!

Cheers,
Emilio

[4] $ wget -O main.gz http://ddebs.ubuntu.com/dists/karmic/main/binary-i386/Packages.gz
    $ wget -O universe.gz http://ddebs.ubuntu.com/dists/karmic/universe/binary-i386/Packages.gz
    $ zcat main.gz universe.gz | grep '^Size:' | cut -f2 -d' ' > sizes
    $ python -c "print(sum(int(l) for l in open('sizes')))"
    12230433192
