OK, after our conversation on IRC, here's a more detailed proposal. Sorry for not giving many details before.

The project is about automatically producing debugging symbols for everything in the archive, without developers needing to add -dbg packages everywhere; those packages are rarely used, yet right now we mirror them everywhere. This thread expresses the need for such a service very well:

http://lists.debian.org/debian-devel/2009/03/msg00228.html

Also, if we start using build IDs (the --build-id option to ld), we could serve the symbols directly unpacked through a mount or something, so that a user can automatically (virtually) have all the symbols, which would then be downloaded as needed.

Build IDs work the following way: when an object is linked, ld can put a note in the binary header containing a checksum that identifies that binary. Tools can use it (e.g. gdb already understands it) to look for debugging symbols in a path that is unique to that binary. E.g. right now we put symbols in /usr/lib/debug/$path. Using build IDs, the symbols would be put in /usr/lib/debug/.build-id/ab/cdef1234.debug, where abcdef1234 is the hash of the binary, which gdb (or other tools) would look for after reading the Build ID note in the binary. This way it's possible to ship debugging symbols for several versions of the same binary/library, and the correct one would be picked up.

A very useful feature of this would be that you can mount our share and automatically get debugging symbols for everything. We could also integrate tools like bug-buddy or drkonqi, which catch crashes and produce backtraces, so that they mount the share to get symbols for everything and produce useful backtraces.

Fedora[1] already uses --build-id by default (since 2007 or so). Not sure about SuSE, but they have [2]. Martin Pitt told me at UDS he wanted to do something about it too, but he hasn't had the time yet. I've asked the GCC maintainers to pass --build-id by default[3].

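To illustrate the build ID path layout described above, here is a minimal Python sketch. The function name build_id_debug_path and the DEBUG_ROOT constant are made up for this example; only the /usr/lib/debug/.build-id/ab/cdef1234.debug layout comes from the text.

```python
from pathlib import PurePosixPath

# Debug root used in the example above; the constant name is hypothetical.
DEBUG_ROOT = PurePosixPath("/usr/lib/debug")


def build_id_debug_path(build_id, root=DEBUG_ROOT):
    """Map a hex build ID to the .build-id path a debugger would look up."""
    hex_id = build_id.strip().lower()
    if len(hex_id) < 3 or any(c not in "0123456789abcdef" for c in hex_id):
        raise ValueError(f"not a usable hex build ID: {build_id!r}")
    # The first two hex digits name the directory, the rest names the file.
    return root / ".build-id" / hex_id[:2] / f"{hex_id[2:]}.debug"


# Same shortened ID as in the text; prints
# /usr/lib/debug/.build-id/ab/cdef1234.debug
print(build_id_debug_path("abcdef1234"))
```

A real build ID is longer than the shortened abcdef1234 used here; it can be read from a binary with readelf -n, which lists the Build ID note.
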
[1] http://fedoraproject.org/wiki/Releases/FeatureBuildId
[2] http://en.opensuse.org/Build_ID
[3] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=535237

So back to automatically producing debug packages. The main idea here is that the developers don't need to add them manually. Otherwise we could keep adding -dbg everywhere, but that doesn't scale well.

We have two options: build the .ddebs only on the buildds, or build them everywhere. Pros and cons of each one:

* Build ddebs only on the buildds:
  - We need to divert tools to produce the ddebs
  - Builds aren't reproducible
  + Uploads would be smaller

* Build ddebs everywhere
  + Builds are reproducible
  + We modify the tools to build ddebs everywhere, less (no) hacks
  + No need to wait for source-only uploads (or for binaries to be thrown away)

I'm very much in favour of building ddebs everywhere.

There's also the question of whether to build one .ddeb per source package or one per binary package.

* One ddeb per binary package:
  + Easier to express dependencies: ddeb Depends: binary (= ${binary:Version})
  + Smaller download size if you only want symbols for one package and not for all that are produced from the same source.
  - Too many ddeb packages?

* One ddeb per source package:
  - The opposite to the above :)

I currently have a working implementation for building one .ddeb per source package for packages using debhelper (directly or indirectly, e.g. using CDBS and debhelper.mk). It produces a -ddeb if the source package contains at least one arch-dependent package for the current architecture, and only if it doesn't build a -dbg package. The very moment that you remove your -dbg package, a .ddeb would start to be built. Also, the .ddeb defaults could be overridden by listing it in debian/control, adding .install files, etc., if needed (this wouldn't be the typical case).

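The rule above for when a .ddeb gets generated can be sketched in Python as follows. This is only an illustration, not the debhelper implementation: BinaryPackage, builds_on and should_build_ddeb are hypothetical names, a -dbg package is recognised purely by its name suffix, and architecture wildcards such as linux-any are ignored.

```python
from dataclasses import dataclass


@dataclass
class BinaryPackage:
    name: str
    architecture: str  # Architecture field from debian/control, e.g. "any", "all"


def builds_on(pkg, host_arch):
    """True if pkg is arch-dependent and built on host_arch (no wildcard support)."""
    archs = pkg.architecture.split()
    return "any" in archs or host_arch in archs


def should_build_ddeb(packages, host_arch):
    """Apply the rule from the text: arch-dependent output and no -dbg package."""
    has_dbg = any(p.name.endswith("-dbg") for p in packages)
    has_arch_dep = any(builds_on(p, host_arch) for p in packages)
    return has_arch_dep and not has_dbg


source = [BinaryPackage("foo", "any"), BinaryPackage("foo-doc", "all")]
print(should_build_ddeb(source, "i386"))
source.append(BinaryPackage("foo-dbg", "any"))
print(should_build_ddeb(source, "i386"))
```

The first call reports that a .ddeb would be built; once foo-dbg is listed, the second call reports that it would not, matching the "remove your -dbg package and a .ddeb starts to be built" behaviour.
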
I've thought about packages being built 'manually', but I don't think that should pose a big problem, since the maintainer may want to build .ddebs manually too. But it should be possible for those maintainers to automate it somehow, although that would be pretty specific to how the package is built.

Other suggestions, like patching/diverting strip, objcopy, or other lower-level tools, are not good IMHO. So by going with something higher-level like debhelper we get simplicity at the cost of not covering 100% of the archive. It's a reasonable price to pay, unless we can find something else that everybody uses and that is reasonable.

If we agree that the .ddebs should be built everywhere, no matter if we go with one ddeb per source or per binary package, they would be uploaded together with the other packages, including them in the .changes file (right now in my implementation they don't appear in Binary and Description, only in Files and Checksums-*. I think I'll need to patch dpkg-genchanges).

Once the packages are uploaded, the archive part starts. Also, if we build .ddebs everywhere, everything should be transparent to the buildds, requiring no (or few) changes to the builder infrastructure.

We need to decide what to do when the packages are uploaded. We have mainly a couple of options:

  1. Send the .ddebs to a different service, where we have the ddeb archive.
  2. Add a new suite (or whatever it is called; something like main, contrib and non-free) where we put the .ddebs, and which isn't mirrored (by default). This sounds better to me because of easier maintainability, and because the source code would be distributed from the same host. But IANAL, so I'm not sure what our requirements are :)

If we go with 1, the share that serves unpacked debugging symbols could be created directly from that service. With 2 it couldn't (I guess), but a new service could be created that is synchronized with the main archive and that serves the symbols.

In this case I'm not sure what problems we would face with respect to the legal requirements of providing the source code. debug.debian.net doesn't provide source packages, fwiw (neither does ddebs.ubuntu.com).

I don't have real numbers, but looking at ddebs.ubuntu.com, and if my calculations are right, they need 12GB for karmic/i386 main+universe (which would be somewhat like Debian's main).[4] If that's correct, we would need something like 12GB*${numberofarchs}*${numberofreleases} = 12*14?*3=504GB, except that some arches are not complete, and many packages have the same version across releases, so that would be an upper bound. I'll try to get a better approximation.

Sorry for not being very specific with respect to dak code. I haven't looked into it yet, but I'll do that and start preparing patches once we know what approach to take. I'll need reviews et al once I dive into it, but I guess that won't be a big problem :)

I hope this is detailed enough to start a discussion!

Cheers,
Emilio

[4] $ wget http://ddebs.ubuntu.com/dists/karmic/main/binary-i386/Packages.gz &&
      wget http://ddebs.ubuntu.com/dists/karmic/universe/binary-i386/Packages.gz &&
      zcat Packages.gz* | grep ^Size | cut -f2 -d' ' > sizes; python
    >>> sum = 0
    >>> for l in open('sizes', 'r').readlines():
    ...     sum += int(l)
    ...
    >>> sum
    12230433192L
    >>>
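
The size estimate above can also be reproduced with a short Python 3 sketch. The function names total_size and upper_bound are made up for this example, and the 14 architectures and 3 releases are the same guesses as in the text, not measured values.

```python
def total_size(packages_lines):
    """Sum the Size: fields of a Packages index, like the grep/cut pipeline in [4]."""
    return sum(
        int(line.split(":", 1)[1])
        for line in packages_lines
        if line.startswith("Size:")
    )


def upper_bound(per_arch_release_bytes, num_archs, num_releases):
    """Worst case: every architecture and release needs the full amount."""
    return per_arch_release_bytes * num_archs * num_releases


# Tiny made-up excerpt of a Packages file, just to exercise total_size().
sample = ["Package: foo-dbgsym", "Size: 1000", "Installed-Size: 50", "Size: 234"]
print(total_size(sample))

# Figure measured in [4] for karmic/i386 main+universe, in bytes.
measured = 12230433192
total = upper_bound(measured, num_archs=14, num_releases=3)
print(f"{total / 1e9:.0f} GB")  # prints 514 GB
```

With the unrounded figure from [4] the bound comes out slightly above the 504GB obtained from the rounded 12GB, but the order of magnitude is the same.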