After endlessly complaining about the poor performance and limited flexibility of the current debbugs, I've started to work on improving it. Specifically, I'm doing a ground-up rewrite of the entire system with the following goals:

- Improved performance
- Flexible reporting
- Useful scripting API
- Proper MIME handling

In order to meet these goals, I'm taking the following approach:

- Relational (SQL) backend
- C++ core
- SWIG wrappers

Over the past week or so, I've built a prototype of the core and backend, and a Perl script to import bugs from the current debbugs database. What it can do so far:

- Successfully import my test set of about 6000 bugs, including most bug history data.
- Process all of the mailserver commands (reassign, severity, merge, unmerge, etc.)
- Perform basic reporting (the equivalent of pkgreport.cgi and bugreport.cgi) via simple command line tools

Initial results:

The import takes about 17 minutes on my modest system (PII/350), but once everything is in the database, it screams (<1 second queries and updates). It will, of course, lose some performance with a database containing all available bugs, but it should scale much better than the current setup, and I haven't done any database optimization yet.

It also eliminates the duplicate message copies that the current debbugs seems to store, so disk space requirements appear to be greatly reduced as well. For my test bugs, the text files are 298M and the database is 45M. If my sample bugs are an accurate representation of the rest of them, this would save a few gigs of space on master.

The relational database should make it much easier and faster to do complex reporting, and to add new types of data. For example, the often-requested "listen in" or "register interest" feature would be very easy to implement, as would keyword searching (though I don't know how well PostgreSQL would handle such a massive text query).
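To give a feel for the mailserver command handling mentioned above, here is a rough sketch of how a single control line might be parsed in the C++ core. This is not the prototype's actual code: the `Command` struct and the `parse_bug` and `parse_command` functions are hypothetical names made up for illustration, and the command syntax is simplified (one bug number plus a one-word argument for reassign and severity, two or more bug numbers for merge, exactly one for unmerge).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result of parsing one control line (illustration only).
struct Command {
    std::string verb;        // lowercased command name, e.g. "reassign"
    std::vector<long> bugs;  // bug numbers the command applies to
    std::string argument;    // package name or severity, if any
    bool valid = false;      // true only if the line matched a known form
};

// Parse a bug number, accepting an optional leading '#'.
// Returns -1 if the token is not a plausible bug number.
static long parse_bug(const std::string& tok) {
    std::string digits = (!tok.empty() && tok[0] == '#') ? tok.substr(1) : tok;
    if (digits.empty() || digits.size() > 9 ||
        !std::all_of(digits.begin(), digits.end(),
                     [](unsigned char c) { return std::isdigit(c) != 0; })) {
        return -1;
    }
    return std::stol(digits);
}

// Parse one control line into a Command. Unknown or malformed lines
// come back with valid == false.
Command parse_command(const std::string& line) {
    std::istringstream in(line);
    Command cmd;
    std::string tok;

    if (!(in >> cmd.verb)) return cmd;  // blank line
    std::transform(cmd.verb.begin(), cmd.verb.end(), cmd.verb.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });

    if (cmd.verb == "merge") {
        // merge <bug> <bug> [<bug> ...]
        while (in >> tok) {
            long n = parse_bug(tok);
            if (n < 0) return cmd;
            cmd.bugs.push_back(n);
        }
        cmd.valid = cmd.bugs.size() >= 2;
    } else if (cmd.verb == "unmerge") {
        // unmerge <bug>
        if (in >> tok) {
            long n = parse_bug(tok);
            if (n >= 0) cmd.bugs.push_back(n);
        }
        cmd.valid = cmd.bugs.size() == 1 && !(in >> tok);
    } else if (cmd.verb == "reassign" || cmd.verb == "severity") {
        // reassign <bug> <package>   /   severity <bug> <severity>
        if (in >> tok) {
            long n = parse_bug(tok);
            if (n >= 0 && (in >> cmd.argument)) {
                cmd.bugs.push_back(n);
                cmd.valid = true;
            }
        }
    }
    return cmd;
}
```

A caller would check `valid` before touching the database, so that a malformed line such as "merge 10" is rejected rather than half-applied. Keeping the parsing separate from the SQL updates also means the same function could later be exposed through the scripting wrappers.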
What is missing:

- SWIG interface definitions, to build the Perl/Python/whatnot API. These will have to be substantially different from the C++ headers, since SWIG isn't even close to being able to parse them yet.
- Code to work with RFC822 messages (parse, munge, create)
  HELP: Is there a best-of-breed C/C++ library for this yet? I'd like to avoid reinventing this particular wheel. I think libmimelib from KDE might do this, and the next item.
- I'll also need a MIME library, but that will mostly be used by the query tools.
  HELP: Does anyone have first-hand experience with libgmime or libmimelib?
- Maintainer lookup. It would be great if this could be integrated with the database used for da-katie and friends, since I believe this data is already there.
- CGIs. These will be a piece of cake, and could even be written in C++.
- Bugscan, etc. These will probably want to use scripting languages, so those interfaces will have to exist first.

Feedback?

--
 - mdz