Re: Debbugs: The Next Generation
On Wed, Aug 08, 2001 at 04:50:12PM +1000, Anthony Towns wrote:
> On Wed, Aug 08, 2001 at 01:31:48AM -0400, Matt Zimmerman wrote:
> > If you look at the little command-line programs in cmdline/ (see below), they
> > could (apart from the exception handling and ostream formatting magic) be
> > translated almost line-for-line into Perl or Python once the necessary wrappers
> > were generated. That is what I hope to do with SWIG.
> If you're going to write the exact same thing, the perl's the same;
> if you're going to write something subtly different (like, say, making
> pkgreport implement all the features of the pkgreport.cgi, such as by
> submitter and by maintainer reports) you have to start digging around
> in the C++ libraries.
pkgreport.cgi shouldn't be too different at all. You would have to look up the
class members (or their analogue in whichever language you're working in) to sort
the bugs and print each group separately, but little else.
The way to avoid digging around in the libraries should be to write documentation.
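For illustration, here is a minimal Python sketch of the "sort them and print them out separately" step for by-submitter and by-maintainer reports. It assumes the wrapped bug class exposes attributes like the hypothetical `Bug` record below; the real class members may be named differently.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical stand-in for the SWIG-wrapped bug class; real member
# names may differ.
@dataclass
class Bug:
    number: int
    package: str
    submitter: str
    maintainer: str
    title: str

def group_by(bugs, field):
    """Group bugs by the named attribute; each group is sorted by bug number."""
    groups = defaultdict(list)
    for bug in bugs:
        groups[getattr(bug, field)].append(bug)
    return {key: sorted(items, key=lambda b: b.number)
            for key, items in sorted(groups.items())}

def report(bugs, field):
    """Render a plain-text report with one section per distinct field value."""
    lines = []
    for key, items in group_by(bugs, field).items():
        lines.append(f"{field}: {key}")
        lines.extend(f"  #{b.number} [{b.package}] {b.title}" for b in items)
    return "\n".join(lines)
```

A by-maintainer report is then just `report(bugs, "maintainer")`; the same helper covers by-submitter or by-package views without touching the library.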
> In particular, efficiently getting a list of bugs by some particular
> selection mechanism "all RC bugs", "all bugs related to source package <foo>"
> or whatever, pretty much requires you to hack around the C++ or SQL stuff.
This is true, and I don't see any good way around it. The current mechanism
suffers from this, too, in that you have to dig around the backend to answer
sufficiently complex queries. The SQL backend would be more complex to work
with, but also much more rewarding in flexibility and performance. I don't
think it's worth the implementation cost to try to completely isolate the
database, but I think a certain level of abstraction will still be useful.
> You're currently missing:
> * HTML generation
> * bug submission and pseudo-header parsing
> * "receive"
Pretty much. A good mail library will solve half of these, and the rest is
mostly building workalike tools that use the library API.
> You're also missing any way to do really quick hacks about working out
> bug stuff. The CGI scripts have a /var/lib/debbugs/index.db which I find
> pretty handy, eg.
At first glance, this looks like much the same information that's in the
"bug" table in my schema, except that it also includes the tags. It would
be quite simple to write a tool to generate exactly that file from the
database, and it would be fast enough to run once per minute if desired.
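A minimal Python sketch of such a generator, assuming a hypothetical `bug` table and a separate `bug_tag` table (sqlite3 stands in for the real database). The line format written here is also an assumption, not the actual index.db layout. The file is written to a temporary name and renamed into place, so a once-per-minute job never exposes a half-written index to readers.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema; the real "bug" table columns are not shown here.
SCHEMA = """
CREATE TABLE bug (number INTEGER PRIMARY KEY, package TEXT, status TEXT,
                  severity TEXT, submitter TEXT);
CREATE TABLE bug_tag (number INTEGER, tag TEXT);
"""

def index_lines(conn):
    """Yield one space-separated line per bug, with its tags folded in."""
    rows = conn.execute("""
        SELECT b.package, b.number, b.status, b.severity, b.submitter,
               COALESCE(GROUP_CONCAT(t.tag, ','), '')
        FROM bug b LEFT JOIN bug_tag t ON t.number = b.number
        GROUP BY b.number ORDER BY b.package, b.number""")
    for package, number, status, severity, submitter, tag_str in rows:
        # GROUP_CONCAT order is unspecified, so sort the tags ourselves.
        tags = ",".join(sorted(t for t in tag_str.split(",") if t))
        fields = [package, str(number), status, severity, f"[{submitter}]"]
        if tags:
            fields.append(tags)
        yield " ".join(fields)

def write_index(conn, path):
    """Write the index atomically: temp file in the same dir, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as out:
            for line in index_lines(conn):
                out.write(line + "\n")
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```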
> Your "thanks" handling seems wrong too.
Yes, "thanks" will need some extra communication with the caller to signal that
no further processing should be done. I haven't really worried about
processing entire messages full of commands, just the individual operations.
I've added a note to remind me to add that bit.
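One way to pass that signal back, sketched minimally in Python: each per-line handler returns an outcome, and the message loop stops when it sees `STOP`. The names `Outcome`, `handle_line` and `process_message` are hypothetical, only the plain `thanks` keyword is handled, and the real operations are replaced by recording the parsed command.

```python
from enum import Enum

class Outcome(Enum):
    CONTINUE = "continue"
    STOP = "stop"

def handle_line(line, executed):
    """Process one control line; return STOP when no further lines should run."""
    words = line.split()
    if not words:
        return Outcome.CONTINUE            # skip blank lines
    verb = words[0].lower()
    if verb == "thanks":
        return Outcome.STOP                # tell the caller to stop here
    executed.append((verb, words[1:]))     # stand-in for the real operation
    return Outcome.CONTINUE

def process_message(body):
    """Run commands until a handler signals STOP; return what was executed."""
    executed = []
    for line in body.splitlines():
        if handle_line(line, executed) is Outcome.STOP:
            break
    return executed
```

Keeping the decision in the return value means the individual operations stay unaware of message structure; only the loop that walks the message needs to know about "thanks".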
> > > Personally, I'd be inclined towards changing the .log format to be something
> > > akin to batched-SMTP and stored in the filesystem (so that it can be
> > > "replayed" if the database crashes; and so that the way the bugs logs are
> > > displayed can be changed in future).
> > Jason suggested something similar, due to a bad experience with postgres.
> > Personally, I think regular database dumps and backups would avoid most of
> > these worries.
> Doing regular database dumps seems likely to be pretty expensive. The debbugs
> archive is currently 2GB, while the active database is just under 1GB. By
> contrast, the twice-daily backups of the package pool database on ftp-master
> is 16MB of SQL. debbugs gets a lot more activity than the pool database too,
> so probably has a lot more chances to trip over any bugs in postgresql.
A dump of my test database is about 60MB and takes 30-40 seconds. Assuming it
scales linearly up to the size of the full debbugs database, a full dump would
probably be around 1GB. This is all text, and seems to compress by about 4:1
with gzip -9, so I don't think the backups would be unmanageable.
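The arithmetic behind that estimate, as a small Python sketch. It takes 1GB as 1000MB and additionally assumes (not measured) that dump time scales linearly with size, the same way dump size is assumed to.

```python
def extrapolate_dump(test_dump_mb, test_secs, full_dump_mb, gzip_ratio):
    """Linearly extrapolate time and compressed size from a test dump."""
    scale = full_dump_mb / test_dump_mb          # 1000 / 60, about 16.7x
    low, high = test_secs
    return {
        "scale": scale,
        "minutes": (low * scale / 60, high * scale / 60),
        "compressed_mb": full_dump_mb / gzip_ratio,
    }

est = extrapolate_dump(60, (30, 40), 1000, 4)
# Roughly 8.3 to 11.1 minutes per dump, about 250MB after gzip -9.
```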
In theory, the package pool database should be reproducible from the contents
of the archive, yes? Do tools exist to do this should it become necessary?
> > Looking over them, all of the bug operations (except for
> > submission, which can be worked around) should be idempotent, so if the system
> > kept a log of commands executed, that log could be used as a replay script as
> > well.
> Erm. None of those operations are idempotent in that they change what's
> in the bug log.
True, but that's just harmless noise. The duplicate events could also be
filtered based on the history data (which would be a relatively fast database
query).
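A minimal Python sketch of replaying a command log while filtering duplicates against the history. It assumes each logged command carries a stable identifier (the hypothetical `event_id`, e.g. a Message-ID plus command index); `Command`, `BugState`, `apply` and `replay` are illustrative names, and only a severity change is modelled.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Command:
    event_id: str   # hypothetical stable key, e.g. Message-ID + command index
    bug: int
    op: str
    arg: str

@dataclass
class BugState:
    severity: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # ids of applied events, in order

def apply(state, cmd):
    """Apply one command; other ops are only recorded in the history here."""
    if cmd.op == "severity":
        state.severity[cmd.bug] = cmd.arg
    state.history.append(cmd.event_id)

def replay(state, log):
    """Re-run a command log, skipping events already present in the history."""
    seen = set(state.history)
    applied = 0
    for cmd in log:
        if cmd.event_id in seen:
            continue
        apply(state, cmd)
        seen.add(cmd.event_id)
        applied += 1
    return applied
```

Replaying the whole log after a partial restore then applies only the missing events, so running it twice leaves the bug log unchanged.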
> > What do we do if the current debbugs data gets corrupted (if we notice)?
> It doesn't get corrupted.
> Well, not by the underlying storage technology, anyway. Corruption due to
> bugs in debbugs itself is detected by verifying the db independently. Mostly,
> it just means we need to reprocess a message in
> /var/lib/debbugs/spool/incoming; otherwise we just expect people to refile
> the bug.
> But in the context of Jason's message, it just doesn't get corrupted.
I don't know what to say about the reliability issue. I was under the
impression that postgresql was more stable. Backups and redundancy, as needed.