Re: Storage Mechanism question...
On Mon, Feb 02, 2004 at 04:10:44AM -0000, Ivan K wrote:
> I have a question regarding the storage mechanism used for debbugs..i
> understand everything is kept on flat files...my question is-how come
> http://bugs.debian.org has such good response times?
We don't have enough bugs for the O(n) factor to make a big difference.
We do need better indexing in the long run.
> i mean we were all taught that flat files are slower than an RDBMS as a
> rule..and also that plain CGI/Perl is slower,cannot fork many processes
> blah blah..
> How is it that searches are so fast?turbocharged hardware??
The hardware's certainly excellent, but even the older, slower hardware
coped OK. It simply doesn't take long to search through a single
40000-line index file. :) If you tried to search for non-indexed things
then response times would certainly go through the roof (but we should
never do that without indexing them first), and you'll notice that
searches for archived bugs are a bit slower due to the 160000-odd-line
index.
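
To make the O(n) scan concrete, here is a minimal Python sketch of a
linear search through a flat index. The index layout (one line per bug:
package, bug number, severity), the function name, and the sample data
are all assumptions for illustration; this is not debbugs' actual index
format or code.

```python
import io

def find_bugs(index_lines, package):
    """Return bug numbers whose index line names the given package.

    Assumes a hypothetical layout of "package bugnumber severity",
    one bug per line. Every line is visited once, so the cost is O(n).
    """
    matches = []
    for line in index_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == package:
            matches.append(int(fields[1]))
    return matches

# Made-up sample data standing in for a flat index file.
sample_index = io.StringIO(
    "debbugs 1001 normal\n"
    "dpkg 1002 important\n"
    "debbugs 1003 wishlist\n"
)
print(find_bugs(sample_index, "debbugs"))
```

The same function works on an open file handle, since it only iterates
over lines and never loads the whole index into memory.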
Even when we improve the indexing, the bulk of debbugs' storage will
still be the bug files themselves, and there's no performance problem
with leaving them as flat files. In addition, the flat-file design is
much easier to maintain than a true database and makes it easier to
mirror the spool to other systems.
> BTW,has anybody considered using XML for storage?
XML is not unreasonable for interchange (although RFC822-style key/value
pairs are easy to parse, too, and easier without libraries), but I don't
see why you would advocate it for storage. It's most certainly not worth
the effort of converting all the existing bugs, and I don't think I
would start out with it either.
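
As a rough illustration of how little code RFC822-style key/value pairs
need, here is a Python sketch that parses them with no libraries. The
field names in the sample record are hypothetical, and the parser covers
only the simple cases (one "Key: value" per line, with indented
continuation lines); it is not a complete RFC822 implementation.

```python
def parse_headers(text):
    """Parse RFC822-style "Key: value" lines into a dict.

    Lines starting with whitespace continue the previous value.
    Parsing stops at the first blank line.
    """
    fields = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        if line[0] in " \t" and last_key is not None:
            # Continuation line: append to the previous field's value.
            fields[last_key] += " " + line.strip()
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip malformed lines in this sketch
        last_key = key.strip().lower()
        fields[last_key] = value.strip()
    return fields

# Hypothetical record; the field names are made up for the example.
record = "Package: debbugs\nSeverity: wishlist\nSubject: better\n indexing\n"
print(parse_headers(record))
```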
> we don't need a database
What, are you suggesting one enormous XML file? :) I think not;
performance would be dreadful both reading and writing.
XML for storage has the problem that you have to parse big XML trees any
time you want to do anything, rather than simply and quickly reading
through the file line by line as we do now.
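
A small Python sketch of the contrast, under assumed formats: the line
layout ("number subject"), the XML element names, and both function
names are invented for illustration. The line-based lookup can return as
soon as it reaches the matching line, while the XML lookup, as written
here with a whole-document parse, must build the entire tree before it
can look at any bug.

```python
import xml.etree.ElementTree as ET

def find_in_lines(lines, bug_id):
    """Scan "number subject" lines and return the matching subject."""
    for line in lines:
        number, _, subject = line.rstrip("\n").partition(" ")
        if number == bug_id:
            return subject  # stop reading at the first match
    return None

def find_in_xml(xml_text, bug_id):
    """Parse the whole document into a tree, then search it."""
    root = ET.fromstring(xml_text)  # the full tree is built first
    for bug in root.iter("bug"):
        if bug.get("id") == bug_id:
            return bug.findtext("subject")
    return None

# Made-up sample data in both hypothetical formats.
flat = ["1001 first report\n", "1002 second report\n"]
xml_doc = (
    "<bugs>"
    "<bug id='1001'><subject>first report</subject></bug>"
    "<bug id='1002'><subject>second report</subject></bug>"
    "</bugs>"
)
print(find_in_lines(flat, "1002"))
print(find_in_xml(xml_doc, "1002"))
```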
Colin Watson [email@example.com]