How about a tool to search in the BTS database?


Lucas Nussbaum and I think that the BTS needs a tool for searching its database.

So I started coding something that may interest you, built on a Xapian[0] database. For an overview, see the attached manpage bunny.1 (open it with less). If you would like to follow the project's progress, I have also attached the client (bunny), the server side (bunny-server.pl) and the indexer (bunny-indexer.pl).

[0] http://xapian.org/features

To try it:

* Download a bts-spool-db version (with 01/ 02/ 03/ etc)
* Configure some paths in the header of the three scripts
* Launch ./bunny-indexer.pl to build the Xapian version of the db
* Install bunny-server.pl on a web server
* Use ./bunny as a client (less bunny.1)

Currently, everything is implemented and the search works fine. What remains is to find a way to update the Xapian db when bts-spool-db changes. A web front end will also be added soon, and there are still some bugs to fix before releasing it.
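
One possible direction for the update problem, as a rough sketch only: find the bugs whose .summary or .log file changed since the last indexing run, and re-index only those. The helper name changed_bug_ids and the idea of remembering the time of the last run are assumptions made for illustration; neither is part of the attached scripts.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper (not part of the attached scripts): return the IDs of
# the bugs whose ID.summary or ID.log was modified after $since (epoch
# seconds), walking the 01/ 02/ 03/ sub-directories of the mirror.
sub changed_bug_ids {
	my ($textdb_path, $since) = @_;
	my %ids;

	find(sub {
		# Only ID.summary and ID.log are used by the indexer
		return unless /^(\d+)\.(?:summary|log)$/;
		my $id = $1;

		my $mtime = (stat($_))[9];
		$ids{$id} = 1 if defined $mtime && $mtime > $since;
	}, $textdb_path);

	return sort { $a <=> $b } keys %ids;
}

# Example: list the bugs changed during the last 24 hours
if (@ARGV) {
	print "$_\n" for changed_bug_ids($ARGV[0], time - 24 * 3600);
}
```

Each returned ID could then be re-indexed by replacing the document that carries the unique term 'I' . ID. Search::Xapian's WritableDatabase appears to offer replace_document_by_term for this kind of update, but that part is untested here.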

How about adding it to devscripts and hosting the server side and the web front end on a Debian server?


Sent via a personal server, hosted in my apartment and connected to the network by a participatory ISP. I am the only one who can read my mail; what about you?
* http://julien.vaubourg.com
* lemanchotvolant@jabber.fr
* +33(0)6 71 555 141

# bunny-indexer - Index a bts-spool-db mirror (Debian bugs) in a Xapian db
# Copyright (C) 2011 Julien Vaubourg <julien@vaubourg.com>
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

use strict;
use Search::Xapian (':all');
use POSIX;
use WWW::Mechanize;
use JSON -support_by_pp;

########## CONFIGURE:BEGIN ##########

# Path of the bts-spool-db mirror directory (with 01/ 02/ 03/ etc)
my $textdb_path = "./simpledb";

# Path and name of the Xapian db to create
my $database_path = "./simpledb-xap";

########### CONFIGURE:END ###########

# Slots used for the sorting
my $SLOT_DATE = 0;
my $SLOT_SEVERITY = 1;
my $SLOT_WNPP = 2;
my $SLOT_PACKAGE = 3;

my ($database, $indexer);

eval {
	# Create or open the Xapian db
	$database = Search::Xapian::WritableDatabase->new(
		$database_path, DB_CREATE_OR_OPEN);

	$indexer = Search::Xapian::TermGenerator->new();
	my $stemmer = Search::Xapian::Stem->new("english");
	$indexer->set_stemmer($stemmer);

	# Fill the Xapian db with .log and .summary files
	index_bugs($textdb_path);

}; if($@) {
	print STDERR "Exception: $@\n";
	exit 1;
}

# Index all bugs from bugs_dir/nn/ID.{summary,log}
sub index_bugs {
	my $bugs_dir = shift @_;
	opendir(my $BUGS_DIR, $bugs_dir)
		or die "Couldn't open $bugs_dir!";

	# Sub-directories 01/ 02/ 03/ etc
	while (my $ndir = readdir($BUGS_DIR)) {

		# Only the numeric sub-directories hold bugs
		if(-d "$bugs_dir/$ndir" && $ndir =~ /^\d+$/) {

			opendir(my $BUGS_NDIR, "$bugs_dir/$ndir")
				or die "Couldn't open $bugs_dir/$ndir!";

			# For each file .log .report .summary .status
			while (my $file = readdir($BUGS_NDIR)) {

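				# We use the .summary and the .log files to
				# index the bug
				if($file =~ /^(\d+)\.summary$/) {
					my (
						$bug_id, $bug_fixed, $bug_log,
						$bug_package, $bug_package_version,
						$bug_doneby, $bug_date, $bug_tags,
						$bug_severity, $bug_submitter,
						$bug_subject, $bug_wnpp
					);

					$bug_fixed = 0;
					$bug_log = "";
					$bug_id = $1;

					# Open the ID.summary
					open(my $SUMMARY, '<', "$bugs_dir/$ndir/$file")
					or die "Couldn't open $bugs_dir/$ndir/$file!";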

					# For each line, extract the field
					while(<$SUMMARY>) {

						if(/^Found-In:\s+(.*)/) {
							$bug_package_version = $1;

						} elsif(/^Done:\s+(.*)/) {
							$bug_doneby = $1;

						} elsif(/^Date:\s+(.*)/) {
							$bug_date = $1;

						} elsif(/^Fixed-In:\s+(.*)/) {
							$bug_fixed = 1;

						} elsif(/^Tags:\s+(.*)/) {
							if($1 eq "fixed") {
								$bug_fixed = 1;
							} else {
								$bug_tags = $1;
							}

						} elsif(/^Severity:\s+(.*)/) {

							# Old severity fixed,
							# before the tag
							if($1 eq "fixed") {
								$bug_fixed = 1;
							} else {
								$bug_severity = $1;
							}

						} elsif(/^Submitter:\s+(.*)/) {
							$bug_submitter = $1;

						} elsif(/^Subject:\s+(.*)/) {
							$bug_subject = $1;

							if($bug_subject =~ /^(ITA|ITP|O|RFA|RFH|RFP):\s+/) {
								$bug_wnpp = $1;
							}

						} elsif(/^Package:\s+(.*)/) {
							$bug_package = $1;
						}
					}

					close $SUMMARY;

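					# Open the log to index it as well
					open(my $LOG, '<', "$bugs_dir/$ndir/$bug_id.log")
					or die "Couldn't open $bugs_dir/$ndir/$bug_id.log!";

					$bug_log .= "$_ " while(<$LOG>);

					close $LOG;

					# Index all the information
					index_bug(
						$bug_id, $bug_fixed, $bug_log,
						$bug_package, $bug_package_version,
						$bug_doneby, $bug_date, $bug_tags,
						$bug_severity, $bug_submitter,
						$bug_subject, $bug_wnpp
					);
				}
			}

			closedir $BUGS_NDIR;
		}
	}

	closedir $BUGS_DIR;
}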

# Index a bug in the Xapian db
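sub index_bug {
	my (
		$bug_id, $bug_fixed, $bug_log,
		$bug_package, $bug_package_version,
		$bug_doneby, $bug_date, $bug_tags,
		$bug_severity, $bug_submitter,
		$bug_subject, $bug_wnpp
	) = @_;

	eval {
		my $bug = Search::Xapian::Document->new();
		$indexer->set_document($bug);

		# Boolean filters
		# NOTE: 'S' is also the prefix of the subject field below, so
		# severity and subject terms share the same namespace
		$bug->add_term('I' . $bug_id);
		$bug->add_term('F' . $bug_fixed);
		$bug->add_term('S' . $bug_severity);
		$bug->add_term('W' . $bug_wnpp);

		my @tags = split(/\s+/, $bug_tags);
		$bug->add_term('T' . $_) for (@tags);

		# Free text fields
		$indexer->index_text($bug_package, 1, 'P');
		$indexer->index_text($bug_doneby, 1, 'D');
		$indexer->index_text($bug_subject, 1, 'S');

		my @versions = split(/\s+/, $bug_package_version);
		$indexer->index_text($_, 1, 'V') for (@versions);

		my @submitters = split(/\s*,\s*/, $bug_submitter);
		$indexer->index_text($_, 1, 'A') for (@submitters);

		# Full text of the log, without prefix
		$indexer->index_text($bug_log);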

		# Date in human format
		my ($sec,$min,$hour,$mday,$mon,$year) = gmtime($bug_date);
		my $date = sprintf("%4d/%02d/%02d", $year + 1900, $mon + 1, $mday);

		# The data holds a few fields, which the server extracts later
		$bug->set_data("[$bug_id] [$date] [$bug_severity] [$bug_fixed] $bug_subject");

		# Date in Xapian format (zero-padded YYYYMMDD)
		$date = sprintf("%04d%02d%02d", $year + 1900, $mon + 1, $mday);

		# Sort
		$bug->add_value($SLOT_DATE, $date);
		$bug->add_value($SLOT_SEVERITY, $bug_severity);
		$bug->add_value($SLOT_WNPP, $bug_wnpp);
		$bug->add_value($SLOT_PACKAGE, $bug_package);

		# Indexing
		$database->add_document($bug);

	}; if($@) {
		print STDERR "Exception: $@\n";
		exit 1;
	}
}
# bunny-server - Backend and web frontend for searching in the Debian bugs
# database
# Copyright (C) 2011 Julien Vaubourg <julien@vaubourg.com>
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

use strict;

use Search::Xapian (':all');
use JSON -support_by_pp;
use Data::Dumper::Simple;

########## CONFIGURE:BEGIN ##########

# Path of the Xapian db
my $database_path = "./simpledb-xap";

########### CONFIGURE:END ###########

my $flags = 0;
my $max = 10;
my $query_string = "";
my $sort_by = undef;
my $rsort = 0;

# Xapian options available
my %flags_values = (
	"boolean" => FLAG_BOOLEAN,
	"phrase" => FLAG_PHRASE,
	"lovehate" => FLAG_LOVEHATE,
	"anycase" => FLAG_BOOLEAN_ANY_CASE,
	"wildcard" => FLAG_WILDCARD,
	"not" => FLAG_PURE_NOT,
	"partial" => FLAG_PARTIAL
);

# Slots used for the sorting
my %slot_values = (
	"date" => 0,
	"severity" => 1,
	"wnpp" => 2,
	"package" => 3
);

print "Content-type: text/plain\r\n\r\n";

eval {
	# Separation of the GET params
	for (split /\&/, $ENV{QUERY_STRING}) {

		# Split before decoding, so that an encoded "=" stays
		# inside the value
		my ($key, $value) = split /=/, $_, 2;
		next if !defined $key;
		$value = "" if !defined $value;

		# URL decode
		for ($key, $value) {
			s/\%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg;
		}

		# This param corresponds to a Xapian option
		if(defined $flags_values{$key}) {
			$flags |= $flags_values{$key};

		# Sort param
		} elsif($key eq "sort" && length($value) > 1 &&
			(defined $slot_values{$value} || defined $slot_values{substr($value, 1)})) {

			# Case of a reverse value (e.g. rdate)
			if($value =~ /^r(\w+)/) {
				$sort_by = $slot_values{$1};
				$rsort = 1;

			# Standard value
			} else {
				$sort_by = $slot_values{$value};
			}

		# Max param
		} elsif($key eq "max" && $value =~ /^\d+$/) {
			$max = $value;

		# Not an option: this is the query itself
		} elsif($key eq "q") {
			$query_string = $value;
		}
	}
	# If no Xapian option
	$flags = FLAG_DEFAULT if !$flags;

	my $database = Search::Xapian::Database->new($database_path);
	my $enquire = Search::Xapian::Enquire->new($database);
	my $qp = Search::Xapian::QueryParser->new();


	# Boolean filters
	$qp->add_boolean_prefix("id", 'I');
	$qp->add_boolean_prefix("fixed", 'F');
	$qp->add_boolean_prefix("severity", 'S');
	$qp->add_boolean_prefix("wnpp", 'W');
	$qp->add_boolean_prefix("tag", 'T');

	# Free text fields
	$qp->add_prefix("author", 'A');
	$qp->add_prefix("doneby", 'D');
	$qp->add_prefix("package", 'P');
	$qp->add_prefix("subject", 'S');
	$qp->add_prefix("version", 'V');

	# Set the date slot
	my $vrpdate = Search::Xapian::DateValueRangeProcessor->new(
		$slot_values{"date"}, 1, 1920);
	$qp->add_valuerangeprocessor($vrpdate);

	# Request on the db and sorting
	my $query = $qp->parse_query($query_string, $flags);
	$enquire->set_query($query);
	$enquire->set_sort_by_value($sort_by, $rsort) if defined $sort_by;

	# Retrieve the $max first results
	my $mset = $enquire->get_mset(0, $max);
	my $msize = $mset->size();

	my %result;

	$result{"count"} = $mset->get_matches_estimated();
	$result{"about"} = $mset->get_matches_lower_bound() != $mset->get_matches_upper_bound();

	# For each bug found
	foreach my $m ($mset->items()) {
		my $data = $m->get_document()->get_data();

		# Data format: [ID] [DATE] [SEVERITY] [FIXED] Subject
		my ($id, $date, $severity, $fixed, $subject) =
			$data =~ /^\[(\w*)\] \[([\w\/]*)\] \[([\w-]*)\] \[(\w*)\] (.+)$/;

		# Inject the result in the final hash
		push(@{$result{"results"}}, {
			"percent" => $m->get_percent(),
			"date" => $date,
			"id" => $id,
			"severity" => $severity,
			"fixed" => $fixed,
			"subject" => $subject
		});
	}

	# JSON encoding
	my $json = JSON->new->allow_nonref;
	my $json_string = $json->encode(\%result);

	print $json_string;
	exit 0;

}; if($@) {
	print STDERR "Exception: $@\n";
	exit 1;
}

# bunny - Search in the Debian bugs database
# Copyright (C) 2011 Julien Vaubourg <julien@vaubourg.com>
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

use strict;

use WWW::Mechanize;
use JSON -support_by_pp;
use Getopt::Long;

########## CONFIGURE:BEGIN ##########

# URL of the bunny-server
my $bunny_server = "http://localhost/cgi-bin/bunny-server.pl";

########### CONFIGURE:END ###########

# Encode strings to URL format
sub urlencode {
	my $txt = shift @_;
	$txt =~ s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg;
	return $txt;
}

# Options available
my %opt = (
	"fixed" => undef,
	"sort" => undef,
	"wnpp" => [],
	"tag" => [],
	"severity" => [],
	"max" => 10,
	"id" => [],
	"report" => undef,
	"display" => undef,
	"boolean" => 1,
	"phrase" => 1,
	"lovehate" => 1,
	"anycase" => undef,
	"wildcard" => 1,
	"not" => undef,
	"partial" => undef
);

# Options format
GetOptions(
	'fixed!' => \$opt{"fixed"},
	'use-sort=s' => \$opt{"sort"},
	'wnpp=s' => \@{$opt{"wnpp"}},
	'tag=s' => \@{$opt{"tag"}},
	'severity=s' => \@{$opt{"severity"}},
	'max=i' => \$opt{"max"},
	'id=i' => \@{$opt{"id"}},
	'report=i' => \$opt{"report"},
	'display=s' => \$opt{"display"},
	'x-boolean!' => \$opt{"boolean"},
	'x-quoted-phrase!' => \$opt{"phrase"},
	'x-lovehate!' => \$opt{"lovehate"},
	'x-any-case-boolean!' => \$opt{"anycase"},
	'x-wildcard!' => \$opt{"wildcard"},
	'x-not!' => \$opt{"not"},
	'x-partial!' => \$opt{"partial"}
);

my $browser = WWW::Mechanize->new();

# With --report option, the user just wants to see a single bug report
if($opt{"report"}) {

	eval {
		my $url = "http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=".$opt{"report"};

		# Download the bug report
		$browser->get($url);
		my $content = $browser->content();

		# Remove all mail headers except From: To: Subject: and Date:
		$content =~ s/^(From|To|Subject|Date):/[$1]/gm;
		$content =~ s/^[^ :]+:[^\n]*\n(\h+[^\n]+\n)*//gsm;
		$content =~ s/^\n*From [^@]+@[^\n]+\n\[(From|To|Subject|Date)\]/"\n\n".('=' x 80)."\n\n[${1}]"/gsme;

		# Displays the report
		print "BUG #".$opt{"report"};
		print $content;
		print "".('=' x 80)."\n$url\n";

	}; if($@) {
		print STDERR "Bug ID not found.\n";
		exit 2;
	}

# The user wants a search (no -r)
} else {

	my @query;

	# Form the query_string with options (&option=value)
	foreach (keys(%opt)) {
		if($_ !~ /^(?:id|fixed|wnpp|tag|severity|display)$/ && $opt{$_}) {
			push(@query, urlencode($_)."=".urlencode($opt{$_}));
		}
	}

	# Boolean filters are added in the Xapian syntax (option:value),
	# directly in the Xapian query, for each value.
	for (qw(id wnpp tag severity)) {
		if(@{$opt{$_}} > 0) {
			for my $value (@{$opt{$_}}) {
				push(@ARGV, $_.":".$value);
			}
		}
	}

	# Fixed is the only boolean filter that takes just a single value
	push(@ARGV, "fixed:".$opt{"fixed"}) if defined $opt{"fixed"};

	# Complete url, with options and Xapian query
	my $url = sprintf(
		"%s?%s&q=%s",
		$bunny_server,
		join('&', @query),
		urlencode(join(' ', @ARGV))
	);

	eval {
		# Send the request
		$browser->get($url);
		my $content = $browser->content();

		# Retrieve the json response
		my $json = JSON->new;
		my %result = %{ $json->decode($content) };

		# If any bugs were found
		if($result{"count"}) {

			# Displays the counter header only if -d contains 'c'
			# This is the default
			if($opt{"display"} =~ /(c|^$)/) {
				print "About " if $result{"about"};
				print $result{"count"}." matching bugs found.\n";
				print "--\n" if length($opt{"display"}) != 1;
			}

			# If the user doesn't want only the counter
			if($opt{"display"} ne 'c') {
				my $date_e = $opt{"display"} =~ /e/;

				$opt{"display"} =~ s/[ce]//g;

				# For each bug found
				for (@{$result{"results"}}) {

					# European date format (-d de)
					if($date_e) {
						$_->{"date"} =~ s/(\d{4})\/(\d\d)\/(\d\d)/$3\/$2\/$1/;
					}

					# Shows percents (-d p) - default
					printf(
						length($opt{"display"}) != 1 ?
							"%3u%% - " : "%u",
						$_->{"percent"}
					) if $opt{"display"} =~ /(p|^$)/;

					# Shows dates (-d d)
					printf(
						length($opt{"display"}) != 1 ?
							"[%10s] " : "%s",
						$_->{"date"}
					) if $opt{"display"} =~ /d/;

					# Shows severities (-d s)
					printf(
						length($opt{"display"}) != 1 ?
							"[%9s] " : "%s",
						$_->{"severity"}
					) if $opt{"display"} =~ /s/;

					# Shows fixed flags (-d f)
					printf(
						length($opt{"display"}) != 1 ?
							"[%5s] " : "%s",
						$_->{"fixed"} ? "FIXED" : ""
					) if $opt{"display"} =~ /f/;

					# Shows IDs (-d i) - default
					printf(
						length($opt{"display"}) != 1 ?
							"[#%-7u] " : "%u",
						$_->{"id"}
					) if $opt{"display"} =~ /(i|^$)/;

					# Shows topics (-d t) - default
					print $_->{"subject"} if $opt{"display"} =~ /(t|^$)/;

					print "\n";
				}
			}

		# No bug found
		} else {
			exit 1;
		}

	}; if($@) {
		print STDERR "Database access error!\n";
		exit 2;
	}
}

exit 0;


=head1 NAME

bunny - Search in the Debian bugs database


=head1 SYNOPSIS

bunny [I<OPTIONS>] [I<QUERY>]

=head1 QUERY

I<QUERY> uses the Xapian syntax. See the B<EXAMPLES> section.

=head2 Free text fields

B<author:*> - Bug's author

B<doneby:*> - Who fixed the bug

B<package:*> - Package concerned

B<subject:*> - Topic

B<version:*> - Package's version

=head1 OPTIONS

All boolean filters and Xapian options can be negated (e.g. B<--fixed> and
B<--nofixed>).

=head2 Boolean filters

=over 4

=item -f, --fixed

Only fixed bugs.

Same as C<fixed:1>.

=item B<[>-id I<ID>B<]...>

Only the bug #I<ID>.

Same as C<id:*>.

=item B<[>-w|--wnpp I<ITA|ITP|O|RFA|RFH|RFP>B<]...>

Only WNPP bugs of the given types.

Same as C<wnpp:*>.

=item B<[>-t|--tag I<TAG>B<]...>

Only bugs carrying the tag I<TAG>.

See B<http://www.debian.org/Bugs/Developer#tags>.

Same as C<tag:*>.

=item B<[>-s|--severity I<critical|grave|serious|important|normal|minor|wishlist>B<]...>

Only bugs with the given severity.

Same as C<severity:*>.

=back


=head2 Format

=over 4

=item -r, --report I<ID>

View the bug report #I<ID>.

=item -u, --use-sort I<date|severity|wnpp|package>

Sort results.

For reverse order: I<rdate|rseverity|rwnpp|rpackage>

=item -m, --max I<LIMIT>

Maximum number of results. Default: 10.

=item -d, --display I<DISPLAY>

B<c> - Shows counter header

B<d> - Shows dates

B<e> - Shows dates in DD/MM/YYYY format

B<f> - Shows the fixed tag

B<i> - Shows IDs

B<p> - Shows percents

B<s> - Shows severity tags

B<t> - Shows topics

Example: C<-d fistdepc>

Default when empty: B<pict>

=back


=head2 Xapian options

=over 4

=item -x-b, --x-boolean B<(default)>

Support C<AND>, C<OR>, etc and bracketed subexpressions.

=item -x-q, --x-quoted-phrase B<(default)>

Support quoted phrases.

=item -x-l, --x-lovehate B<(default)>

Support C<+> and C<->.

=item -x-a, --x-any-case-boolean

Support C<AND>, C<OR>, etc even if they aren't in ALLCAPS.

=item -x-w, --x-wildcard B<(default)>

Support right truncation (not on B<Boolean filters>).

=item -x-n, --x-not

Allow queries such as C<NOT debian>.

Such queries require a list of all bugs in the database, which is
expensive, so this feature isn't enabled by default.

=item -x-p, --x-partial

Enable partial matching.

Partial matching causes the parser to treat the query as a "partially entered"
search. This will automatically treat the final word as a wildcarded match,
unless it is followed by whitespace, to produce more stable results from
interactive searches.

Currently B<--x-partial> doesn't do anything if the final word in the query has a
boolean filter prefix, or if it is in a phrase (either an explicitly quoted one,
or one implicitly generated by hyphens or other punctuation). It also doesn't do
anything if the final word is part of a value range.

=back



=head1 EXAMPLES

 bunny author:john severity:grave tag:sid version:3.5.13
 bunny -x-n --nofixed package:iceweasel AND NOT author:john
 bunny doneby:'john doe' 25/12/01..31/12/01
 bunny package:'iceweasel*' it\'s embarassing
 bunny -w ITP -w RFP -w O
 bunny -r 421337 | less

=head1 EXIT STATUS

B<0> - Matching bugs found

B<1> - No matching bugs found

B<Others> - Error

=head1 AUTHOR

Julien Vaubourg E<lt>julien@vaubourg.comE<gt>


=head1 COPYRIGHT AND LICENSE

Copyright (C) 2011 Julien Vaubourg <julien@vaubourg.com>

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

