Bug#658227: Search produces unhelpful results for "DFSG" and "Social Contract"
On Wed, Feb 01, 2012 at 03:51:06PM -0600, Raphael Geissert wrote:
> On Wednesday 01 February 2012 07:44:19 David Prévot wrote:
> > Le 01/02/2012 04:40, Josh Triplett a écrit :
> > > I recently needed to dig up the URL for the DFSG. I tried the search
> > > in the upper right corner of debian.org, which did not produce useful
> > > results. Searching for "dfsg" produced numerous security advisories for
> > > software with "dfsg" in the version number; the actual DFSG
> > > (http://www.debian.org/social_contract.1.0.en.html#guidelines) did not
> > > appear anywhere on the first page of results. Searching for "social
> > > contract" produced the social contract as the third result, after two
> > > General Resolutions which happened to include "social contract" in their
> > > text.

> That I don't really know what to do about. I think I understand the basics
> of the weighting system, but I'm confused by the results of switching from
> probabilistic query (parameter P) to boolean filtering (parameter B).

You want to use P for the user's search string (you're searching for
text, not applying filters, which should only affect whether a document
is found or not and shouldn't affect its weight when it is).

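A toy model of that distinction, in plain Python rather than Omega/Xapian: free-text terms contribute to each document's weight, while a boolean filter only decides which documents are eligible and adds nothing to the weight. The documents, term counts, `type:` filter terms and the crude tf-idf style scoring are all hypothetical; Xapian's default weighting (BM25) is more involved.

```python
import math

# Hypothetical toy corpus: doc id -> term frequencies (not real debian.org data).
DOCS = {
    "dsa-1234": {"dfsg": 3, "security": 5, "type:advisory": 1},
    "social_contract": {"dfsg": 1, "social": 4, "contract": 4, "type:page": 1},
    "vote-2004-003": {"social": 2, "contract": 2, "type:page": 1},
}


def weight(doc: dict, term: str) -> float:
    """Crude tf-idf style weight, standing in for Xapian's real scheme."""
    tf = doc.get(term, 0)
    if tf == 0:
        return 0.0
    df = sum(1 for d in DOCS.values() if term in d)
    return (1 + math.log(tf)) * math.log(1 + len(DOCS) / df)


def search(prob_terms, filter_term=None):
    """Rank by the probabilistic terms; the boolean filter only prunes."""
    results = []
    for doc_id, doc in DOCS.items():
        if filter_term is not None and filter_term not in doc:
            continue  # the filter decides match / no match, nothing else
        score = sum(weight(doc, t) for t in prob_terms)
        if score > 0:  # OR semantics: any matching term is enough
            results.append((doc_id, score))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

`search(["dfsg"])` ranks both documents containing "dfsg", while `search(["dfsg"], filter_term="type:page")` drops the advisory and leaves the social contract's score exactly as it was.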
> Adding a <meta name="keywords" content="dfsg, social contract"> would
> probably give it some more weight. Olly, do you have any comment or
> suggestion as to what could be done? AFAICS not even the document's
> <title> is recorded as S terms.

The title isn't indexed as S terms in 1.2.x (it is in trunk, so will be
in 1.3.x), but that's not really the issue here.
The document's title is indexed as unprefixed terms with a "wdf inc" of
5 (which means it's like it was written out 5 times), but the social
contract page's title doesn't contain "dfsg" so that doesn't help at
all for that case. And for the "social contract" case, the second
GR hit has "social contract" in the title (and both are about amendments
to the social contract, so actually pretty relevant, though the SC
itself is obviously more so to most people doing that search).
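A rough sketch of what a "wdf inc" of 5 for the title means, assuming a naive lowercase word tokenizer (omindex's real tokenizer and term prefixes differ): each title word adds 5 to that term's within-document frequency and each body word adds 1. Xapian exposes the same idea through the `wdf_inc` argument of `TermGenerator::index_text()`; the title and body text below are made up for illustration.

```python
import re
from collections import Counter

TITLE_WDF_INC = 5  # title words count as if written out 5 times
BODY_WDF_INC = 1


def tokenize(text: str) -> list[str]:
    # Naive tokenizer for illustration only.
    return re.findall(r"[a-z0-9]+", text.lower())


def index_page(title: str, body: str) -> Counter:
    """Return term -> wdf, counting title words TITLE_WDF_INC times."""
    wdf = Counter()
    for term in tokenize(title):
        wdf[term] += TITLE_WDF_INC
    for term in tokenize(body):
        wdf[term] += BODY_WDF_INC
    return wdf


wdf = index_page(
    "Debian Social Contract",
    "The Debian Free Software Guidelines (DFSG) are part of the social contract.",
)
print(wdf["social"], wdf["dfsg"])  # prints "6 1"
```

"social" gets 5 from the title plus 1 from the body, while "dfsg" only gets 1, which is why the title boost does nothing for a "dfsg" search.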
As things stand, indexing link text is probably the only way an indexer
could easily discover for itself that the SC is a good result for
"dfsg", and would help the "social contract" case, but we don't
currently handle link text specially (it would probably be a nice
addition to support this, though it's a bit fiddly to do efficiently).
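The link-text idea can be sketched as follows: collect the anchor text of every link and credit it to the link's target rather than to the page containing it, so a page linking to the SC with the text "DFSG" tells the indexer that "dfsg" describes the SC. This is a hypothetical illustration using Python's `html.parser`, not anything omindex does; the URLs and HTML are made up, and it skips the efficiency problem (a second pass over all pages, or a separate anchor-text store) that makes this fiddly in practice.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class AnchorTextCollector(HTMLParser):
    """Collect the text inside each <a href=...> element, keyed by target URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.anchor_text = defaultdict(list)  # target URL -> link texts
        self._href = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop the #fragment so a link to
                # .../social_contract#guidelines credits the page itself.
                self._href = urldefrag(urljoin(self.base_url, href)).url
                self._buf = []

    def handle_data(self, data):
        if self._href is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._buf).split())
            if text:
                self.anchor_text[self._href].append(text)
            self._href = None


page = '<p>See the <a href="/social_contract#guidelines">DFSG</a>.</p>'
collector = AnchorTextCollector("https://www.debian.org/intro/free")
collector.feed(page)
print(dict(collector.anchor_text))
# {'https://www.debian.org/social_contract': ['DFSG']}
```

The collected strings would then be indexed as extra terms on the target document, possibly with their own wdf increment like the title.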

Adding keywords via a meta tag should help (omindex does read them
and index them as if they were more body text). They don't get a weight
boost, so you might need to write them a few times to get enough of an
effect.

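Since the keywords are indexed like ordinary body text, each repetition adds one more to the term's wdf. A small hypothetical helper for generating such a tag; the function name and the default repeat count of 3 are arbitrary choices, not a tuned value.

```python
from html import escape


def keywords_meta(keywords: list[str], repeat: int = 3) -> str:
    """Build a <meta name="keywords"> tag listing each keyword `repeat` times.

    Repeating a keyword is a crude way to raise its within-document
    frequency, since meta keywords get no extra weight of their own.
    """
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    content = ", ".join(kw for kw in keywords for _ in range(repeat))
    return f'<meta name="keywords" content="{escape(content, quote=True)}">'


print(keywords_meta(["dfsg", "social contract"], repeat=2))
# <meta name="keywords" content="dfsg, dfsg, social contract, social contract">
```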
We could also perhaps make use of synonyms so "dfsg" in a query would
have a synonym of "debian free software guidelines" or something.
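A minimal sketch of query-time synonym expansion, assuming a hand-maintained synonym table (hypothetical, not something Omega ships): a query term that has synonyms is OR-ed with them, and a multi-word synonym is treated as a phrase. Xapian itself stores synonyms in the database (`WritableDatabase::add_synonym()`) and its QueryParser can expand them; this toy just builds a query tree of nested tuples to show the shape, and joining multiple query terms with AND is an arbitrary choice here.

```python
# Hypothetical synonym table; multi-word synonyms become phrases.
SYNONYMS = {
    "dfsg": ["debian free software guidelines"],
}


def expand(term: str):
    """Return a query tree for one term, OR-ing in any synonyms."""
    alternatives = [("TERM", term)]
    for syn in SYNONYMS.get(term, []):
        words = syn.split()
        if len(words) == 1:
            alternatives.append(("TERM", words[0]))
        else:
            alternatives.append(("PHRASE", tuple(words)))
    if len(alternatives) == 1:
        return alternatives[0]
    return ("OR", tuple(alternatives))


def parse(query: str):
    """AND together the (possibly expanded) terms of a whitespace-split query."""
    parts = [expand(t) for t in query.lower().split()]
    if not parts:
        return None
    return parts[0] if len(parts) == 1 else ("AND", tuple(parts))


print(parse("DFSG"))
# ('OR', (('TERM', 'dfsg'), ('PHRASE', ('debian', 'free', 'software', 'guidelines'))))
```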