Re: analyzing popcon data for bogus recommends

On Tue, May 13, 2008 at 10:51:37PM -0400, Joey Hess wrote:

> It would be nice to have a list which Recommends are ignored/overridden
> the most when installing packages, to identify Recommends that need to be
> downgraded to Suggests. Could we derive such a list from popcon data? I
> think it would need to be done by analyzing each individual popcon data
> submission, so I can't do it as that data is not published.

Yes you can.  Also, there's a Xapian database in my home directory
(~enrico/anapop/something IIRC) on people.debian.org that is built from
the popcon data, and you can query that database to quickly get a count
of "submissions having package X AND NOT package Y" and "package X AND
package Y".

That Xapian index indexes popcon submissions as if they were
"documents", and installed packages as if they were "terms".
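
To illustrate the idea, here is a minimal sketch in plain Python.  It is
not the actual Xapian code or database layout: the sample submissions,
the package names and the helper names (build_index,
count_with_and_without) are all made up for the example, and a set-based
inverted index stands in for Xapian.

```python
from collections import defaultdict


def build_index(submissions):
    """Map each package name ("term") to the set of submission ids
    ("documents") that report it as installed."""
    index = defaultdict(set)
    for sub_id, packages in submissions.items():
        for pkg in packages:
            index[pkg].add(sub_id)
    return index


def count_with_and_without(index, x, y):
    """Return (submissions with X AND Y, submissions with X AND NOT Y)."""
    has_x = index.get(x, set())
    has_y = index.get(y, set())
    return len(has_x & has_y), len(has_x - has_y)


# Hypothetical sample data: submission id -> installed packages.
submissions = {
    "host-a": {"foo", "foo-doc", "bar"},
    "host-b": {"foo", "bar"},
    "host-c": {"foo"},
    "host-d": {"bar", "foo-doc"},
}

index = build_index(submissions)
both, x_only = count_with_and_without(index, "foo", "foo-doc")
print(f"foo AND foo-doc: {both}, foo AND NOT foo-doc: {x_only}")
```

If foo Recommends foo-doc, a high "X AND NOT Y" count relative to
"X AND Y" would hint that the Recommends is often ignored.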

The database is updated by a weekly cronjob that rescans the whole
popcon database.  I briefly tried in the past[1] to come up with
ways to hook the indexing process into popcon so that I could do
realtime indexing of the data (which would give an up-to-date index and
wouldn't suck 100% CPU on gluck once a week), but I got the impression
that it required more discussion than I was motivated to have at the
time.  If more people are interested in using that Xapian index, it may
make sense to revisit this.



[1] http://lists.alioth.debian.org/pipermail/popcon-developers/2007-June/001374.html
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico@debian.org>
