Hello,

a working version of http://contributors.debian.org is now online, and
I'm now trying to get data sources set up.

The site is designed so that each team takes care of its own data mining
and sends the results to the server.

I'd like to ask you to set up a data source sending maintainer and
uploader data to the site.

While developing the site I played with getting data out of dak. I'm
attaching the current code; while Alioth is down, the whole repository
can be found at http://people.debian.org/~enrico/dc.git.tar.xz

With that code, this command line will query dak and post data to the
site[1]:

  ./dc-tool --source=ftp.debian.org --mine=examples/dak.cfg --auth-token=… --post

You can use that code or just roll your own: the format and the protocol
are rather simple. Protocol details are at:
https://wiki.debian.org/DebianContributors
but it's really just a simple piece of JSON to be posted as a file field
in a form over HTTP.

The general idea is that each data source provides data about one or
more types of contributions. My guess is that dak knows at least about
maintainers (who do packaging work and write their names in changelogs)
and uploaders (who sign an upload, possibly a sponsored one, and upload
it). It's really up to you what kinds of contributions you can mine,
though.

There is no need to go far back in time if you don't have the data
readily available: I'm more interested in who's a contributor now, and
I'm about to implement a way to hide older dates for data sources that
cannot currently go arbitrarily far back in time reliably.

I'd like to ask you to please set up some periodic mining and posting on
your side. I'm happy to help however I can.

Ciao,

Enrico

[1] The auth token can be found at
    https://contributors.debian.org/sources/update/ftp.debian.org/
    after having logged in with a web password at http://nm.debian.org;
    the login link at contributors.debian.org is currently broken.

-- 
GPG key: 4096R/E7AD5568 2009-05-08 Enrico Zini <enrico@enricozini.org>
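For anyone rolling their own client, the "JSON posted as a file field in
a form" step described in the message can be sketched with the standard
library alone. This is only an illustration, not the dc-tool
implementation: the form field names (source, auth_token, data), the
file name and the JSON layout below are assumptions made for the
example, and the authoritative format is the one on the wiki page. The
sketch builds the request body without sending anything.

```python
import json
import uuid


def build_multipart(fields, file_field, filename, payload):
    """Encode plain form fields plus one file field as multipart/form-data."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            "--" + boundary,
            'Content-Disposition: form-data; name="%s"' % name,
            "",
            value,
        ]
    # The JSON document travels as an uploaded file, not as a plain field.
    lines += [
        "--" + boundary,
        'Content-Disposition: form-data; name="%s"; filename="%s"'
        % (file_field, filename),
        "Content-Type: application/json",
        "",
        payload,
        "--" + boundary + "--",
        "",
    ]
    body = "\r\n".join(lines).encode("utf-8")
    content_type = "multipart/form-data; boundary=" + boundary
    return content_type, body


# Hypothetical submission: one contributor with one contribution type.
# The key names are assumptions; check the wiki for the real schema.
submission = [
    {
        "id": [{"type": "email", "id": "someone@example.org"}],
        "contributions": [
            {"type": "uploader", "begin": "2013-01-15", "end": "2013-11-30"},
        ],
    }
]

content_type, body = build_multipart(
    # Field names and the token value are placeholders for illustration.
    {"source": "ftp.debian.org", "auth_token": "PLACEHOLDER"},
    file_field="data",
    filename="submission.json",
    payload=json.dumps(submission),
)
```

The resulting body and content_type could then be handed to any HTTP
client, for example urllib.request.Request(url, data=body,
headers={"Content-Type": content_type}), with the URL taken from the
protocol documentation.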
# coding: utf8
# Debian Contributors data source data mining tools for dak
#
# Copyright (C) 2013  Enrico Zini <enrico@debian.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
from ..core import *
import email.utils
import psycopg2
import logging

log = logging.getLogger(__name__)

__all__ = ["DakUploaders", "DakMaintainers"]


class Dak(object):
    """
    Common base for the dak miners: holds the database connection and the
    contribution type that the generated contributions are tagged with
    """
    def __init__(self, ctype, cfg):
        self.db = psycopg2.connect(cfg["db"])
        self.ctype = ctype

    def query_uploaders(self):
        """
        Generate one (Identifier, date) pair per source upload, identifying
        the uploader by the uid that owns the signing key
        """
        log.debug("Querying uploaders for %s...", self.ctype)
        cur = self.db.cursor()
        cur.execute("""
        SELECT s.install_date, u.uid, u.name
          FROM source s
          JOIN fingerprint f ON s.sig_fpr = f.id
          JOIN uid u ON f.uid = u.id
        """)
        for dt, uid, name in cur:
            if name is not None:
                name = name.decode("utf8", errors="replace")
            yield Identifier("login", uid, name), dt.date()

    def query_maintainers(self):
        """
        Generate one (Identifier, date) pair per source upload, identifying
        the maintainer by the email address in the changedby field
        """
        log.debug("Querying maintainers for %s...", self.ctype)
        cur = self.db.cursor()
        cur.execute("""
        SELECT s.install_date, c.name
          FROM source s
          JOIN maintainer c ON s.changedby = c.id
        """)
        for dt, m_name in cur:
            # c.name is a "Real Name <address>" string
            realname, emailaddr = email.utils.parseaddr(m_name)
            realname = realname.decode("utf8", errors="replace")
            yield Identifier("email", emailaddr, realname), dt.date()

    def _query_to_submission(self, generator, submission):
        """
        Collapse the (Identifier, date) pairs into one Contribution per
        identifier and add the results to the submission
        """
        count_rows = 0
        by_ident = {}
        for ident, date in generator:
            count_rows += 1
            c = by_ident.get(ident, None)
            if c is None:
                # First row for this identifier: begin and end coincide
                by_ident[ident] = Contribution(self.ctype, date, date)
            else:
                c.extend_by_date(date)

        count_contribs = 0
        for ident, contrib in by_ident.iteritems():
            count_contribs += 1
            submission.add_contribution(ident, contrib)

        log.debug("%d rows read into %d contributions",
                  count_rows, count_contribs)


class DakUploaders(Dak):
    """
    Query dak for source uploads, attributing each one to the owner of the
    key that signed it
    """
    def scan(self, submission):
        self._query_to_submission(self.query_uploaders(), submission)


class DakMaintainers(Dak):
    """
    Query dak for source uploads, attributing each one to the address in
    its changedby field
    """
    def scan(self, submission):
        self._query_to_submission(self.query_maintainers(), submission)
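
The aggregation done by _query_to_submission can be tried without a dak
database. The sketch below is a standalone approximation: DateRange is a
hypothetical stand-in for the Contribution class from the repository's
core module (not shown here), and it assumes that extend_by_date simply
widens the begin/end range to include the new date. Plain tuples stand
in for Identifier objects.

```python
import datetime


class DateRange(object):
    """Hypothetical stand-in for Contribution: first and last activity date."""

    def __init__(self, begin, end):
        self.begin = begin
        self.end = end

    def extend_by_date(self, date):
        # Assumed behaviour: widen the range so that it includes `date`.
        if date < self.begin:
            self.begin = date
        if date > self.end:
            self.end = date


def fold_rows(rows):
    """Collapse (identifier, date) rows into one DateRange per identifier."""
    by_ident = {}
    for ident, date in rows:
        rng = by_ident.get(ident)
        if rng is None:
            by_ident[ident] = DateRange(date, date)
        else:
            rng.extend_by_date(date)
    return by_ident


# Made-up rows, in the arbitrary order a database cursor might return them.
rows = [
    (("login", "alice"), datetime.date(2013, 3, 1)),
    (("login", "bob"), datetime.date(2013, 5, 20)),
    (("login", "alice"), datetime.date(2012, 11, 7)),
    (("login", "alice"), datetime.date(2013, 9, 14)),
]

ranges = fold_rows(rows)
for ident, rng in sorted(ranges.items()):
    print(ident, rng.begin, rng.end)
```

With these rows, alice ends up with a single range from 2012-11-07 to
2013-09-14 and bob with a one-day range on 2013-05-20, so the number of
contributions posted depends on the number of distinct identifiers, not
on the number of uploads.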