[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#638057: UDD: add gatherer for patch-tracker.d.o data



Package: qa.debian.org
Severity: wishlist
User: qa.debian.org@packages.debian.org
Usertags: udd

Hi.

I already discussed this feature with Lucas Nussbaum at Debconf11. It
would be nice to have a gatherer for patch-tracker.d.o data in UDD. I
already wrote a patch for patch-tracker.d.o to export the data we could
be interested into in a JSON file. The patch is still waiting for review
and I'll ping Sean in the next days.

I also wrote patches for UDD, and I'm attaching them here. They're still
incomplete, need to be adjusted for the actual setup of UDD (which I do
not know enough), still have some FIXME for potential SQL injection.
Moreover, SQL schemas have to be fixed and the whole infrastructure for
getting JSON data from patch-tracker.d.o must be set up.

But, in the meantime, people can have a look at the code to understand
what's going on.

I'm also attaching a draft of the SQL schema (which probably still lacks
things, most notably sane indexes). A gzipped test data file (based on a
snapshot of a few days ago) can be found here:

http://people.debian.org/~gio/patch_tracker.json.gz

Thanks, Gio.
-- 
Giovanni Mascellani <mascellani@poisson.phc.unipi.it>
Pisa, Italy

Web: http://poisson.phc.unipi.it/~mascellani
Jabber: g.mascellani@jabber.org / giovanni@elabor.homelinux.org
From ef97570af8a56e7018b971574152cdcbd29c1b8b Mon Sep 17 00:00:00 2001
From: Giovanni Mascellani <mascellani@poisson.phc.unipi.it>
Date: Mon, 1 Aug 2011 11:51:39 +0200
Subject: [PATCH] Added sketch of patch_tracker gatherer.

---
 udd/config-org.yaml               |    7 +++++
 udd/doc/sources/patch-tracker     |   24 +++++++++++++++++
 udd/udd/patch_tracker_gatherer.py |   52 +++++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+), 0 deletions(-)
 create mode 100644 udd/doc/sources/patch-tracker
 create mode 100644 udd/udd/patch_tracker_gatherer.py

diff --git a/udd/config-org.yaml b/udd/config-org.yaml
index 0a0745b..368425c 100644
--- a/udd/config-org.yaml
+++ b/udd/config-org.yaml
@@ -13,6 +13,7 @@ general:
     orphaned-packages: module udd.orphaned_packages_gatherer
     bugs: exec DEBBUGS_CONFIG_FILE=/org/bugs.debian.org/etc/config perl /org/udd.debian.org/udd/udd/bugs_gatherer.pl 
     carnivore: module udd.carnivore_gatherer
+    patch-tracker: module udd.patch_tracker_gatherer
     lintian: module udd.lintian_gatherer
     debtags: module udd.debtags_gatherer
     ubuntu-bugs: module udd.ubuntu_bugs_gatherer
@@ -546,6 +547,12 @@ carnivore:
   login-table: carnivore_login
   schema: carnivore
 
+patch-tracker:
+  type: patch-tracker
+  path: /home/giovanni/packages/qa/collab-qa/udd/mirrors/patch_tracker.json
+  patch-tracker-table: patch_tracker
+  patches-table: patches
+
 lintian:
   type: lintian
   update-command: rm -f /org/udd.debian.org/mirrors/lintian.log && wget -q http://lintian.debian.org/lintian.log -O /org/udd.debian.org/mirrors/lintian.log
diff --git a/udd/doc/sources/patch-tracker b/udd/doc/sources/patch-tracker
new file mode 100644
index 0000000..20027f9
--- /dev/null
+++ b/udd/doc/sources/patch-tracker
@@ -0,0 +1,24 @@
+DESCRIPTION
+  patch-tracker.d.o is a web service that shows the patches that get
+  applied to the Debian packages at build time. A summary of the shown
+  data is stored and published as JSON file, so it can be included
+  in UDD.
+
+  We import such information assigning a unique integer ID to each
+  pair (package, release). Two tables are then used to store the
+  data:
+   * patch-tracker-table: keeps general information about the
+     package (name, version, release) and about the patching method
+     it uses (the series type - i.e., whether it uses quilt, dpatch
+     or other systems, including the 'native' and 'no_series' values;
+     and the number of lines added and removed outside the debian/
+     directory by the .diff.gz or .debian.tar.gz patch);
+   * patches-table: keeps information about the specific patches
+     (the package to which belongs, its name and the number of lines
+     added and removed); it will probably extended in the future to
+     store also DEP 3 metadata.
+
+CONFIGURATION:
+  path: The path of the JSON statistics from patch-tracker.d.o.
+  patch-tracker-table, patches-table: The names of the database
+   tables used to describe the packages and patches.
diff --git a/udd/udd/patch_tracker_gatherer.py b/udd/udd/patch_tracker_gatherer.py
new file mode 100644
index 0000000..ed56727
--- /dev/null
+++ b/udd/udd/patch_tracker_gatherer.py
@@ -0,0 +1,52 @@
+#!/usr/bin/env python
+
+import sys
+from gatherer import gatherer
+import json
+
+def get_gatherer(connection, config, source):
+  return patch_tracker_gatherer(connection, config, source)
+
+class patch_tracker_gatherer(gatherer):
+
+  def __init__(self, connection, config, source):
+    gatherer.__init__(self, connection, config, source)
+    self.assert_my_config('path', 'patch-tracker-table', 'patches-table')
+
+  def run(self):
+    my_config = self.my_config
+
+    cur = self.cursor()
+
+    cur.execute("DELETE FROM %s" % my_config["patches-table"])
+    cur.execute("DELETE FROM %s" % my_config["patch-tracker-table"])
+
+    # FIXME SQL Injection exposures here
+    cur.execute("""PREPARE patch_tracker_insert 
+      AS INSERT INTO %s (id, package, version, release, series_type, nondebian_added, nondebian_removed) 
+      VALUES ($1, $2, $3, $4, $5, $6, $7)""" % (my_config['patch-tracker-table']))
+    cur.execute("""PREPARE patches_insert 
+      AS INSERT INTO %s (id, package_id, lines_added, lines_removed, name) 
+      VALUES ($1, $2, $3, $4, $5)""" % (my_config['patches-table']))
+
+    # TODO We don't need to keep all the decoded JSON file in memory; there are
+    # libraries to access it in a stream (DOM-like) fashion, but Python
+    # bindings don't appear to be available in Debian (libyajl)
+    patch_tracker_data = json.load(open(my_config['path']))
+    package_number = 0
+    patch_number = 0
+    for line in patch_tracker_data:
+        package, suite, version, series_type, nondebian, patches = \
+            (line['package'], line['suite'], line['version'],
+             line['series_type'], line['nondebian'], line['patches'])
+        cur.execute("EXECUTE patch_tracker_insert (%d, '%s', '%s', '%s', '%s', %d, %d);" %
+            (package_number, package, version, suite, series_type, nondebian[0], nondebian[1]))
+        for patch_name, patch_data in patches:
+            cur.execute("EXECUTE patches_insert (%d, %d, %d, %d, '%s');" %
+                (patch_number, package_number, patch_data[0], patch_data[1], patch_name))
+            patch_number += 1
+        package_number += 1
+
+    cur.execute("ANALYZE %s" % my_config["patch-tracker-table"])
+    cur.execute("ANALYZE %s" % my_config["patches-table"])
+
-- 
1.7.5.4

--
-- Name: patch_tracker; Type: TABLE; Schema: public; Owner: giovanni; Tablespace: 
--

CREATE TABLE patch_tracker (
    id integer NOT NULL,
    package text NOT NULL,
    version debversion NOT NULL,
    release text NOT NULL,
    series_type text NOT NULL,
    nondebian_added integer NOT NULL,
    nondebian_removed integer NOT NULL
);

--
-- Name: patch_tracker_pkey; Type: CONSTRAINT; Schema: public; Owner: giovanni; Tablespace: 
--

ALTER TABLE ONLY patch_tracker
    ADD CONSTRAINT patch_tracker_pkey PRIMARY KEY (id);

--
-- Name: package_version_idx; Type: INDEX; Schema: public; Owner: giovanni; Tablespace: 
--

CREATE INDEX package_version_idx ON patch_tracker USING btree (package, version);

--
-- Name: patches; Type: TABLE; Schema: public; Owner: giovanni; Tablespace: 
--

CREATE TABLE patches (
    id integer NOT NULL,
    package_id integer NOT NULL,
    lines_added integer NOT NULL,
    lines_removed integer NOT NULL,
    name text NOT NULL
);

--
-- Name: patches_pkey; Type: CONSTRAINT; Schema: public; Owner: giovanni; Tablespace: 
--

ALTER TABLE ONLY patches
    ADD CONSTRAINT patches_pkey PRIMARY KEY (id);

--
-- Name: package_id_idx; Type: INDEX; Schema: public; Owner: giovanni; Tablespace: 
--

CREATE INDEX package_id_idx ON patches USING btree (package_id);

-- Name: patches_package_id_fkey; Type: FK CONSTRAINT; Schema: public; Owner: giovanni
--

ALTER TABLE ONLY patches
    ADD CONSTRAINT patches_package_id_fkey FOREIGN KEY (package_id) REFERENCES patch_tracker(id);

Attachment: signature.asc
Description: OpenPGP digital signature


Reply to: