
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)



On Thu, Jan 21, 2010 at 04:07:19PM +0100, Andreas Tille wrote:
> On Thu, Jan 21, 2010 at 11:54:31PM +0900, Charles Plessy wrote:
> > 
> > I will try to provide drafts for
> > the loading in UDD. But I never programmed in Python, so I do not expect it
> > will work out of the box. Hopefully, it will save you some typing.
> 
> That's a good way to push me for helping you instead of waiting until I
> find time to do it from scratch.  Just ask in case of trouble.

Hi Andreas and everybody,

today I took a couple of hours to study the UDD and Python (and snakes and
Greek mythology, thanks to the Wikipedia syndrome). I attached to this email a
draft for a bibliographic reference gatherer, “bibref_gatherer.py”.

Although in my previous emails I described a tab-delimited export format from
the upstream-metadata.d.n system, I realised that this is not robust if a
field happens to contain a tab. Instead of re-inventing the wheel with
quoting mechanisms, I simply switched the exchange format to YAML.
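
To make the tab problem concrete, here is a minimal sketch. The package name, key and value are made up for illustration, and `parse_tsv_line` is a hypothetical helper, not part of the UDD or of upstream-metadata.d.n. It only shows that a naive split on tabs stops yielding a triple once a value contains a tab itself.

```python
# Hypothetical record (package, key, value); the value contains a literal tab.
record = ("examplepkg", "Reference-Title", "A title\twith a tab")

def parse_tsv_line(line):
    """Naive tab-delimited parser: split on every tab (hypothetical helper)."""
    return line.rstrip("\n").split("\t")

# Serialise the record the naive way, then parse it back.
line = "\t".join(record) + "\n"
fields = parse_tsv_line(line)

# The embedded tab produces four fields instead of the expected three.
print(len(fields))
print(fields)
```

A format with its own quoting rules, such as YAML, avoids this without a home-grown escaping scheme.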

http://upstream-metadata.debian.net/for_UDD/biblio.yaml

The above file contains triples to be loaded into a table of the UDD. They
provide the information needed to feed the Blends web sentinel with
bibliographic information.
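
As a rough sketch of what such a table of triples looks like once loaded: the example below uses sqlite3 only as a stand-in for the UDD's PostgreSQL database, the column names (package, key, value) are taken from the INSERT statement in the attached gatherer, and the sample rows are invented for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the UDD's PostgreSQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Column names follow the INSERT statement in the attached gatherer.
cur.execute("CREATE TABLE bibref (package TEXT, key TEXT, value TEXT)")

# Made-up triples, for illustration only.
triples = [
    ("examplepkg", "Reference-Title", "An example title"),
    ("examplepkg", "Reference-Year", "2010"),
    ("otherpkg", "Reference-Title", "Another example title"),
]
cur.executemany(
    "INSERT INTO bibref (package, key, value) VALUES (?, ?, ?)", triples
)
conn.commit()

# Everything known about one package can then be fetched as key/value pairs.
cur.execute(
    "SELECT key, value FROM bibref WHERE package = ? ORDER BY key",
    ("examplepkg",),
)
rows = cur.fetchall()
print(rows)
```

A consumer such as the web sentinel could then look up all keys for a given package with a single query of this kind.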

Since I do not run a local copy of the UDD, I did not test the attached
gatherer. Please treat it as a stub. It is meant to be used with the following
patch to the UDD configuration file.

Index: config-org.yaml
===================================================================
--- config-org.yaml	(revision 1680)
+++ config-org.yaml	(working copy)
@@ -19,6 +19,7 @@
     ddtp: module udd.ddtp_gatherer
     ftpnew: module udd.ftpnew_gatherer
     screenshots: module udd.screenshot_gatherer
+    bibref: module udd.bibref_gatherer
     dehs: module udd.dehs_gatherer
     ldap: module udd.ldap_gatherer
     wannabuild: module udd.wannabuild_gatherer
@@ -528,6 +529,14 @@
   table:  screenshots
   screenshots_json: /org/udd.debian.org/mirrors/screenshots/screenshots.json
 
+bibref:
+  type: bibref
+  update-command: /org/udd.debian.org/udd/scripts/fetch_bibref.sh
+  path: /org/udd.debian.org/mirrors/bibref
+  cache: /org/udd.debian.org/mirrors/cache
+  table: bibref
+  bibref_yaml: /org/udd.debian.org/mirrors/bibref/bibref.yaml
+
 wannabuild:
   type: wannabuild
   wbdb: "dbname=wanna-build host=localhost port=5433 user=guest"
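
For reference, the draft gatherer only reads two keys from this stanza, 'table' and 'bibref_yaml'. The sketch below writes the stanza out as a plain dictionary (assuming the YAML is handed to the gatherer unchanged) and checks those keys with a hypothetical helper, `missing_keys`, which is not part of the UDD.

```python
# The bibref stanza from the patch, written out as the dictionary the
# gatherer would see (assumption: the YAML stanza is passed through as-is).
bibref_config = {
    "type": "bibref",
    "update-command": "/org/udd.debian.org/udd/scripts/fetch_bibref.sh",
    "path": "/org/udd.debian.org/mirrors/bibref",
    "cache": "/org/udd.debian.org/mirrors/cache",
    "table": "bibref",
    "bibref_yaml": "/org/udd.debian.org/mirrors/bibref/bibref.yaml",
}

def missing_keys(config, required):
    """Hypothetical helper: return the required keys absent from config."""
    return [key for key in required if key not in config]

# 'table' and 'bibref_yaml' are the two keys the draft gatherer reads.
missing = missing_keys(bibref_config, ["table", "bibref_yaml"])
print(missing)
```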


Please tell me what you think about it, and whether you would like me to
commit all of it to the UDD sources.

Have a nice weekend,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

Attachment: bibref_gatherer.py

#!/usr/bin/env python

"""
This script imports bibliographic references from upstream-metadata.debian.net.
"""

from gatherer import gatherer
from sys import stderr, exit
from yaml import safe_load_all  # PyYAML; needed by run() below

online=0

def get_gatherer(connection, config, source):
  return bibref_gatherer(connection, config, source)

class bibref_gatherer(gatherer):
  """
  Bibliographic references from upstream-metadata.debian.net.
  """

  def __init__(self, connection, config, source):
    gatherer.__init__(self, connection, config, source)
    self.assert_my_config('table')
    my_config = self.my_config

    cur = self.cursor()
    query = "DELETE FROM %s" % my_config['table']
    cur.execute(query)
    query = """PREPARE bibref_insert (text, text, text) AS
                   INSERT INTO %s
                   (package, key, value)
                    VALUES ($1, $2, $3)""" % (my_config['table'])
    cur.execute(query)

    pkg = None

  def run(self):
    my_config = self.my_config
    #start harassing the DB, preparing the final inserts and making place
    #for the new data:
    cur = self.cursor()

    bibref_file = my_config['bibref_yaml']
    fp = open(bibref_file, 'r')
    result = fp.read()
    fp.close()

    # Assumption: each YAML document is a mapping with the keys
    # 'package', 'key' and 'value', matching the named placeholders below.
    for res in safe_load_all(result):
      query = """EXECUTE bibref_insert
                        (%(package)s, %(key)s, %(value)s)"""
      try:
        cur.execute(query, res)
      except UnicodeEncodeError as err:
        stderr.write("Unable to inject data for package %s. %s\n" % (res['package'], err))
        stderr.write("--> %s\n" % (res,))
    cur.execute("DEALLOCATE bibref_insert")
    cur.execute("ANALYZE %s" % my_config['table'])

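if __name__ == '__main__':
  # No main() is defined here: the module is meant to be loaded through
  # get_gatherer(), so exit with a message instead of raising a NameError.
  exit("This module is not meant to be run directly.")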

# vim:set et tabstop=2:

