How to load unicode-encoded data in the UDD? (was: Re: Gathering package upstream meta-data in the UDD).
- To: debian-qa@lists.debian.org
- Subject: How to load unicode-encoded data in the UDD? (was: Re: Gathering package upstream meta-data in the UDD).
- From: Charles Plessy <plessy@debian.org>
- Date: Wed, 18 Aug 2010 09:29:30 +0900
- Message-id: <20100818002930.GA5459@merveille.plessy.net>
- In-reply-to: <20100323131709.GB6048@xanadu.blop.info>
- References: <20100118005931.GB16674@kunpuu.plessy.org> <20100118110517.GB26360@an3as.eu> <20100118230819.GE26132@kunpuu.plessy.org> <20100119075804.GB15712@an3as.eu> <20100119135148.GA11328@kunpuu.plessy.org> <20100119142051.GA30267@an3as.eu> <20100121145431.GD3723@kunpuu.plessy.org> <20100121150719.GB6206@an3as.eu> <20100206130204.GA27756@kunpuu.plessy.org> <20100323131709.GB6048@xanadu.blop.info>
> On 06/02/10 at 22:02 +0900, Charles Plessy wrote:
> >
> > http://upstream-metadata.debian.net/for_UDD/biblio.yaml
> >
> > The above file contains triples to be loaded into a table of the UDD. They
> > provide the information needed to feed the Blends web sentinel with
> > bibliographic information.
On Tue, Mar 23, 2010 at 02:17:09PM +0100, Lucas Nussbaum wrote:
>
> It would be better if you could try to get a local copy of UDD set up.
> It's quite easy since we provide a DB dump already.
Dear all,
I finally overcame my fear of Python, and things actually went much more
smoothly than I expected. I have managed to load the above data into a simple
table on the local copy of the UDD that Andreas is also using.
I encountered a character set problem: some of the contents of the YAML file
are encoded in UTF-8, and the UDD is ASCII:
plessy@sd-13492:/org/udd.debian.org/udd$ ./update-and-run.sh bibref
Unable to inject data for package adun.app. 'ascii' codec can't encode character u'\xe1' in position 39: ordinal not in range(128)
--> ['adun.app', 'Reference-Author', u'Michael A. Johnston, Ignacio Fdez. Galv\xe1n and Jordi Vill\xe0-Freixa']
Unable to inject data for package rnahybrid. 'ascii' codec can't encode character u'\xd6' in position 41: ordinal not in range(128)
--> ['rnahybrid', 'Reference-Author', u'REHMSMEIER, MARC and STEFFEN, PETER and H\xd6CHSMANN, MATTHIAS and GIEGERICH, ROBERT']
Unable to inject data for package melting. 'ascii' codec can't encode character u'\xe8' in position 6: ordinal not in range(128)
--> ['melting', 'Reference-Author', u'Le Nov\xe8re, Nicolas']
Unable to inject data for package t-coffee. 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)
--> ['t-coffee', 'Reference-Author', u'C\xe9dric Notredame and Desmond G. Higgins and Jaap Heringa']
To solve the problem, I am trying to use the unicode function, as in the
following thread: http://lists.debian.org/20090522140048.GA6571@an3as.eu
Unfortunately, changes like the one below have no effect:
Index: udd/bibref_gatherer.py
===================================================================
--- udd/bibref_gatherer.py (revision 1777)
+++ udd/bibref_gatherer.py (working copy)
@@ -49,7 +49,7 @@
     package, key, value = res
     query = "EXECUTE bibref_insert (%s, %s, %s)"
     try:
-      cur.execute(query, (package, key, value))
+      cur.execute(query, (package, key, unicode(str(value), 'utf-8')))
     except UnicodeEncodeError, err:
       print >>stderr, "Unable to inject data for package %s. %s" % (package, err)
       print >>stderr, "-->", res
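One possible reason why the change has no visible effect, offered as a sketch and not as a confirmed diagnosis: the values in the log are already unicode objects, and in Python 2 calling str() on a unicode object implicitly encodes it with the default 'ascii' codec. The wrapped expression would then raise the same UnicodeEncodeError before unicode() is ever reached, and for pure-ASCII values it would return the value unchanged. The code below emulates this in Python 3 syntax; py2_str, py2_unicode and patched_value are hypothetical stand-ins, not functions from the UDD.

```python
def py2_str(value):
    """Stand-in for Python 2's str() on a unicode object (assumption:
    the default encoding is ASCII, as in a stock Python 2)."""
    return value.encode('ascii')


def py2_unicode(data, encoding):
    """Stand-in for Python 2's unicode(byte_string, encoding)."""
    return data.decode(encoding)


def patched_value(value):
    """Emulates unicode(str(value), 'utf-8') from the diff above."""
    return py2_unicode(py2_str(value), 'utf-8')


for value in ('Le Nov\xe8re, Nicolas', 'Plain ASCII name'):
    try:
        result = patched_value(value)
        outcome = 'unchanged' if result == value else 'changed'
    except UnicodeEncodeError as err:
        # Raised inside py2_str, before any decoding takes place.
        outcome = 'error: %s' % err
    print(repr(value), '->', outcome)
```

If this reading is right, the wrapping cannot help, and the encoding would have to be handled where the string is handed to the database instead.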
Does anybody have some advice?
Have a nice day,
--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan