Re: [UDD] Encoding problems with unicode strings

To: debian-qa@lists.debian.org
Subject: Re: [UDD] Encoding problems with unicode strings
From: Andreas Tille <andreas@an3as.eu>
Date: Fri, 22 May 2009 19:42:46 +0200
Message-id: <20090522174246.GA10877@an3as.eu>
In-reply-to: <20090522163639.GA6104@chistera.yi.org>
References: <20090522140048.GA6571@an3as.eu> <20090522163639.GA6104@chistera.yi.org>

On Fri, May 22, 2009 at 06:36:39PM +0200, Adeodato Simó wrote:
> UDD just has the descriptions from Packages.gz, which supposedly are in
> UTF-8. If your destination (a file, terminal, whatever) should be
> receiving UTF-8, you can just pass them unmodified, e.g.:
> 
>     for row in curs.fetchall():
>         print "%s: %s (%s)\n%s\n" % (pkg, row[0], row[2], row[1])
> 
> That works for me.

Yes, this actually works fine.
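
For the archive, a minimal sketch of the "pass them unmodified" idea in
Python 3 terms, where byte strings and text are distinct types. The row
layout (name, version, description), the helper name write_rows and the
sample data are made up for illustration and are not the UDD schema.

```python
import io


def write_rows(rows, out):
    """Write (name, version, description) byte rows to a binary stream as-is."""
    for name, version, description in rows:
        # No decoding or re-encoding: the UTF-8 bytes pass through untouched.
        out.write(name + b": " + version + b"\n" + description + b"\n\n")


# Made-up sample row; the description is already UTF-8 encoded bytes.
rows = [(b"hello", b"2.2", "Gr\u00fc\u00dfe".encode("utf-8"))]
buf = io.BytesIO()
write_rows(rows, buf)
```

For a terminal that expects UTF-8, sys.stdout.buffer could be passed in
place of the BytesIO object.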
 
> If, for some reason, you need unicode() and not str() objects, then you
> should specify that the string is in UTF-8, otherwise it will default to
> ASCII:
> 
>     for row in curs.fetchall():
>         string = unicode(row[1], 'utf-8') 

Ahh, that seems to be the solution I wanted.  And yes, I need
unicode because I'm actually using

	from genshi import Markup
	string = Markup(row[1])

and I can confirm that

	string = Markup(unicode(row[1], 'utf-8'))

works.
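
As a rough illustration of why the explicit encoding matters, here is a
sketch in Python 3, where unicode(raw, 'utf-8') is spelled
raw.decode('utf-8'). The helper name to_text and the sample string are
assumptions for the example only; the decoded result is what would then
be handed to Markup().

```python
def to_text(raw, encoding="utf-8"):
    """Decode a byte string into text; pass text through unchanged."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# Made-up sample: a description containing a non-ASCII character.
description = "Sim\u00f3's package".encode("utf-8")
text = to_text(description)

# Decoding the same bytes as ASCII fails, which is the failure mode
# described above when no encoding is given.
try:
    description.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```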

> So, your test program is not of much help. If you're still stuck, you
> should probably say what you are really trying to do, with details. But
> I don't think it's going to be a problem in UDD.
>
> P.S.: If doing `unicode(row[1], 'utf-8')` raises an exception, that
> would be because a package contains non-UTF-8 data in a description. Your
> program should be robust against that, and you can do:
> ...

Yes, I already do this, based on past experience parsing Packages
files directly, but thanks for the hint anyway.
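
The snippet elided above is not reproduced here. Purely as an
illustration of one common approach, and not necessarily what was
suggested, a Python 3 sketch could try UTF-8 first and fall back to
another encoding. The helper name decode_description and the choice of
latin-1 as the fallback are assumptions.

```python
def decode_description(raw, fallback="latin-1"):
    """Decode bytes as UTF-8, falling back to another encoding on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte value to a character, so with the
        # default fallback this second decode cannot raise.
        return raw.decode(fallback)


good = decode_description("caf\u00e9".encode("utf-8"))
bad = decode_description(b"caf\xe9")  # invalid as UTF-8, valid as latin-1
```

Another option is raw.decode('utf-8', errors='replace'), which keeps
going and substitutes a replacement character for the undecodable bytes.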

Thank you very much for the help.

     Andreas.


-- 
http://fam-tille.de
