
Bug#656288: python3-apt: difficulties with non-UTF-8-encoded TagFiles



On Wed, Jan 18, 2012 at 10:02:31AM +0000, Colin Watson wrote:
> On Wed, Jan 18, 2012 at 12:56:03AM +0000, Colin Watson wrote:
> > python-debian's test suite also tests that it's possible to parse old
> > Sources files in *mixed* encodings.  This is going to be harder because
> > it basically means having apt_pkg.TagSection return bytes, which I don't
> > think is desirable in general.  Maybe this could be optional somehow?
> 
> Thinking about it, this seems a reasonable thing to make switchable in
> TagFile's constructor.  After all:
> 
>   >>> with open("test", encoding="iso-8859-1") as test:
>   ...     print(test.read().__class__)
>   ...
>   <class 'str'>
>   >>> with open("test", mode="rb") as test:
>   ...     print(test.read().__class__)
>   ...
>   <class 'bytes'>
> 
> So there's clear precedent in the language for the same method returning
> str or bytes depending on how the class was constructed.  Maybe a bytes=
> keyword argument?

If that is done, you'd also need to take care of TagSection, which should
then work in bytes mode when passed a bytes string.

Basically, you'd need to modify both TagSection and TagFile to store whether
to use bytes or unicode, and pass the value of that flag from the TagFile
to the TagSection. Then create a function

	PyObject *TagFile_ToString(char *s, size_t n)

or similar that uses the PyString_* or PyBytes_* functions depending
on the context (where PyString is mapped to unicode in Python 3, and
to str in Python 2). Then use that function everywhere we currently
create strings in the TagFile.
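
To illustrate the flag-passing idea, here is a rough pure-Python model. It
is only a sketch, not python-apt's code: the names File, Section, to_string
and use_bytes are made up for this example (use_bytes stands in for the
suggested bytes= keyword), and continuation lines are ignored.

```python
# Hypothetical model of the proposed design; not the real apt_pkg classes.

def to_string(raw, use_bytes, encoding="utf-8"):
    """Single place where values are created: bytes as-is, or decoded str."""
    if use_bytes:
        return raw
    return raw.decode(encoding)


class Section:
    """Stand-in for TagSection; remembers the mode it was created in."""

    def __init__(self, raw, use_bytes=False):
        self._use_bytes = use_bytes
        self._fields = {}
        for line in raw.splitlines():
            # Simplification: one "Key: value" pair per line.
            key, _, value = line.partition(b":")
            self._fields[key.strip()] = value.strip()

    def __getitem__(self, key):
        # Field names are ASCII, so accept either str or bytes keys.
        if isinstance(key, str):
            key = key.encode("ascii")
        return to_string(self._fields[key], self._use_bytes)


class File:
    """Stand-in for TagFile; hands its flag to every section it yields."""

    def __init__(self, raw, use_bytes=False):
        self._raw = raw
        self._use_bytes = use_bytes

    def __iter__(self):
        for chunk in self._raw.split(b"\n\n"):
            if chunk.strip():
                yield Section(chunk, self._use_bytes)
```

With a file in mixed encodings, bytes mode hands back the raw values for
every section, while the default mode can only decode the sections that
happen to be valid UTF-8.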


-- 
Julian Andres Klode  - Debian Developer, Ubuntu Member

See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.


