
Re: Bug#208011: [PROPOSAL] UTF-8 encoding for debian/control



On Tue, Sep 02, 2003 at 06:09:09PM -0500, Manoj Srivastava wrote:
> On Tue, 2 Sep 2003 23:25:57 +0200, Denis Barbier <barbier@linuxfr.org> said: 
> 
> > On Tue, Sep 02, 2003 at 06:18:39PM +0200, Josip Rodin wrote:
> >> On Tue, Sep 02, 2003 at 05:48:27AM +0200, Martin Godisch wrote:
> >> > > > Anyway I fail to see which problems arise with this proposal,
> >> > > > could someone enlighten me?
> >> > >
> >> > > It's too broad. Has anyone tested if the packaging system
> >> > > correctly processes double-byte information everywhere?
> >> >
> >> > I had no problems reading mbc descriptions with dpkg and apt so
> >> > far. Is there some special test I should do?
> >>
> >> Your proposal says "the control fields". Description is just one,
> >> what about all the others? (If it was your intent to only do this
> >> for descriptions, why doesn't the proposal say so?)
> 
> > My understanding of the proposal is that if a field uses non-ASCII
> > characters, the encoding should be UTF-8.  It does not say that all
> > fields can contain non-ASCII characters, which is why current
> > packaging tools do not need to be patched.
> 
> 	This is a cop-out.  If the field is not supposed to have
>  non-ASCII characters (since the tool chain cannot yet handle them),
>  then policy should not be specifying the encoding of these illegal
>  characters. 

That logic does not hold.
For instance, the encoding of changelog.Debian is specified, yet there
are places in it where accented letters are invalid, such as package names.

> > Currently 165 binary packages contain non-ASCII characters in
> > Maintainer or Description fields; there are 56 binary packages with
> > non-ASCII characters in their description (which means that you are
> > responsible for 9% of this garbage ;)), and 26 maintainer names with
> > such letters (but only 21 unique maintainers).  This is an upper
> > limit, maybe some of these strings are already UTF-8 encoded.
> 
> 	No. If policy blesses non-ASCII characters, then there are
>  going to be a lot more packages that shall have UTF-8 characters in
>  them, and the tool chain would still not be ready to deal with them.

Err, you said that a significant number of packages would be buggy; I
wanted to show actual figures.
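
For what it's worth, figures of this kind could be gathered along the
following lines. This is only a sketch, not the script behind the numbers
quoted above: the names classify_field and count_non_ascii are made up for
illustration, and it assumes the control data has already been split into
(package, field, raw bytes) tuples rather than parsed from a real
Packages file.

```python
def classify_field(raw: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' for a raw control field value."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Valid UTF-8 that is not pure ASCII.
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Some legacy 8-bit encoding (Latin-1 and friends), or plain garbage.
        return "other"


def count_non_ascii(entries):
    """Group packages by field and by how their non-ASCII value is encoded.

    entries: iterable of (package, field, raw_bytes) tuples (assumed layout).
    Returns {field: {"utf-8": set_of_packages, "other": set_of_packages}}.
    """
    result = {}
    for package, field, raw in entries:
        kind = classify_field(raw)
        if kind == "ascii":
            continue
        buckets = result.setdefault(field, {"utf-8": set(), "other": set()})
        buckets[kind].add(package)
    return result
```

The "utf-8" bucket corresponds to the strings that are possibly already
UTF-8 encoded. A legacy 8-bit string can occasionally happen to be valid
UTF-8 as well, so that bucket is an estimate rather than a proof.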

Denis


