
Bug#208011: [PROPOSAL] UTF-8 encoding for debian/control



On Tue, Sep 02, 2003 at 06:18:39PM +0200, Josip Rodin wrote:
> On Tue, Sep 02, 2003 at 05:48:27AM +0200, Martin Godisch wrote:
> > > > Anyway I fail to see which problems arise with this proposal, could
> > > > someone enlighten me?
> > > 
> > > It's too broad. Has anyone tested if the packaging system correctly
> > > processes double-byte information everywhere?
> > 
> > I had no problems reading mbc descriptions with dpkg and apt so far. Is
> > there some special test I should do?
> 
> Your proposal says "the control fields". Description is just one, what about
> all the others? (If it was your intent to only do this for descriptions, why
> doesn't the proposal say so?)

My understanding of the proposal is that if a field uses non-ASCII characters,
its encoding should be UTF-8.  It does not say that every field may contain
non-ASCII characters, which is why the current packaging tools do not need to
be patched.

Currently 165 binary packages contain non-ASCII characters in their Maintainer
or Description fields; 56 binary packages have non-ASCII characters in their
description (which means that you are responsible for 9% of this garbage ;)),
and there are 26 maintainer names with such letters (but only 21 unique
maintainers).
This is an upper limit on what would need converting, since some of these
strings may already be UTF-8 encoded.
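
To separate the strings that are already UTF-8 from the ones that are not,
something along these lines might do. It is only a rough sketch, not the
script I used for the numbers above: the function name non_ascii_fields is
made up, and it assumes a single control stanza passed in as raw bytes, with
continuation lines starting with a space or a tab.

```python
def non_ascii_fields(stanza: bytes, fields=(b"Maintainer", b"Description")):
    """Map each listed field that contains non-ASCII bytes to True if its
    value decodes as UTF-8, or False if it does not.

    Hypothetical helper: `stanza` is one control paragraph as raw bytes.
    """
    values = {}
    current = None
    for line in stanza.split(b"\n"):
        if line[:1] in (b" ", b"\t") and current is not None:
            # Continuation line: belongs to the field seen last.
            values[current] += b"\n" + line
        elif b":" in line:
            name, _, value = line.partition(b":")
            current = name.strip()
            values[current] = value.strip()
        else:
            current = None

    result = {}
    for name in fields:
        value = values.get(name)
        if value is None or value.isascii():
            continue  # field absent or plain ASCII, nothing to report
        try:
            value.decode("utf-8")
            result[name.decode("ascii")] = True
        except UnicodeDecodeError:
            result[name.decode("ascii")] = False
    return result
```

Running this over every stanza of a Packages file and counting the False
entries would give the number of fields that really need re-encoding. Note
that a legacy 8-bit string can occasionally happen to be valid UTF-8 as well,
so even that count is only approximate.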

Denis


