
Re: charsets in debian/control



Josselin Mouette <joss@debian.org> writes:

> On Sunday, 5 December 2004 at 11:43 +0100, Andreas Barth wrote:
>> I think most of us agree that non-UTF-8 characters are not a good idea
>> (please note that UTF-8 is a superset of ASCII).  For some
>> places (like package names), I think most of us even agree that only
>> ASCII characters should be used. Also, there is the proposal that in
>> other fields (e.g. names), a translation should (also) be used if the
>> characters are not in some basic classes (more or less: ASCII plus
>> ASCII-similar letters).
>> 
>> So, I personally consider non-UTF-8 characters a bug, and
>> non-ASCII UTF-8 on the way from bug to allowed.
>
> Many of us have names that can't be written using ASCII. Furthermore,
> the Debian tools need consistency between the developer name in the
> changelog and the Maintainer/Uploaders fields in the control file. The
> only way for these developers to have a policy-compliant changelog
> without having their uploads considered as NMUs is to encode the control
> file in UTF-8.
> -- 
>  .''`.           Josselin Mouette        /\./\
> : :' :           josselin.mouette@ens-lyon.org
> `. `'                        joss@debian.org
>   `-  Debian GNU/Linux -- The power of freedom

Which means that every program parsing control, changelog, changes,
Packages and Sources files has to be truly converted to UTF-8.

dpkg, apt, aptitude, dselect, apt-proxy, apt-cacher(?), debmirror,
debpartial-mirror, DAK, cdebootstrap, ... I guess most just work by
luck with the mixture we have now.

We already had cdebootstrap crashes because of it (its parser was a
bit stricter than the rest).
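
To illustrate the mixture problem: a parser that decodes strictly as
UTF-8 rejects a Latin-1 encoded name, while a byte-oblivious one just
passes it through. A rough Python sketch follows. It is not the code of
any of the tools listed above, and `decode_field` and the maintainer
name are hypothetical:

```python
from typing import Tuple


def decode_field(raw: bytes) -> Tuple[str, str]:
    """Split one raw 'Name: value' control line, decoding it as strict UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    Hypothetical helper for illustration only.
    """
    text = raw.decode("utf-8")  # the default error mode is 'strict'
    name, _, value = text.partition(":")
    return name.strip(), value.strip()


# The same (made-up) maintainer line in two encodings.
line = "Maintainer: Zo\u00eb Example <zoe@example.org>"
utf8_line = line.encode("utf-8")
latin1_line = line.encode("latin-1")

print(decode_field(utf8_line))

try:
    decode_field(latin1_line)
except UnicodeDecodeError as err:
    # A strict parser stops here; a lenient one would carry on with
    # bytes it cannot interpret.
    print("rejected:", err.reason)
```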

On that note, how likely is it to hit a UTF-8 character whose encoding
contains a '\n' byte? Any parser that is not UTF-8 aware would assume
a new line has started and report parse errors.
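
For what it's worth, the UTF-8 design seems to rule this out: every byte
of a multi-byte sequence has its high bit set (0x80 to 0xFF), so the
byte 0x0A can only be the encoding of U+000A itself. A minimal Python
sketch that checks this exhaustively (the function name is made up for
illustration):

```python
import sys


def newline_byte_only_from_newline() -> bool:
    """Return True if byte 0x0A occurs in the UTF-8 encoding of a code
    point only when that code point is U+000A itself."""
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        encoded = chr(cp).encode("utf-8")
        if 0x0A in encoded and cp != 0x0A:
            return False
    return True


print(newline_byte_only_from_newline())
```

So a parser that splits on '\n' bytes should see the same line
boundaries whether or not it is UTF-8 aware, as long as the input is
valid UTF-8. Other multi-byte encodings do not necessarily give that
guarantee.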

Regards,
        Goswin


