
Re: changelog in utf-8 conflicts with maintainer in control



On Fri, Aug 01, 2003 at 01:47:59PM +0200, Celso González wrote:
> On Fri, Aug 01, 2003 at 01:33:18PM +0200, Eduard Bloch wrote:
> > #include <hallo.h>
> > * Celso González [Fri, Aug 01 2003, 01:14:33PM]:
> > 
> > > So when the uploader checks the name of the Maintainer in the control 
> > > file (not utf-8) with the name in the changelog says that are different, 
> > 	^^^^^^^^^
> >
> > Show me where policy tells you not to use UTF-8 in control files but
> > some other non-ascii charset with some other encoding instead.
> 
> Well, I think it is not clear enough.
> A reflection from the guy who suggested the change in policy:
> 
> Extracted from C2.2
> "Now, we can't switch to using UTF-8 for package control fields 
> and the like until dpkg has better support, but one thing we can 
> start doing today is requesting that Debian changelogs are UTF-8 
> encoded"
>  
> Maintainer is a control field
> 
> The solution is that both files have the same encoding (both latin1 or 
> both utf-8) but I'm not sure that utf-8 is correct for debian/control

Right now, dpkg has no explicit support for anything other than ASCII in
debian/control: that is to say, it doesn't attempt to recode maintainer
names to the current locale when asked to display control information
with 'dpkg -s', etc. However, it doesn't support Latin-1 any better than
it supports UTF-8! If you think that this is a problem, then the answer
is not to use Latin-1, but to use only ASCII.

That said, the problems caused by using an encoding that dpkg doesn't
support are not serious; they just mean that some people will see your
name wrongly when they type 'dpkg -s', but hey, that would happen anyway
(particularly if you don't like the ASCII transliteration of your name).
With that knowledge, if you're going to pick a non-ASCII encoding, then
UTF-8 is almost certainly the way to go. It seems unlikely to me that we
would select anything other than UTF-8 as the 8-bit encoding for control
files.
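
To make the "seeing your name wrongly" point concrete, here is a small
Python sketch. It is only an illustration, not anything dpkg or the
upload checks actually run; the name is taken from this thread, and the
byte-for-byte comparison is an assumption about how a naive check might
behave.

```python
# Illustration only: none of this is dpkg code.
name = "Celso González"

utf8_bytes = name.encode("utf-8")      # 'á' becomes two bytes: 0xC3 0xA1
latin1_bytes = name.encode("latin-1")  # 'á' becomes one byte: 0xE1

# A byte-for-byte comparison treats the two spellings as different names.
same_bytes = utf8_bytes == latin1_bytes

# UTF-8 bytes shown by a tool that assumes Latin-1.
utf8_seen_as_latin1 = utf8_bytes.decode("latin-1")

# Latin-1 bytes shown by a tool that assumes UTF-8; the byte that is
# not valid UTF-8 is replaced with U+FFFD.
latin1_seen_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(same_bytes)
print(utf8_seen_as_latin1)
print(latin1_seen_as_utf8)
```

Either way round, a reader whose tools assume the other encoding sees
a mangled name, which is why neither Latin-1 nor UTF-8 is "safe" while
nothing recodes the field.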

As a side note, I'd love to get rid of Latin-1 in maintainer names; it's
currently difficult for the BTS to declare any character set in the web
pages it generates, since some of the maintainer names it prints are in
Latin-1 and some in UTF-8 and it can't easily tell which are which.
Switching to UTF-8 throughout would solve that problem.
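
For what it's worth, the usual workaround for mixed input looks
something like the sketch below: try strict UTF-8 first and fall back
to Latin-1. This is not what the BTS does; guess_decode is a
hypothetical helper written for this mail, and it is only a heuristic,
since some Latin-1 byte sequences also happen to be valid UTF-8.

```python
def guess_decode(raw):
    """Return (text, encoding_guess) for a maintainer name given as bytes.

    Hypothetical helper: strict UTF-8 is tried first; anything that
    fails is assumed to be Latin-1, which accepts every byte value.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"


# The same name stored two ways decodes to the same text.
from_utf8 = guess_decode(b"Celso Gonz\xc3\xa1lez")
from_latin1 = guess_decode(b"Celso Gonz\xe1lez")
print(from_utf8)
print(from_latin1)
```

Having to guess like this for every name is exactly the kind of thing
a single agreed encoding avoids.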

So, to summarize, please either use plain ASCII (if you think that the
lack of recoding is a problem) or UTF-8 (if you don't mind); using other
legacy encodings is just storing up trouble.

Cheers,

-- 
Colin Watson                                  [cjwatson@flatline.org.uk]


