
Re: charsets in debian/control



On Tue, Dec 07, 2004 at 10:17:17AM -0500, Daniel Burrows wrote:
> On Tuesday 07 December 2004 12:44 am, Peter Samuelson wrote:
> > And if the app already deals with charset conversions but assumes
> > iso-8859-1 input, then it's trivial to fix it to assume utf-8 input.
> 
>   This is not true.
> 
>   iso-8859-1 is an 8-bit charset, while Unicode is a 32-bit [0] charset.  
> Storing and manipulating iso-8859-1 strings requires no changes to internal 
> datatypes (only conversions for input and output); storing and manipulating 
> Unicode means you have to switch to a completely different set of 
> string-handling functions for all internal operations.

No, you do not have to do this. You can keep working with "char"; the
changes needed when switching to UTF-8 mostly deal with the fact that one
Unicode character may be represented by more than one char. This means that
you need a different strlen function wherever you want a count of characters
rather than bytes, must take care to chop strings of char only at character
boundaries, must ensure that input strings are actually valid UTF-8, etc.
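
To illustrate the first two points, here is a minimal sketch in C. The names
utf8_strlen and utf8_chop are hypothetical helpers made up for this example,
not functions from any library, and both assume the string has already been
checked to be valid UTF-8 (validation itself is not shown). They rely on the
fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count characters (code points) in a NUL-terminated
 * string that is assumed to be valid UTF-8. Every byte that is not a
 * continuation byte (10xxxxxx) starts a new character. */
size_t utf8_strlen(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: return the largest byte length <= max_bytes at which
 * s can be cut without splitting a multi-byte sequence. Assumes valid UTF-8. */
size_t utf8_chop(const char *s, size_t max_bytes)
{
    size_t len = strlen(s);
    size_t cut;

    if (len <= max_bytes)
        return len;

    /* s[cut] is the first byte that would be dropped. If it is a
     * continuation byte, the cut falls inside a character, so back up
     * until it lands on the start of one. */
    cut = max_bytes;
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}
```

The rest of the program can go on storing and passing around plain char
arrays; only the places that count or split characters need helpers like
these.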

Cheers,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer     |  GnuPG key:
  | \/¯|  http://atterer.net  |  0x888354F7
  ¯ '` ¯


