Re: charsets in debian/control
On Tue, Dec 07, 2004 at 10:17:17AM -0500, Daniel Burrows wrote:
> On Tuesday 07 December 2004 12:44 am, Peter Samuelson wrote:
> > And if the app already deals with charset conversions but assumes
> > iso-8859-1 input, then it's trivial to fix it to assume utf-8 input.
>
> This is not true.
>
> iso-8859-1 is an 8-bit charset, while Unicode is a 32-bit [0] charset.
> Storing and manipulating iso-8859-1 strings requires no changes to internal
> datatypes (only conversions for input and output); storing and manipulating
> Unicode means you have to switch to a completely different set of
> string-handling functions for all internal operations.
No, you do not have to do this. You can keep working with "char"; the
changes when switching to UTF-8 mostly deal with the fact that one Unicode
character may be represented by more than one char. This means that you
need to use a different strlen function when counting characters, take care
to chop strings of char only at character boundaries, ensure that input
strings are actually valid UTF-8, etc.
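
To illustrate, here is a rough C sketch of those three adjustments on
plain char strings. The function names (utf8_strlen, utf8_chop,
utf8_valid) are made up for this example and do not come from any
library; the validity check assumes the 4-byte limit of RFC 3629 and
NUL-terminated input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count characters (code points) instead of bytes.
 * Every byte that is not a continuation byte (10xxxxxx) starts a new
 * character. Assumes s is already valid UTF-8. */
size_t utf8_strlen(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical helper: return the largest byte length <= max_bytes at
 * which s can be cut without splitting a multi-byte character. */
size_t utf8_chop(const char *s, size_t max_bytes)
{
    size_t len = strlen(s);
    size_t cut;

    if (len <= max_bytes)
        return len;
    cut = max_bytes;
    /* s[cut] would be the first dropped byte; back up while it is a
     * continuation byte, since cutting there would split a character. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}

/* Hypothetical helper: return 1 if s is well-formed UTF-8, 0 otherwise.
 * Rejects stray continuation bytes, truncated sequences, overlong
 * encodings, surrogates and values above U+10FFFF. */
int utf8_valid(const char *s)
{
    /* Smallest code point that legitimately needs 1, 2 or 3 extra bytes. */
    static const unsigned long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned long cp;
        int extra, i;

        if (*p < 0x80) { p++; continue; }                 /* plain ASCII */
        else if ((*p & 0xE0) == 0xC0) { extra = 1; cp = *p & 0x1F; }
        else if ((*p & 0xF0) == 0xE0) { extra = 2; cp = *p & 0x0F; }
        else if ((*p & 0xF8) == 0xF0) { extra = 3; cp = *p & 0x07; }
        else return 0;                 /* stray continuation or bad lead */
        p++;
        for (i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)   /* also catches the terminating NUL */
                return 0;
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}
```

For the string "h\xc3\xa9llo" (an e-acute encoded as two bytes), strlen
reports 6 while utf8_strlen reports 5, and utf8_chop with a limit of 2
backs up to 1 rather than cutting the character in half. Everything
else, such as storage, copying and concatenation, keeps working on char
unchanged.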
Cheers,
Richard
--
__ _
|_) /| Richard Atterer | GnuPG key:
| \/¯| http://atterer.net | 0x888354F7
¯ '` ¯