Re: charsets in debian/control

To: debian-devel@lists.debian.org
Subject: Re: charsets in debian/control
From: Richard Atterer <richard@list04.atterer.net>
Date: Tue, 7 Dec 2004 16:40:04 +0100
Message-id: <20041207154004.GA8011@fluff>
Mail-followup-to: debian-devel@lists.debian.org
In-reply-to: <200412071017.32119.dburrows@debian.org>
References: <20041205093921.GA29883@p12n.org> <E1CbOzM-0006SB-00@chiark.greenend.org.uk> <20041207054445.GG29883@p12n.org> <200412071017.32119.dburrows@debian.org>

On Tue, Dec 07, 2004 at 10:17:17AM -0500, Daniel Burrows wrote:
> On Tuesday 07 December 2004 12:44 am, Peter Samuelson wrote:
> > And if the app already deals with charset conversions but assumes
> > iso-8859-1 input, then it's trivial to fix it to assume utf-8 input.
> 
>   This is not true.
> 
>   iso-8859-1 is an 8-bit charset, while Unicode is a 32-bit [0] charset.  
> Storing and manipulating iso-8859-1 strings requires no changes to internal 
> datatypes (only conversions for input and output); storing and manipulating 
> Unicode means you have to switch to a completely different set of 
> string-handling functions for all internal operations.

No, you do not have to do this. You can keep working with "char"; the
changes needed when switching to UTF-8 mostly deal with the fact that one
Unicode character may be represented by more than one char (UTF-8 uses one
to four bytes per character). This means that you need to use a different
strlen function, take care to chop strings of char only at character
boundaries, ensure that input strings are actually valid UTF-8, etc.
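A minimal C sketch of those three changes, assuming NUL-terminated strings kept in plain char arrays. The function names utf8_strlen, utf8_truncate_len and utf8_valid are hypothetical helpers written for illustration, not the API of any particular library; a real program might instead use existing routines such as GLib's g_utf8_strlen and g_utf8_validate.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count characters (code points), assuming s is
   valid UTF-8. Every byte that is not a continuation byte (10xxxxxx)
   starts a new character. */
size_t utf8_strlen(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical helper: return the largest byte count <= max that does not
   split a multi-byte character, so s can be cut to that many chars. */
size_t utf8_truncate_len(const char *s, size_t max)
{
    size_t len = strlen(s);
    if (len <= max)
        return len;
    /* Back up while the byte at the cut point is a continuation byte. */
    while (max > 0 && ((unsigned char)s[max] & 0xC0) == 0x80)
        max--;
    return max;
}

/* Hypothetical helper: return 1 if str is well-formed UTF-8, else 0.
   Rejects overlong forms, UTF-16 surrogates and values above U+10FFFF. */
int utf8_valid(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    while (*s) {
        unsigned char c = *s;
        int extra;
        unsigned long cp;
        if (c < 0x80) { s++; continue; }              /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { extra = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { extra = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { extra = 3; cp = c & 0x07; }
        else return 0;  /* 0x80-0xC1 and 0xF5-0xFF never start a character */
        for (int i = 1; i <= extra; i++) {
            if ((s[i] & 0xC0) != 0x80)
                return 0;  /* missing continuation byte (also stops at NUL) */
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000)
            || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;
        s += extra + 1;
    }
    return 1;
}
```

For "h\xC3\xA9llo" ("héllo" in UTF-8), strlen reports 6 bytes while utf8_strlen counts 5 characters, and asking utf8_truncate_len for a 2-byte cut yields 1 so the two-byte "é" is not split. The ISO-8859-1 string "caf\xE9" fails utf8_valid, which is exactly the kind of input that has to be caught or converted at the program's boundary.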

Cheers,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer     |  GnuPG key:
  | \/¯|  http://atterer.net  |  0x888354F7
  ¯ '` ¯
