Re: charsets in debian/control

To: debian-devel@lists.debian.org
Subject: Re: charsets in debian/control
From: Matthew Garrett <mgarrett@chiark.greenend.org.uk>
Date: Tue, 7 Dec 2004 15:23:48 +0000
Message-id: <E1CbhBs-0002hS-00@chiark.greenend.org.uk>
In-reply-to: <200412071017.32119.dburrows@debian.org>
References: <20041205093921.GA29883@p12n.org> <E1CbOzM-0006SB-00@chiark.greenend.org.uk> <20041207054445.GG29883@p12n.org> <20041207054445.GG29883@p12n.org> <200412071017.32119.dburrows@debian.org>

Daniel Burrows <dburrows@debian.org> wrote:

>   iso-8859-1 is an 8-bit charset, while Unicode is a 32-bit [0] charset.
> Storing and manipulating iso-8859-1 strings requires no changes to internal
> datatypes (only conversions for input and output); storing and manipulating
> Unicode means you have to switch to a completely different set of
> string-handling functions for all internal operations.

UTF-8 is an 8-bit encoding of Unicode that uses a variable number of bytes
per character. Traditional byte-oriented string routines work fine, except
where you need the number of characters rather than the number of bytes.
That typically affects only a small part of the code.
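
As a rough illustration of the byte-versus-character distinction, here is
a minimal Python sketch. The sample strings and the helper name
count_code_points are made up for this example; nothing here comes from
dpkg or any Debian tool.

```python
# Hypothetical sample text: "caf\u00e9" is "cafe" with an acute accent on the e,
# which UTF-8 stores as two bytes.
text = "caf\u00e9"
encoded = text.encode("utf-8")


def count_code_points(data: bytes) -> int:
    """Count characters by skipping UTF-8 continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# What a byte-oriented length routine would report.
byte_length = len(encoded)

# The number of characters, which only some code paths care about.
char_count = count_code_points(encoded)

# Byte-oriented operations keep working: an ASCII separator can never
# match inside a multi-byte sequence, so splitting on it is safe.
fields = "Maintainer: Jos\u00e9".encode("utf-8").split(b": ")
```

Only code that cares about character counts (column alignment, truncating
to N characters) needs something like count_code_points; copying,
comparing and splitting on ASCII delimiters can stay byte-based.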

>   [0] According to the libc manual, only 16 bits have been assigned, but GNU
> systems use 32-bit encoding internally if the libc transcoding functions are
> used.

The libc manual is out of date. We've been using more than 16 bits for a
while.
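
To make "more than 16 bits" concrete, here is a small Python sketch using
one assigned code point above U+FFFF. The choice of U+1D11E (MUSICAL
SYMBOL G CLEF) and the variable names are just for illustration.

```python
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the 16-bit range.
clef = "\U0001D11E"

code_point = ord(clef)
fits_in_16_bits = code_point <= 0xFFFF

# Storage cost of this one character in each encoding form.
utf8_len = len(clef.encode("utf-8"))            # bytes in UTF-8
utf16_units = len(clef.encode("utf-16-le")) // 2  # 16-bit units in UTF-16
utf32_len = len(clef.encode("utf-32-le"))       # bytes in UTF-32
```

A character like this takes four bytes in UTF-8 and a surrogate pair in
UTF-16, so a 16-bit type cannot hold it as a single unit.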

-- 
Matthew Garrett | mjg59-chiark.mail.debian.devel@srcf.ucam.org
