Re: utf

To: debian-user@lists.debian.org
Subject: Re: utf
From: Darac Marjal <mailinglist@darac.org.uk>
Date: Tue, 3 Apr 2018 09:55:51 +0100
Message-id: <20180403085551.GA30859@darac.org.uk>
In-reply-to: <20180402073904.GB19322@aym.net2.nerim.net>
References: <92aa2f6d-d39f-61a6-311b-f0c45b00b9c9@gmx.com> <0a5c15a9-0dfc-1ef3-1f64-1880def0ff1e@transient.nz> <20180402073904.GB19322@aym.net2.nerim.net>

On Mon, Apr 02, 2018 at 09:39:05AM +0200, Andre Majorel wrote:

On 2018-04-02 08:00 +1200, Ben Caradoc-Davies wrote:

On 02/04/18 02:05, mess-mate wrote:
>howto change the system utf to eu character set ?

Why? UTF (especially UTF-8) is vastly superior for all purposes:


I wouldn't say that. UTF-8 breaks a number of assumptions. For
instance,
1) every character has the same size,
2) every byte sequence is a valid character,
3) the equality or inequality of two characters comes down to
  the equality or inequality of the bytes they encode to.
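
The three assumptions listed above can be checked directly. What follows
is a minimal Python sketch; the sample strings and the helper name
is_valid_utf8 are chosen for illustration and do not come from the
thread. Note that point 3 is really a property of Unicode itself
(combining characters and normalization) rather than of UTF-8
specifically.

```python
import unicodedata

# 1) Characters do not all have the same size once encoded as UTF-8.
sizes = {ch: len(ch.encode("utf-8")) for ch in "a\u00e9\u20ac\U0001F600"}
# 'a' takes 1 byte, U+00E9 takes 2, U+20AC takes 3, U+1F600 takes 4.

# 2) Not every byte sequence is valid UTF-8.
def is_valid_utf8(data: bytes) -> bool:
    """Illustrative helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 3) Two strings that render identically can encode to different bytes.
composed = "\u00e9"      # e-acute as a single code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
same_bytes = composed.encode("utf-8") == decomposed.encode("utf-8")
same_after_nfc = (unicodedata.normalize("NFC", composed)
                  == unicodedata.normalize("NFC", decomposed))
```

Comparing after normalization (NFC here) is one common way to restore
the "equal characters, equal bytes" property, assuming both sides agree
on the normalization form.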

If these things matter to you, it's better to decode the UTF-8 into
Unicode code points first. I tend to think of Unicode as an arbitrarily
large codepage. Each character maps to a number, but that number could
be 1, 1000 or 500_000 (Unicode seems to be growing without any end in
sight). Internally, you might store those code points as integers or
quad words or whatever you like. Only once you're ready to transfer the
text to another process (print on screen, save to a file, stream across
a network) do you convert the Unicode back into UTF-8.

Basically, you consider UTF-8 to be a transfer-only format (like
Base64). If you want to do anything non-trivial with it, decode it into
Unicode.
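
As a rough illustration of that decode, process, re-encode pattern, here
is a small Python sketch. The function reverse_text and the sample
string are hypothetical, picked only to stand in for "something
non-trivial"; Python's str type plays the role of the internal sequence
of code points.

```python
def reverse_text(raw: bytes) -> bytes:
    """Hypothetical processing step: reverse the characters of a string."""
    text = raw.decode("utf-8")            # UTF-8 bytes -> code points
    reversed_text = text[::-1]            # work on code points, not bytes
    return reversed_text.encode("utf-8")  # code points -> UTF-8 bytes

raw = "na\u00efve caf\u00e9".encode("utf-8")
result = reverse_text(raw)

# Doing the same operation on the raw bytes scrambles the multi-byte
# sequences and no longer decodes as UTF-8.
try:
    raw[::-1].decode("utf-8")
    bytewise_ok = True
except UnicodeDecodeError:
    bytewise_ok = False
```

Even at the code point level, reversing would still separate a base
letter from a following combining mark, so this is only a sketch of
where the encode/decode boundary sits, not a complete text-handling
recipe.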


With ASCII and the many encodings based on it, most things can
be done without having knowledge of the encoding. With UTF-8,
even basic operations like determining the length of a string or
reporting at what column an error occurred require knowledge of
the encoding.
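
The column-reporting case can be made concrete. Below is a minimal
Python sketch, assuming the error position is known as a byte offset
that falls on a character boundary; the function name column_of and the
sample line are made up for illustration.

```python
def column_of(line: bytes, byte_offset: int) -> int:
    """Return the 1-based column, counted in code points, of a byte offset.

    Assumes byte_offset lies on a character boundary of valid UTF-8.
    """
    return len(line[:byte_offset].decode("utf-8")) + 1

line = "gr\u00f6\u00dfe = 1 + ?".encode("utf-8")
offset = line.index(b"?")          # byte position of the offending character
naive_column = offset + 1          # what a byte-counting tool would report
actual_column = column_of(line, offset)
```

Here the byte-based column is 15 while the code point column is 13,
because the two non-ASCII letters take two bytes each. Counting code
points is still an approximation of what a terminal displays, since
wide and combining characters occupy more or fewer cells.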

--
André Majorel <http://www.teaser.fr/~amajorel/>
Imagine what would happen if the Debian project disclosed the email
addresses of their users. Spambots would harvest them and Debian
users would be inundated with spam. Good thing they don't, eh ?


--
For more information, please reread.
