Re: utf
On 03/04/18 20:55, Darac Marjal wrote:
> If these things matter to you, it's better to convert from UTF-8 to
> Unicode, first.
A fixed-length encoding like UTF-32 will not fix broken assumptions about
a relationship between byte length and number of characters, because
Unicode contains things like combining characters: a single visible
character can be made of several code points. What is the length of a
string? Are you trying to count the number of glyphs? I do not think
that you can do this by naïvely counting code points, regardless of
encoding.
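To make this concrete, here is a minimal Python sketch. The helper names measure() and rough_char_count() are mine, purely for illustration. It measures the same visible character "é" spelled two ways. Note that rough_char_count() only skips combining marks; that is an approximation and not real grapheme cluster segmentation, which needs more than the standard library offers.

```python
import unicodedata

# Two spellings of the same user-perceived character.
PRECOMPOSED = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
DECOMPOSED = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def measure(s):
    """Return (code points, UTF-8 bytes, UTF-32 bytes) for s."""
    # "utf-32-le" is used so that no byte order mark is counted.
    return (len(s), len(s.encode("utf-8")), len(s.encode("utf-32-le")))


def rough_char_count(s):
    """Count code points that are not combining marks (approximation only)."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    for label, s in (("precomposed", PRECOMPOSED), ("decomposed", DECOMPOSED)):
        print(label, measure(s), rough_char_count(s))
```

In UTF-32 the byte length is always four times the number of code points, but the number of code points still differs between the two spellings of what a reader sees as one character.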
Because there is more than one way to represent an accented character,
Unicode string comparison is nontrivial:
https://en.wikipedia.org/wiki/Unicode_equivalence
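A small Python sketch of what that means in practice, assuming that normalising both sides to the same form before comparing is acceptable for your use case. The function name equivalent() is hypothetical; unicodedata.normalize() is from the standard library.

```python
import unicodedata


def equivalent(a, b, form="NFC"):
    """Compare two strings after normalising both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


CAFE_PRECOMPOSED = "caf\u00e9"   # ends in U+00E9
CAFE_DECOMPOSED = "cafe\u0301"   # ends in "e" + U+0301 COMBINING ACUTE ACCENT

if __name__ == "__main__":
    # Plain comparison sees different code point sequences.
    print(CAFE_PRECOMPOSED == CAFE_DECOMPOSED)
    # Canonical equivalence: equal once both are in NFC.
    print(equivalent(CAFE_PRECOMPOSED, CAFE_DECOMPOSED))
    # Compatibility equivalence: the "fi" ligature U+FB01 only matches
    # the two letters "fi" under NFKC, not under NFC.
    print(equivalent("\ufb01", "fi", "NFC"), equivalent("\ufb01", "fi", "NFKC"))
```

Which normalisation form is appropriate depends on the application; NFC and NFD preserve canonical equivalence only, while NFKC and NFKD also fold compatibility characters such as ligatures.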
Kind regards,
--
Ben Caradoc-Davies <ben@transient.nz>
Director
Transient Software Limited <https://transient.nz/>
New Zealand