
Re: utf



On 03/04/18 20:55, Darac Marjal wrote:
> If these things matter to you, it's better to convert from UTF-8 to Unicode, first.

Fixed-length encodings such as UTF-32 will not fix broken assumptions about the relationship between byte length and number of characters, because Unicode contains things like combining characters. What is the length of a string? Are you trying to count the number of glyphs? I do not think that you can do this by naïvely counting code points, regardless of encoding.
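To make this concrete, here is a minimal Python sketch (Python is only my choice for illustration; nothing in the thread specifies a language). It compares a precomposed "é" (U+00E9) with "e" followed by U+0301 COMBINING ACUTE ACCENT. Both display as one glyph, yet the code point count differs, and so do the UTF-8 and UTF-32 byte lengths. The helper name `lengths` is hypothetical.

```python
import unicodedata


def lengths(s: str) -> tuple[int, int, int]:
    """Return (code points, UTF-8 bytes, UTF-32 bytes) for s.

    "utf-32-le" is used so that no 4-byte byte-order mark is prepended.
    """
    return len(s), len(s.encode("utf-8")), len(s.encode("utf-32-le"))


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE: one code point
decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT: two code points

print(lengths(precomposed))  # (1, 2, 4)
print(lengths(decomposed))   # (2, 3, 8)
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']
```

Even in UTF-32, where every code point is exactly four bytes, the decomposed form counts as two "characters". Counting user-perceived characters needs grapheme cluster segmentation (Unicode Standard Annex #29), which Python's standard library does not provide; the third-party `regex` module's `\X` pattern is one option.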

Because there is more than one way to represent an accented character, Unicode string comparison is nontrivial:
https://en.wikipedia.org/wiki/Unicode_equivalence
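A minimal Python sketch of what that page describes: the two spellings of "café" below are canonically equivalent, but they compare unequal as raw code point sequences until both are normalized to the same form. It uses the standard library's `unicodedata.normalize`. The helper `canonically_equal` is a hypothetical name, and normalizing to NFC rather than NFD is an arbitrary choice here, since either form works for canonical equivalence.

```python
import unicodedata


def canonically_equal(x: str, y: str) -> bool:
    """True if x and y are canonically equivalent (compared after NFC normalization)."""
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)


a = "caf\u00e9"    # precomposed é (U+00E9)
b = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(a == b)                   # False: different code point sequences
print(canonically_equal(a, b))  # True
print(len(a), len(b))           # 4 5
```

Compatibility normalization (NFKC/NFKD) is a looser notion that also folds characters such as the "ﬁ" ligature into "fi"; whether that is appropriate depends on the application.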

Kind regards,

--
Ben Caradoc-Davies <ben@transient.nz>
Director
Transient Software Limited <https://transient.nz/>
New Zealand

