On 03/04/18 20:55, Darac Marjal wrote:
> If these things matter to you, it's better to convert from UTF-8 to
> Unicode, first. I tend to think of Unicode as an arbitrarily large code
> page. Each character maps to a number, but that number could be 1, 1000
> or 500_000 (Unicode seems to be growing without any end in sight).
> Internally, you might store those code points as Integers or Quad Words
> or whatever you like. Only once you're ready to transfer the text to
> another process (print on screen, save to a file, stream across a
> network), do you convert the Unicode back into UTF-8.
>
> Basically, you consider UTF-8 to be a transfer-only format (like
> Base64). If you want to do anything non-trivial with it, decode it into
> Unicode.

Eh? UTF-8 is an encoding of Unicode. You can't "convert UTF-8 to
Unicode" - it already is Unicode. You could convert it to another
encoding, e.g. UTF-16 or UTF-32. Perhaps UTF-32 is what you mean, being
fixed-width.

Richard
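
To make the distinction concrete, here is a rough Python sketch (the sample string and variable names are my own, purely for illustration). It shows one piece of text as a sequence of Unicode code points, then encoded as UTF-8, UTF-16 and UTF-32. The code points are the same throughout; only the byte layout changes, and only UTF-32 uses a fixed four bytes per code point.

```python
# Hypothetical sample text: "naive" with U+00EF, a space, and the euro sign U+20AC.
text = "na\u00efve \u20ac"

# The abstract Unicode code points, independent of any encoding.
code_points = [ord(ch) for ch in text]

# Three encodings of the same code points.
utf8 = text.encode("utf-8")       # variable width: 1 to 4 bytes per code point
utf16 = text.encode("utf-16-le")  # variable width: 2 or 4 bytes per code point
utf32 = text.encode("utf-32-le")  # fixed width: always 4 bytes per code point

print(code_points)
print(len(text), len(utf8), len(utf16), len(utf32))

# Converting between encodings goes through the same code points,
# so decoding any of them gives back identical text.
assert utf8.decode("utf-8") == utf16.decode("utf-16-le") == utf32.decode("utf-32-le")
```

Note that Python's `str` already hides its internal storage, so "decoding UTF-8" here yields a sequence of code points and says nothing about how they are held in memory.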