Invalid UTF-8 byte? (was: Re: utf)
On Monday, April 02, 2018 03:39:05 AM Andre Majorel wrote:
> > Why? UTF (especially UTF-8) is vastly superior for all purposes:
> I wouldn't say that. UTF-8 breaks a number of assumptions. For
> instance,
> 1) every character has the same size,
> 2) every byte sequence is a valid character,
A few weeks ago, I was looking for a byte that, in UTF-8, would be a totally
invalid byte (not an invalid sequence of bytes). At the time, I tried some
googling, but it looked rather hopeless (maybe it was my googling that was
hopeless).
I know that your statement does not imply there is such a byte, but maybe you
(or someone else reading this) know(s)?
(The reason I wanted such a byte was to use it as a record separator in a set
of text files that I use as an askSam "workalike" (or "worksimilar"), so that I
could use msort while sorting; msort depends on a 1-byte record separator to
separate the records ;-) Some of the files already include UTF-8, and, in
the future, I anticipate all will be in UTF-8.)
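
For what it's worth, the question can be checked by brute force. Under the
current UTF-8 definition (RFC 3629), the byte values 0xC0, 0xC1 and 0xF5
through 0xFF never occur anywhere in a valid encoding. The sketch below is
only an illustration of that check: the function names are made up for the
example, and whether msort accepts one of these bytes as its record separator
is an assumption that has not been verified here.

```python
# Brute-force check: which single byte values never occur in the
# UTF-8 encoding of any Unicode code point?
# (Function names are hypothetical, chosen for this illustration.)

def bytes_used_by_utf8():
    """Return the set of byte values that appear in at least one valid encoding."""
    used = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points cannot be encoded in UTF-8
        used.update(chr(cp).encode("utf-8"))
    return used

def never_used_bytes():
    """Return the sorted list of byte values that valid UTF-8 never contains."""
    return sorted(set(range(256)) - bytes_used_by_utf8())

if __name__ == "__main__":
    # Prints 0xC0, 0xC1 and 0xF5 through 0xFF
    print(" ".join(f"0x{b:02X}" for b in never_used_bytes()))
```

Any of these 13 values (0xFF, say) could in principle serve as a one-byte
separator that cannot collide with the text, provided every tool in the
pipeline passes it through untouched.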
> 3) the equality or inequality of two characters comes down to
> the equality or inequality of the bytes they encode to.
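
Point 3 can be illustrated with a small sketch. It assumes the point is about
Unicode normalization (the quoted text does not say so explicitly): the same
visible character can be stored as different code point sequences, so a
byte-for-byte comparison can report "different" for text a reader would call
equal. The helper name is hypothetical.

```python
import unicodedata

# Two ways to write "é" (assumed example, not taken from the thread)
composed = "\u00e9"      # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

def same_after_nfc(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

if __name__ == "__main__":
    print(composed.encode("utf-8"))              # b'\xc3\xa9'
    print(decomposed.encode("utf-8"))            # b'e\xcc\x81'
    print(composed == decomposed)                # False
    print(same_after_nfc(composed, decomposed))  # True
```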
- References:
- utf
- From: mess-mate <mess-mate@gmx.com>
- Re: utf
- From: Ben Caradoc-Davies <ben@transient.nz>
- Re: utf
- From: Andre Majorel <aym-naibed@teaser.fr>