Invalid UTF-8 byte? (was: Re: utf)

To: debian-user@lists.debian.org
Subject: Invalid UTF-8 byte? (was: Re: utf)
From: rhkramer@gmail.com
Date: Mon, 2 Apr 2018 08:37:54 -0400
Message-id: <201804020837.54725.rhkramer@gmail.com>
In-reply-to: <20180402073904.GB19322@aym.net2.nerim.net>
References: <92aa2f6d-d39f-61a6-311b-f0c45b00b9c9@gmx.com> <0a5c15a9-0dfc-1ef3-1f64-1880def0ff1e@transient.nz> <20180402073904.GB19322@aym.net2.nerim.net>

On Monday, April 02, 2018 03:39:05 AM Andre Majorel wrote:
> > Why? UTF (especially UTF-8) is vastly superior for all purposes:
> I wouldn't say that. UTF-8 breaks a number of assumptions. For
> instance,
> 1) every character has the same size,
> 2) every byte sequence is a valid character,

A few weeks ago, I was looking for a byte that, in UTF-8, would be a totally 
invalid byte (not an invalid sequence of bytes).  At the time, I tried some 
googling, but it looked rather hopeless (maybe it was my googling that was 
hopeless).

I know that your statement does not imply there is such a byte, but maybe you 
(or someone else reading this) know(s)?
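For what it is worth, the question can be checked by brute force. The sketch below is not from the original thread; it encodes every Unicode scalar value with Python's built-in UTF-8 codec and reports the byte values that never show up. The function name bytes_never_in_utf8 is made up for this illustration.

```python
def bytes_never_in_utf8():
    """Return the sorted byte values that occur in no valid UTF-8 encoding."""
    seen = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points cannot be encoded as UTF-8
        seen.update(chr(cp).encode("utf-8"))
    return sorted(set(range(256)) - seen)


never = bytes_never_in_utf8()
print([hex(b) for b in never])
```

This should list 0xC0, 0xC1 and 0xF5 through 0xFF, which matches the byte values that the UTF-8 definition in RFC 3629 says never appear in a valid stream.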

(The reason I wanted such a byte was to use it as a record separator in a set 
of text files that I use as an askSam "workalike" (or "worksimilar"), so that I 
could use msort (which depends on a 1-byte record separator to --separate the 
records ;-) while sorting.  Some of the files already include UTF-8, and, in 
the future, I anticipate all will be in UTF-8.)
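A minimal sketch of the record-separator idea, assuming 0xFF is picked as the separator (one of the byte values that cannot occur in valid UTF-8). This is plain Python, not msort, and the helper names join_records and split_records are invented for the illustration.

```python
SEP = b"\xff"  # assumption: 0xFF as separator; it never occurs in valid UTF-8


def join_records(records):
    """Encode each record as UTF-8 and join them with the one-byte separator."""
    return SEP.join(r.encode("utf-8") for r in records)


def split_records(blob):
    """Split on the separator first, then decode each record as UTF-8."""
    return [chunk.decode("utf-8") for chunk in blob.split(SEP)]


records = ["Zürich\ntrain notes", "Århus\nferry times", "apple\nplain ASCII"]
blob = join_records(records)

# Multi-line records survive because newlines are not the separator.
sorted_records = sorted(split_records(blob))
```

One caveat with this design: a file that contains the separator byte is no longer valid UTF-8 as a whole, so it has to be split on the raw bytes before any part of it is decoded, as split_records does here. The sort above is by code point, not by locale collation.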

> 3) the equality or inequality of two characters comes down to
>    the equality or inequality of the bytes they encode to.
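Point 3 can be illustrated with a small Python sketch (not part of the original message). It uses the standard unicodedata module to show two strings that display as the same character but encode to different bytes until they are normalized.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# The UTF-8 byte sequences differ even though both render as "é".
composed_bytes = composed.encode("utf-8")      # b'\xc3\xa9'
decomposed_bytes = decomposed.encode("utf-8")  # b'e\xcc\x81'
bytes_equal = composed_bytes == decomposed_bytes

# After normalizing both to NFC, the strings compare equal.
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))

print(bytes_equal, nfc_equal)
```

Comparing raw bytes reports the two as different, while comparing after NFC normalization reports them as equal.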

Follow-Ups:
- Re: Invalid UTF-8 byte? (was: Re: utf)
  - From: <tomas@tuxteam.de>
- Re: Invalid UTF-8 byte? (was: Re: utf)
  - From: Henrique de Moraes Holschuh <hmh@debian.org>
- Re: Invalid UTF-8 byte? (was: Re: utf)
  - From: Michael Lange <klappnase@freenet.de>
- Re: Invalid UTF-8 byte? (was: Re: utf)
  - From: Jonathan de Boyne Pollard <J.deBoynePollard-newsgroups@NTLWorld.COM>

References:
- utf
  - From: mess-mate <mess-mate@gmx.com>
- Re: utf
  - From: Ben Caradoc-Davies <ben@transient.nz>
- Re: utf
  - From: Andre Majorel <aym-naibed@teaser.fr>
