
Re: Invalid UTF-8 byte? (was: Re: utf)

On Tue, 3 Apr 2018 13:58:33 +0200
Michael Lange <klappnase@freenet.de> wrote:

> I believe (please anyone correct me if I am wrong) that "text" files
> won't contain any null byte; many text editors even refuse to open such
> a file, I guess since they assume it is a "binary" file.
> Probably it is the same with some other control characters like 04 (End
> of Transmission). When I look at https://en.wikipedia.org/wiki/ASCII
> it seems like 1C (File Separator) or 1E (Record Separator) might be 
> appropriate choices for you. I'm no expert on this, though.

Addendum: iirc (again, please correct me if I am wrong) Unix file names
may contain, at least in theory, any byte except 2F (the slash) and the
null byte. So if your text files might contain arbitrary file names, there
is an (admittedly very small) chance that such a file name might actually
contain any control character except the null byte.
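
To make the point concrete, here is a minimal Python sketch. It assumes the
rule stated above (only the slash and the null byte are forbidden in a file
name) and ignores any extra limits a real file system may impose. The helper
names is_valid_unix_name, join_names and split_names are made up for this
illustration and do not come from any library.

```python
# Candidate separators from the ASCII table discussed in this thread.
NUL = b"\x00"  # null byte
FS = b"\x1c"   # File Separator
RS = b"\x1e"   # Record Separator


def is_valid_unix_name(name: bytes) -> bool:
    """Return True if `name` could be a single Unix file name.

    Assumption: only the slash (0x2F) and the null byte are forbidden.
    Length limits and encoding rules of real file systems are ignored.
    """
    return len(name) > 0 and b"/" not in name and NUL not in name


def join_names(names, sep=NUL):
    """Join file names with `sep`, refusing names that contain `sep`."""
    for name in names:
        if sep in name:
            raise ValueError(f"separator {sep!r} occurs in name {name!r}")
    return sep.join(names)


def split_names(blob, sep=NUL):
    """Inverse of join_names for a non-empty list of names."""
    return blob.split(sep)


# A name containing the Record Separator is unusual but allowed.
names = [b"notes.txt", b"odd\x1ename"]
all_valid = all(is_valid_unix_name(n) for n in names)

# The null byte cannot occur in any name, so the round trip is safe.
round_trip = split_names(join_names(names, NUL), NUL)
```

With these names, join_names(names, RS) raises ValueError, because the second
name already contains 1E. Only the null byte can never collide with a file
name, and that is exactly the byte a "text" file is not supposed to contain.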



.-.. .. ...- .   .-.. --- -. --.   .- -. -..   .--. .-. --- ... .--. . .-.

Live long and prosper.
		-- Spock, "Amok Time", stardate 3372.7
