
Re: Invalid UTF-8 byte? (was: Re: utf)



On Tuesday, April 03, 2018 08:30:04 AM Greg Wooledge wrote:
> > Addendum: iirc (again please correct me if I am wrong) unix file names
> > may contain (at least in theory) any byte except 2F (the slash) and the
> > null byte. So if your text files might contain arbitrary file names there
> > may be (at least in theory) an (admittedly very small) chance that such a
> > file name actually might contain any control character except the null
> > byte.
> 
> One might question whether a file that contains a list of filenames is
> really a "text file".  It sounds more like a broken data file.
> 
> The real question here (for the OP) is:
> 
> WHAT ARE YOU TRYING TO DO?
> 
> There was a glimpse a few messages back that looked like you were trying
> to parse information out of an mbox-format mail folder.  (I.e. a flat
> file that has a concatenated series of mbox-format mail messages in it,
> with all the silliness and problems inherent in this format, like having
> to prefix body lines with ">" if they begin with the word "From".)
> 
> "I want to write a shell script to parse an mbox folder..." is enough
> to send most people running away screaming.  What other horrors are
> in store for us next?
> 
> Of course, that might be a red herring, since you didn't actually tell
> us what your goal is, or what your inputs are, and we're having to
> guess at the moment based on tiny hints and information leaks.
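To make the file-name point above concrete, here is a minimal sketch (assuming a POSIX filesystem such as ext4 or tmpfs; the helper names list_names_nul and split_names_nul are hypothetical, chosen for illustration). It creates a legal file name containing a newline and an ESC control byte, then shows that a newline-separated list of names splits it into garbage, while a NUL-separated list round-trips exactly. NUL is safe as a separator because it is, together with '/', one of the only two bytes a Unix file name cannot contain. This is the same reasoning behind `find -print0` and `xargs -0`.

```python
import os
import tempfile


def list_names_nul(directory: bytes) -> bytes:
    """Join directory entries with NUL, a byte no Unix file name can contain."""
    return b"\0".join(sorted(os.listdir(directory)))


def split_names_nul(blob: bytes) -> list[bytes]:
    """Split a NUL-separated name list back into individual names."""
    return blob.split(b"\0") if blob else []


with tempfile.TemporaryDirectory() as d:
    dirb = os.fsencode(d)
    # A perfectly legal Unix file name containing a newline and ESC (0x1b).
    tricky = b"report\nFrom \x1b.txt"
    plain = b"notes.txt"
    for name in (tricky, plain):
        open(os.path.join(dirb, name), "wb").close()

    names = sorted(os.listdir(dirb))   # bytes in, bytes names out
    newline_list = b"\n".join(names)   # the naive "text file" of names
    nul_list = list_names_nul(dirb)

    # Newline splitting yields 3 pieces for only 2 files.
    print(len(newline_list.split(b"\n")))      # 3
    # NUL splitting recovers the exact names.
    print(split_names_nul(nul_list) == names)  # True
```

Working on bytes rather than str avoids any assumption that file names are valid UTF-8, which is exactly the kind of input that tends to trigger "invalid UTF-8 byte" errors in the first place.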

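The mbox quoting rule mentioned above is a little more specific than "lines beginning with the word From": the separator is a line starting with "From " (including the trailing space), so only body lines with that prefix get escaped. Below is a minimal sketch of the mboxrd variant, one of several incompatible mbox dialects. In mboxrd, any body line matching `>*From ` gets one extra ">", which makes the escaping reversible. The older mboxo dialect only escapes a bare "From ", so an original ">From " line cannot be told apart from an escaped one. The function names are hypothetical, and the sketch handles only body lines, not separator or header parsing (Python's standard `mailbox` module is a better starting point for real mbox files).

```python
import re

# mboxrd: a line of zero or more '>' followed by "From " gets one more '>'.
_QUOTE_RE = re.compile(rb"^(>*From )")
# Reverse: remove exactly one leading '>' from such lines.
_UNQUOTE_RE = re.compile(rb"^>(>*From )")


def mboxrd_quote(body: bytes) -> bytes:
    """Escape body lines so none can be mistaken for an mbox 'From ' separator."""
    return b"".join(
        _QUOTE_RE.sub(rb">\1", line) for line in body.splitlines(keepends=True)
    )


def mboxrd_unquote(body: bytes) -> bytes:
    """Undo mboxrd_quote by stripping one '>' from quoted 'From ' lines."""
    return b"".join(
        _UNQUOTE_RE.sub(rb"\1", line) for line in body.splitlines(keepends=True)
    )


body = b"Hi,\nFrom now on...\n>From earlier\nFromage\n"
quoted = mboxrd_quote(body)
print(quoted)                          # b'Hi,\n>From now on...\n>>From earlier\nFromage\n'
print(mboxrd_unquote(quoted) == body)  # True
```

Note that "Fromage" is left alone because it lacks the space after "From", and the already-quoted ">From earlier" line survives the round trip only because mboxrd adds a second ">" instead of leaving it as-is.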
