
Re: delimiters with more than one character? ...



On 7/15/20 8:44 AM, Greg Wooledge wrote:
On Wed, Jul 15, 2020 at 08:34:36AM -0400, Bob Weber wrote:
My only purpose was to show how tr could be used to handle multiple
characters as a delimiter either as tr -s '\\\|' '\|' or
The problem is, it can't, at least not the way you showed.  The original
example, sadly, did NOT contain instances of the | and \ characters in
isolation, so one might be lulled into a false sense of security, and
write code that (for example) simply deletes all of the \ characters,
and then splits on the | characters.

But that won't work in the general case, where | and \ might appear as
literal data characters.
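
To make the failure concrete, here is a minimal sketch, assuming GNU tr and a made-up record whose second field contains a literal | and a literal \. The function name squeeze_delims is hypothetical; the point is only that tr maps individual characters, not the two-character sequence.

```shell
# tr works on sets of characters, not on strings: every \ and every |
# is translated to |, and -s then squeezes runs of | down to one.
squeeze_delims() {
    tr -s '\\|' '||'
}

# Hypothetical record: field 1 is "a", field 2 is "b|c\d".
printf '%s\n' 'a\|b|c\d' | squeeze_delims
```

The intended two fields come out as four (a|b|c|d), because the isolated | and \ inside the second field are treated as delimiters too.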

My own solution, which involved using awk to convert the \| pairs into
NUL bytes, is also technically incorrect.  However, there was an
additional stipulation: the stream was to be converted into a bash
array.  A bash array is a list of C strings, so they cannot contain
NUL bytes.  Therefore you can't possibly have NUL bytes in the original
input stream (at least, not and still produce a bash array), so my
conversion of the multi-character delimiters into NUL bytes will "work".
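
As a rough illustration of the end goal (a bash array split on the two-character delimiter), here is a sketch that is not the awk-to-NUL approach described above. It uses bash parameter expansion instead, so no intermediate delimiter byte is needed at all. The function name split_on, the array name fields, and the sample record are all assumptions made up for the example.

```shell
# Split a string on a multi-character delimiter into the global
# bash array "fields", using only parameter expansion.
split_on() {
    local delim=$1 rest=$2
    fields=()
    [[ -n $delim ]] || return 1          # an empty delimiter would never terminate
    while [[ $rest == *"$delim"* ]]; do
        fields+=("${rest%%"$delim"*}")   # text before the first delimiter
        rest=${rest#*"$delim"}           # drop that text and the delimiter
    done
    fields+=("$rest")                    # whatever remains is the last field
}

# Hypothetical record with a literal | and \ inside the second field.
split_on '\|' 'a\|b|c\d\|e'
printf '<%s>\n' "${fields[@]}"
```

Quoting "$delim" inside the expansions keeps the backslash and pipe literal, so only the exact pair \| acts as a separator.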

But it's a freaking ugly problem any way you look at it, and it just
got uglier when it was revealed that the OP might be trying to write
shell code that parses shell code.  Especially if the code in question
is a series of poorly written GNU-tainted grep commands.

Which is why I showed this:

tr -s '\\\|' '\|'

which replaces \| with a single character that is known not to be in
the input data and is usable as an awk field separator.  It just
happens to be a |, which is OK with awk and which I used as a
separator in code over 30 years ago.
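
For comparison, awk's field separator may itself be a regular expression, so the pair \| can be matched directly without squeezing it down to one character first. This is only a sketch under assumptions: the function name second_field and the sample record are made up, and the four backslashes are there because the -F value goes through awk's string escape processing before it is used as a regex.

```shell
# Use the two-character sequence \| as the awk field separator.
# After awk's escape processing the FS regex is \\[|], i.e. a literal
# backslash followed by a literal pipe.
second_field() {
    awk -F'\\\\[|]' '{ print $2 }'
}

# Hypothetical record with a literal | and \ inside the second field.
printf '%s\n' 'a\|b|c\d\|e' | second_field
```

Here the second field is printed intact as b|c\d, since a lone | or \ no longer counts as a separator.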
--


...Bob
