
Re: delimiters with more than one character? ...



On 2020-07-15 at 10:11, Bob Weber wrote:

> On 7/15/20 8:44 AM, Greg Wooledge wrote:
> 
>> On Wed, Jul 15, 2020 at 08:34:36AM -0400, Bob Weber wrote:
>> 
>>> My only purpose was to show how tr could be used to handle
>>> multiple characters as a delimiter either as tr -s '\\\|' '\|'
>>> or
>> 
>> The problem is, it can't, at least not the way you showed.  The
>> original example, sadly, did NOT contain instances of the | and \
>> characters in isolation, so one might be lulled into a false sense
>> of security, and write code that (for example) simply deletes all
>> of the \ characters, and then splits on the | characters.
>> 
>> But that won't work in the general case, where | and \ might appear
>> as literal data characters.
>> 
>> My own solution, which involved using awk to convert the \| pairs
>> into NUL bytes, is also technically incorrect.  However, there was
>> an additional stipulation: the stream was to be converted into a
>> bash array.  A bash array is a list of C strings, so they cannot
>> contain NUL bytes.  Therefore you can't possibly have NUL bytes in
>> the original input stream (at least, not and still produce a bash
>> array), so my conversion of the multi-character delimiters into NUL
>> bytes will "work".
>> 
>> But it's a freaking ugly problem any way you look at it, and it
>> just got uglier when it was revealed that the OP might be trying to
>> write shell code that parses shell code.  Especially if the code in
>> question is a series of poorly written GNU-tainted grep commands.
>> 
> Which is why I showed this:
> 
> tr -s '\\\|' '\|'
> 
> which replaces \| with a single character which is known not to be in
> the input data

How do you know that?

We don't necessarily have the full input data set. We have a sample
input data set, which may or may not be the only one that will ever be
used.
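
To illustrate with made-up data (the sample string below is
hypothetical, not taken from the original poster): tr treats its
arguments as sets of characters rather than as strings, so a later
data set with a lone | or \ inside a field would be split in the wrong
places. A minimal sketch, using the set form '\\|' '||', which I
believe is what the quoted command amounts to under GNU tr:

```shell
# Hypothetical sample: three fields separated by the two-character
# pair \|, with a lone | in the second field and a lone \ in the third.
sample='alpha\|be|ta\|gam\ma'

# Both arguments are character SETS: every \ and every | is mapped
# to |, and -s then squeezes each run of | down to a single one.
squeezed=$(printf '%s\n' "$sample" | tr -s '\\|' '||')

printf '%s\n' "$squeezed"
# prints: alpha|be|ta|gam|ma
```

Splitting that result on | gives five fields where the input had three.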

If '|' were guaranteed never to occur in the input data, it would
probably have been chosen as the delimiter on its own, rather than
only as part of the '\|' pair.
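
A splitter that matches only the complete pair does not need that
guarantee. The following is a rough bash-only sketch, not anything
posted earlier in the thread; split_on_pair and the sample string are
names and data I made up for illustration, and it assumes the whole
input is already in one shell variable:

```shell
#!/usr/bin/env bash
# split_on_pair is a hypothetical helper: it splits $1 on the literal
# string $2 and stores the pieces in the global array "fields".
split_on_pair() {
    local rest=$1 delim=$2
    fields=()
    while [[ $rest == *"$delim"* ]]; do
        fields+=("${rest%%"$delim"*}")   # text before the first delimiter
        rest=${rest#*"$delim"}           # drop that text and the delimiter
    done
    fields+=("$rest")                    # whatever follows the last delimiter
}

# Same hypothetical sample: a lone | and a lone \ appear as data.
split_on_pair 'alpha\|be|ta\|gam\ma' '\|'
printf '<%s>\n' "${fields[@]}"
```

Because the delimiter is quoted inside the pattern, it is matched as a
literal two-character string, so the lone | and \ stay inside their
fields.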

-- 
   The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man.         -- George Bernard Shaw
