Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...



On Mon, Dec 11, 2023 at 08:04:06AM +0000, Albretch Mueller wrote:
> On 12/11/23, Greg Wooledge <greg@wooledge.org> wrote:
> > Please tell us ...
> 
>  OK, here is what I did as a t-table

[...]

Your style is confusing, to say the least. Why not play with minimal
examples and work your way up from that?

> the two strings are not the same length even though your are just
> replacing ASCII characters, why did:
> echo "${ftype}" | tr --complement --squeeze-repeats '[A-Za-z0-9.]' '_'
> place a character at the end?

A few things stick out:

 1. with --squeeze-repeats you are challenging tr to output fewer
    characters than the input has:

   trotzki:~$ echo -n "this is a #   string ###" | tr -cs 'a-z' '_'
   => this_is_a_string_

   (I allowed myself to simplify things a bit) See? tr is squeezing
   repeats (repeated matches), the space-plus-three-hashes at the
   end gets squeezed to just one _, thus changing the length.
   If your strings contain two or more adjacent non-alphanumerics
   (something I don't feel like even trying to guess at), this is
   bound to happen. You ordered it.

 2. This is tr, not regexp, so '[A-Za-z0-9.]' isn't doing what you
    think it does. It will match '[', 'A' to 'Z', 'a' to 'z',
    '0' to '9', '.' and ']'. I guess you want to say 'A-Za-z0-9.'

 3. As a convenience, tr has char classes. Perhaps [:alnum:] is for
    you. These are specified by POSIX, so they are not a GNU
    extension.

 4. In case of doubt, read the man page :)
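
To make the points above concrete, here is a minimal sketch you can
paste into a shell. The sample string is made up (the real value of
${ftype} wasn't shown), and I use the short options -c and -s, which
are the same as --complement and --squeeze-repeats. It also shows one
more likely source of a trailing '_': echo (like printf '%s\n') ends
its output with a newline, and a newline is not in the set, so it gets
replaced too.

```shell
#!/bin/sh
# Hypothetical sample input; the original value of ${ftype} was not shown.
s='foo [bar]  baz.txt'

# The trailing newline is outside the set, so it becomes a final '_'.
with_newline=$(printf '%s\n' "$s" | tr -cs 'A-Za-z0-9.' '_')

# No trailing newline in the input, hence no trailing '_'.
without_newline=$(printf '%s' "$s" | tr -cs 'A-Za-z0-9.' '_')

# Here the brackets are literal members of the set, so '[' and ']' survive.
with_brackets=$(printf '%s' "$s" | tr -cs '[A-Za-z0-9.]' '_')

# The same set written with a character class (C locale assumed).
with_class=$(printf '%s' "$s" | LC_ALL=C tr -cs '[:alnum:].' '_')

printf '%s\n' "$with_newline" "$without_newline" "$with_brackets" "$with_class"
```

This should print foo_bar_baz.txt_ followed by foo_bar_baz.txt,
foo_[bar]_baz.txt and foo_bar_baz.txt, one per line. Note how each
run of spaces and brackets collapses into a single '_', which is why
input and output lengths differ.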

Cheers
-- 
t
