
Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...



On Mon, Dec 11, 2023 at 07:42:10AM -0500, Greg Wooledge wrote:
> On Mon, Dec 11, 2023 at 09:37:42AM +0100, tomas@tuxteam.de wrote:
> >  2. This is tr, not regexp, so '[A-Za-z0-9.]' isn't doing what you
> >    think it does. It will match '[', 'A' to 'Z', 'a' to 'z', '0' to '9',
> >    '.' and ']'. I guess you mean 'A-Za-z0-9.'
> 
> Well spotted.
> 
> >  3. As a convenience, tr has char classes. Perhaps [:alnum:] is for
> >    you. No idea whether this is a GNU extension.
> 
> It's POSIX.  100% portable, as long as you ignore any bugs in GNU tr.
> 
> Looks like GNU tr in Debian 12 still doesn't handle multibyte characters
> correctly:
> 
>     unicorn:~$ echo 'mañana' | tr ñ X
>     maXXana
> 
> So... as long as you're working in the C locale, where [:alnum:] is
> just the ASCII capital and lowercase letters and digits, you should be
> fine.
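A minimal sketch pulling points 2 and 3 together with the question in the subject line. It assumes GNU coreutils tr and a POSIX sh, and the sample strings are made up. With -c (--complement), the newline that echo appends is also outside the kept set, so it gets translated too. With -s (--squeeze-repeats), that newline shows up as one extra substitution char at the end:

```shell
#!/bin/sh
# Pin the C locale so [:alnum:] means ASCII letters and digits only.
LC_ALL=C
export LC_ALL

# Point 2: '[' and ']' are plain characters in these tr sets, so the
# bracketed set keeps them and the unbracketed one drops them.
# (-d deletes the newline as well, hence the extra echo.)
echo 'v[1.2]' | tr -cd '[A-Za-z0-9.]'; echo    # v[1.2]
echo 'v[1.2]' | tr -cd 'A-Za-z0-9.';   echo    # v1.2

# Point 3 and the subject: echo's newline is not in [:alnum:], so
# -c -s turns it into a trailing '_' as well.
echo 'foo--bar!!baz' | tr -cs '[:alnum:]' '_'; echo    # foo_bar_baz_

# Two ways around it: send no newline, or keep \n in the set.
printf '%s' 'foo--bar!!baz' | tr -cs '[:alnum:]' '_'; echo    # foo_bar_baz
echo 'foo--bar!!baz' | tr -cs '[:alnum:]\n' '_'               # foo_bar_baz
```

Which fix fits depends on the input: printf '%s' for a single value, '\n' in the set for line-oriented input.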

Hey, you just gave us a handy way to count how many bytes a character
takes in UTF-8 (the final X is just echo's newline):

  tomas@trotzki:~$ echo 'birdie🐦here' | tr -c 'a-z' X
  birdieXXXXhereX

;-)
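Same trick, slightly more deliberate. A sketch assuming UTF-8 input and a POSIX sh: printf '%s' sends no newline, so there is no stray X at the end, and LC_ALL=C forces one-byte-per-character handling even in a tr that does understand multibyte locales. wc -c and od then confirm the count:

```shell
#!/bin/sh
# In the C locale every byte is one character, so tr sees raw UTF-8 bytes.
LC_ALL=C
export LC_ALL

# No trailing newline, hence no extra X at the end.
printf '%s' 'birdie🐦here' | tr -c 'a-z' X; echo    # birdieXXXXhere

# Count the bytes of a single character directly
# (some wc implementations pad the number with spaces).
printf '%s' '🐦' | wc -c    # 4
printf '%s' 'ñ' | wc -c     # 2, which is why 'mañana' became 'maXXana'

# Show the bytes themselves: U+1F426 is encoded as f0 9f 90 a6.
printf '%s' '🐦' | od -An -tx1
```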

Cheers
-- 
t


