Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...
On 12/11/23, Greg Wooledge <greg@wooledge.org> wrote:
> 1) Many implementations of echo will interpret parts of their argument(s),
> in addition to processing options like -n. If you want to print a
> variable's contents to standard output without *any* interpretation,
> use printf.
>
> printf %s "$myvar"
> printf '%s\n' "$myvar"
>
I will use "printf ..." from now on.
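
A minimal sketch of why the choice between echo and printf matters for the question in the subject line. It assumes a tr that behaves like GNU tr; `sanitize` and `myvar` are names made up for this illustration. echo appends a newline, the newline falls in the complement of the set, and so tr turns it into one more underscore.

```shell
#!/bin/sh
# Hypothetical helper: replace each run of characters outside A-Za-z0-9.
# with a single underscore. LC_ALL=C pins the meaning of the ranges.
sanitize() {
    LC_ALL=C tr -c -s 'A-Za-z0-9.' _
}

myvar='ab cd'

# echo writes the value plus a newline; the newline is not in the set,
# so tr replaces it with "_" as well.
with_echo=$(echo "$myvar" | sanitize)

# printf %s writes the value only, so no extra character reaches tr.
with_printf=$(printf %s "$myvar" | sanitize)

printf '%s\n' "$with_echo"    # ab_cd_
printf '%s\n' "$with_printf"  # ab_cd
```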
> 2) As tomas already told you, the square brackets in
>
> tr -c -s '[A-Za-z0-9.]' _
>
> are literal. You're using a command which will keep left and right
> square brackets in the input, *not* replacing them with underscores.
> This may not be what you want.
My mistake, even though it didn't get in the way of what I was trying
to do. I replaced :alnum: with what I thought it meant and left the
brackets in place.
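
A small sketch of the difference the literal brackets make. The sample string is invented for the illustration, and LC_ALL=C is set only to keep the ranges predictable.

```shell
#!/bin/sh
# Made-up sample input containing square brackets and spaces.
input='x [y] z'

# With the brackets inside the set, "[" and "]" are members of the set
# and therefore kept; only the spaces are replaced.
with_brackets=$(printf %s "$input" | LC_ALL=C tr -c -s '[A-Za-z0-9.]' _)

# Without them, "[" and "]" are in the complement and become "_";
# -s then squeezes adjacent underscores into one.
without_brackets=$(printf %s "$input" | LC_ALL=C tr -c -s 'A-Za-z0-9.' _)

printf '%s\n' "$with_brackets"     # x_[y]_z
printf '%s\n' "$without_brackets"  # x_y_z
```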
> 3) In locales other than C or POSIX, ranges like A-Z are *not* necessarily
> synonyms for [:upper:]. As I've already mentioned, GNU tr is known to
> contain bugs, so you're getting lucky here. The bugs in GNU tr happen
> to work the way you're expecting, so that A-Z is treated like [:upper:]
> when it should not be. If at some point in the future GNU tr is fixed
> to conform to POSIX, your script may break.
>
> The correct tr command you should be using if you want to retain
> accented letters (as defined in your locale) is:
>
> tr -c -s '[:alnum:].' _
>
> If you want to discard accented letters, then either of these is OK:
>
> LC_COLLATE=C tr -c -s '[:alnum:].' _
> LC_COLLATE=C tr -c -s 'A-Za-z0-9.' _
>
I like your second one-liner much better:
LC_COLLATE=C tr -c -s 'A-Za-z0-9.' _
I tend to avoid '[:alnum:].' because the intended meaning of
"ALphabetic and NUMeric" characters, even though it depends on the
locale, has a strong ASCII accent to it.
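
A sketch of the ASCII-only behaviour, assuming a byte-oriented tr such as GNU tr. The file name is invented, and the "é" is written as its UTF-8 bytes so the script does not depend on the editor's encoding. LC_ALL=C is used here rather than LC_COLLATE=C as an assumption of this sketch: it pins both the collation of ranges and the character classification (LC_CTYPE) that [:alnum:] relies on.

```shell
#!/bin/sh
# Made-up sample: "café.txt", with "é" spelled out as UTF-8 bytes \303\251.
name=$(printf 'caf\303\251.txt')

# In the C locale neither byte of "é" is in A-Za-z0-9., so both are
# replaced and -s squeezes the two underscores into one.
by_range=$(printf %s "$name" | LC_ALL=C tr -c -s 'A-Za-z0-9.' _)

# In the C locale [:alnum:] covers only ASCII letters and digits,
# so the result is the same.
by_class=$(printf %s "$name" | LC_ALL=C tr -c -s '[:alnum:].' _)

printf '%s\n' "$by_range"  # caf_.txt
printf '%s\n' "$by_class"  # caf_.txt
```

What either command does outside the C locale depends on the locale and on the tr implementation, so no output is shown for that case.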
> Thus, we come full circle.
Yes, we did. Thank you,
lbrtchx