
Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...



On Mon, Dec 11, 2023 at 11:25:13AM +0000, Albretch Mueller wrote:
>  "tr --complement --squeeze-repeats ..." makes sure that the replaced
> characters only appear once (that it doesn't immediately repeat). Say
> you have something like "  " (two spaces) or "?$|" (three characters)
> which will be replaced by just an underscore.

...which would change the length, as I wrote.
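Here is a small sketch of that length change. It assumes GNU coreutils tr and a POSIX shell, and the input and variable names are only for illustration. Without -s, every character outside the set becomes its own "_". With -s, a run of them collapses into one.

```shell
# Hypothetical input: "?$|" is a run of three characters outside the set.
s='x?$|y'
# -c only: each of the three characters is replaced separately.
plain=$(printf '%s' "$s" | tr -c 'A-Za-z0-9.' '_')
# -c plus -s: the resulting run of "_" is squeezed down to a single one.
squeezed=$(printf '%s' "$s" | tr -cs 'A-Za-z0-9.' '_')
printf '%s (%d chars)\n' "$plain" "${#plain}"        # x___y (5 chars)
printf '%s (%d chars)\n' "$squeezed" "${#squeezed}"  # x_y (3 chars)
```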

> In the case of: "ASCII text"
>  what should come out of it is: "ASCII_text"
>  not: "ASCII_text_"
>  no underscore at the end. That is the question I have.

That depends on whether your "ASCII text" has some thingy at the end
which you don't see. A newline, perchance?
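One way to check that guess is the sketch below. It again assumes GNU coreutils tr in a POSIX shell, and it uses printf so the trailing newline is present or absent exactly as written. With -c, tr treats a trailing newline like any other character outside the set. Adding \n to the set is one way to keep the newline instead.

```shell
# Trailing newline present: it is outside the set, so it becomes "_".
with_nl=$(printf 'ASCII text\n' | tr -cs 'A-Za-z0-9.' '_')
# No trailing newline: nothing after "text" gets replaced.
without_nl=$(printf 'ASCII text' | tr -cs 'A-Za-z0-9.' '_')
# Newline added to the set: tr keeps it (the $(...) then strips it again).
keep_nl=$(printf 'ASCII text\n' | tr -cs 'A-Za-z0-9.\n' '_')
printf '[%s]\n' "$with_nl" "$without_nl" "$keep_nl"
# [ASCII_text_]
# [ASCII_text]
# [ASCII_text]
```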

>  I use such constructs as: "[A-Za-z0-9.]" to make explicit to myself
> and other people what I mean. I work in corpora research dealing with
> text based on various alphabets, not just ASCII, so I avoid any kind of
> linguistic/cultural shortcuts and abbreviations.

What has this to do with how tr works? It will treat [ and ] as literal
characters in the set, i.e. characters not to substitute. I pointed that
out because it might have been unintended:

  echo -n 'This is a  text with [some brackets] in   it' | tr -cs "[A-Za-z0-9.]" "_"
  This_is_a_text_with_[some_brackets]_in_it

(Notice the "-n" on the echo, btw? Without it, I'd get a "_" at the
end: the transliterated newline.)
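For comparison, here is a sketch that runs the same input through both sets. It assumes GNU coreutils tr and uses printf instead of echo -n, because POSIX leaves the behaviour of echo -n implementation-defined. When the brackets are in SET1, they survive. Without them, they are replaced like any other character outside the set.

```shell
input='This is a  text with [some brackets] in   it'
# "[" and "]" are ordinary members of SET1 here, so -c leaves them alone.
with_brackets=$(printf '%s' "$input" | tr -cs '[A-Za-z0-9.]' '_')
# Without them, the brackets fall into the complement and are squeezed
# together with the neighbouring spaces.
without_brackets=$(printf '%s' "$input" | tr -cs 'A-Za-z0-9.' '_')
printf '%s\n' "$with_brackets"     # This_is_a_text_with_[some_brackets]_in_it
printf '%s\n' "$without_brackets"  # This_is_a_text_with_some_brackets_in_it
```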

Do whatever you want :-)

Cheers
-- 
t
