
Re: bash completion and spaces



On Mon, Apr 26, 2021 at 12:04:45PM +0200, Thomas Schmitt wrote:
> > what accounts for the three missing characters (namely SPACE, TAB,
> > and NEWLINE)?
> 
> They get eaten by the shell parser if you do not use quotation marks:
> 
>   $ echo $COMP_WORDBREAKS | wc -c
>   11
>   $ echo "$COMP_WORDBREAKS" | wc -c
>   14

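Not the parser, technically.  The parser is long done by then; the
characters are removed by word splitting, which is applied to the result
of the unquoted expansion.  See below if you want more details.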

> So to see all characters (including the newline added by "echo") i do:
> 
>   $ echo "$COMP_WORDBREAKS" | hd
>   00000000  20 09 0a 22 27 3e 3c 3d  3b 7c 26 28 3a 0a        | .."'><=;|&(:.|
>   0000000e

Even better, when you're dealing with arbitrary data that may include
characters which echo might interpret:

printf %s "$COMP_WORDBREAKS" | hd

It looks like there aren't any in this particular case, but it's a good
habit to develop for future cases.
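
As a small illustration of why, here is a minimal sketch (the variable
names are made up, and it assumes bash's builtin echo) using a value that
echo takes as an option rather than as data:

```shell
# Assumes bash's builtin echo; the names below are illustrative only.
value='-n'
via_echo=$(echo "$value")          # echo takes -n as an option, prints nothing
via_printf=$(printf %s "$value")   # printf %s prints the value verbatim
printf 'echo: <%s>  printf: <%s>\n' "$via_echo" "$via_printf"
# prints: echo: <>  printf: <-n>
```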


Word splitting: an unquoted substitution such as $COMP_WORDBREAKS undergoes
two more rounds of alterations: word splitting, and pathname expansion.
The word splitting round uses the contents of the IFS variable, or the
default value of "space tab newline" if IFS is unset.
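
To see the difference between an unset IFS and an empty one, here is a
minimal sketch (variable names are illustrative only).  An unset IFS
behaves like the default, while an empty IFS turns word splitting off
entirely:

```shell
value=$(printf 'one two\tthree')
# Each command substitution runs in a subshell, so the caller's IFS
# and the set -f option are left untouched.
count_unset=$(unset IFS; set -f; set -- $value; echo "$#")
count_empty=$(IFS=; set -f; set -- $value; echo "$#")
echo "unset: $count_unset  empty: $count_empty"
# prints: unset: 3  empty: 1
```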

Each character of IFS is treated as a word delimiter, and may cause a
split.  The characters of IFS are divided into two types: whitespace,
and non-whitespace.  All consecutive IFS whitespace characters are grouped
together and treated as a single delimiter.  Also, any single IFS
non-whitespace character may be surrounded by any number of adjacent IFS
whitespace characters, and that whole group is treated as a single
delimiter.  Finally, any leading or trailing IFS whitespace characters are
trimmed from the value and discarded.
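
These rules can be checked with a small helper.  This is only a sketch:
count_fields is a made-up name, and it uses set -f so that pathname
expansion cannot interfere with the count.

```shell
# count_fields IFS_VALUE STRING: print how many words STRING splits into.
# The ( ... ) body runs in a subshell, so IFS and set -f don't leak out.
count_fields() (
    IFS=$1
    set -f
    set -- $2
    echo "$#"
)

count_fields ' '  '  a   b  '   # 2: whitespace runs collapse, edges trimmed
count_fields ':'  'a::b'        # 3: each non-whitespace delimiter counts
count_fields ': ' 'a : b'       # 2: whitespace around : joins into one delimiter
count_fields ':'  ':a'          # 2: a leading : is not trimmed
```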

In your "$COMP_WORDBREAKS", you can see that the value begins with space,
tab and newline.  Those are all IFS whitespace characters, so they're
discarded.  The rest of the value is free of IFS whitespace characters,
so there are no further alterations.  The result is the single
ten-character word

  "'><=;|&(:

which is then passed to the pathname expansion round.

(There are no globbing characters in this value, so pathname expansion
will not occur.  But in general, it's a thing you need to be aware of.)

Some simple demonstrations:

$ string='  hi    there     '
$ printf '<%s> ' "$string" ; echo
<  hi    there     >
$ printf '<%s> ' $string ; echo
<hi> <there>
$ IFS=h
$ printf '<%s> ' $string ; echo
<  > <i    t> <ere     >
$ IFS='h '
$ printf '<%s> ' $string ; echo
<> <i> <t> <ere>

The last one shows an IFS value with both whitespace and non-whitespace
characters in it.  The leading spaces are trimmed, leaving h as the
first character.  As a non-whitespace character, that one is *not* trimmed,
so it delimits an initial empty field from the rest of the string.

An example with pathname expansion:

$ IFS=$' \t\n'
$ string='/* a comment */'
$ cd /tmp
$ echo $string
/backup /bin /boot /chroot /command /dev /etc /hd /home /initrd.img /initrd.img.old /lib /lib64 /lost+found /media /mnt /opt /package /proc /root /run /sbin /service /srv /stuff /sys /tmp /usr /var /vmlinuz /vmlinuz.old a comment dumps/ ssh-6i8aLIWw2QgZ/ ssh-T5JLPWVvU9xw/ systemd-private-d50cab7eaba04f88b49c7e97e3d1043b-ModemManager.service-iD7GSh/ systemd-private-d50cab7eaba04f88b49c7e97e3d1043b-ntp.service-GeOK4e/ systemd-private-d50cab7eaba04f88b49c7e97e3d1043b-systemd-logind.service-se4pDh/ Temp-10bb8392-9d61-4469-bf31-5b5ef6c29a88/ Temp-1ffbc72d-62a7-46d4-a403-481053966525/

Word splitting occurs first, giving the four words /* a comment */
and then pathname expansion (globbing) occurs on the first and last
words.  All of the resulting words are given as arguments to echo.
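
For comparison, here is a sketch of the same value with quoting, and with
only pathname expansion disabled via set -f.  The variable names are
illustrative, and IFS is pinned to a space inside the subshell so the
result doesn't depend on the caller's setting.

```shell
string='/* a comment */'
quoted=$(printf '<%s>' "$string")                 # no splitting, no globbing
noglob=$(IFS=' '; set -f; printf '<%s>' $string)  # splitting only
printf '%s\n' "$quoted" "$noglob"
# prints:
# </* a comment */>
# </*><a><comment><*/>
```

Quoting is the usual fix; set -f only helps in the cases where you
actually want the splitting.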

All of this is why proper quoting is absolutely essential when working
with the shell.  It cannot possibly be said enough times.

