
Re: Substring search in dash [ _NOT_ bash ] shell



On Tue, Sep 27, 2016 at 10:07:14AM +0200, Thomas Schmitt wrote:
> if you cannot find a dash tutorial then get a tutorial for bash or sh and 
> test in dash whether the proposals apply properly.

This is potentially bad advice, for several reasons.

First, there are more bad tutorials out there in the wild than good
tutorials, by at least one order of magnitude.  The chances of randomly
stumbling upon one of the good ones are low.

Second, trying to work backwards from a bash tutorial to a POSIX sh/dash
programming style is going to be maddening.  If you're writing for sh,
definitely read documents aimed at sh, rather than some other shell.

Third, even dash has a few extensions to the POSIX shell syntax.
Which, if you're writing specifically for *dash* instead of POSIX sh
in general, is OK... but then you're tying yourself to one specific
shell which isn't in widespread use outside of Debian/Ubuntu.

> Bash and dash both stem from S.R.Bourne's sh. The shell chapters of
> his book "The Unix System" from 1983 still apply.

I have not read this book, but I have a copy of Kernighan & Pike's "The
Unix Programming Environment".  While it's a pretty good book for learning
the concepts, the shell script examples in it are atrociously BAD.
Adopting them as a style guide would be a horrible idea.  They're full
of all of the old shell scripting bugs and assumptions that make programs
fail in a real-life environment, where filenames can contain spaces, etc.

Essentially, every shell script or tutorial written before the year 2000
(and >= 95% of the ones written *after* that) is pure rubbish.

Want to know how bad it is?  Install the Debian package "manpages-posix"
and then read "man 1p sh" -- the SH(P) manual page for the POSIX shell,
taken from the 2003 edition of POSIX (an older version) and written by
the *people who made POSIX*.

Search for the word EXAMPLE (all caps) and then scroll up to the example
right before it.  Here it is, reproduced as plain text:

              #
              # Installation time script to install correct POSIX shell pathname
              #
              # Get list of paths to check
              #
              Sifs=$IFS
              IFS=:
              set $(getconf PATH)
              IFS=$Sifs
              #
              # Check each path for 'sh'
              #
              for i in $@
              do
                  if [ -f ${i}/sh ];
                  then
                      Pshell=${i}/sh
                  fi
              done
              #
              # This is the list of scripts to update. They should be of the
              # form '${name}.source' and will be transformed to '${name}'.
              # Each script should begin:
              #
              # !INSTALLSHELLPATH -p
              #
              scripts="a b c"
              #
              # Transform each script
              #
              for i in ${scripts}
              do
                  sed -e "s|INSTALLSHELLPATH|${Pshell}|" < ${i}.source > ${i}
              done

It is terrifying!  It's every single BAD shell practice all thrown together
into a single example!

1) Attempting to save IFS in a variable and restore it.  This fails if IFS
   is originally unset.

2) Unquoted $(command) substitution, relying on IFS word splitting but
   forgetting to disable globbing.

3) for i in $@
   instead of: for i in "$@"
   or simply: for i

   There is NO excuse for this sloppiness.  None.  Let alone from people
   writing an official POSIX manual!

4) Unquoted ${i} expansions all over the place.  Putting in useless curly
   braces is NOT a substitute for quoting.

5) Inconsistent use of useless curly braces.  If you're going to use them
   around i and Pshell, why omit them around IFS, Sifs and @?  (Of course,
   I would just omit *all* of them.)

6) Storing a space-delimited list in a string variable (scripts="...")
   and then using it as a pseudo-array, again without disabling globbing.
   What makes this one *especially* bad is that the example already used
   the @ pseudo-array earlier.  The author *knows* how to use @ instead
   of a string variable, but didn't do it!
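
For comparison, here is a minimal sketch of how the same job might be
written without those problems.  The name find_sh is my own invention,
not something from the manual.  Unlike the original, it stops at the
first match instead of the last, and it skips any .source file that
does not exist.

```shell
#!/bin/sh
# find_sh is a hypothetical helper name, not part of the POSIX example.
# Its body is a subshell ( ... ), so the IFS and "set -f" changes below
# vanish on return; nothing has to be saved or restored.
find_sh() (
    IFS=:       # split the path list on colons only
    set -f      # no globbing while $1 is expanded unquoted
    for dir in $1; do
        if [ -f "$dir/sh" ] && [ -x "$dir/sh" ]; then
            printf '%s\n' "$dir/sh"
            exit 0
        fi
    done
    exit 1
)

if Pshell=$(find_sh "$(getconf PATH)"); then
    # The list is written out directly instead of being stored in a string.
    for name in a b c; do
        [ -f "$name.source" ] || continue
        # Assumes $Pshell contains no |, & or backslash.
        sed -e "s|INSTALLSHELLPATH|$Pshell|" < "$name.source" > "$name"
    done
else
    echo "no sh found in the POSIX path" >&2
fi
```

The unquoted $1 is deliberate: it is the one place where word splitting
is wanted, and it is only safe because IFS and globbing are both under
control at that point.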

THIS is what we're fighting against.  40 years of this.

So, in short, avoid the older books, manuals, tutorials and examples.
And be cautious even of the newer ones.

> -------------------------------------------------------------------

> The "test" expression used is "A = B". There are operators like "-o" for
> logical "or". "A -o B" is true if A is true, or if B is true, or both are
> true.

Do not use -o and -a in a test command.  They're not portable.  Instead,
use two separate test commands with || or && between them:

if [ A ] || [ B ]; then
 ...
fi

See http://mywiki.wooledge.org/BashPitfalls#pf6
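
When the conditions need grouping, braces do the job that parentheses
would do inside a single test command.  A small sketch; usable_config
is a made-up name, and the two paths are only examples:

```shell
# usable_config (hypothetical): true if $1 is readable and is either a
# regular file or a directory.  The braces group the two alternatives.
usable_config() {
    [ -r "$1" ] && { [ -f "$1" ] || [ -d "$1" ]; }
}

if usable_config /etc/passwd || usable_config "$HOME/.profile"; then
    echo "found something to read"
fi
```

Note the semicolon before the closing brace; the shell requires it.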

> For substring search you would have to employ a program like "grep".
> It returns 0 if something was found, else it returns 1.
> 
>   if hostname | grep 'bob' >/dev/null ; then echo bob.cfg ; fi

grep -q 'bob' is slightly more efficient than grep 'bob' >/dev/null
because grep -q is allowed to stop reading as soon as it finds the first
match.
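
If the string is already in a shell variable, case can do the substring
match without running any external program.  Here is a sketch of both
approaches.  The names contains and contains_grep are made up, and the
-F and -- arguments are my additions to the quoted command, so that the
search string is not read as a regular expression or as an option.

```shell
# contains HAYSTACK NEEDLE: true if NEEDLE occurs anywhere in HAYSTACK.
# Quoting "$2" inside the pattern makes it literal, so * ? [ in NEEDLE
# are not treated as pattern characters.
contains() {
    case $1 in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# The same check with grep.  -q prints nothing, -F matches a fixed string.
contains_grep() {
    printf '%s\n' "$1" | grep -qF -- "$2"
}

if contains "$(hostname)" bob; then
    echo bob.cfg
fi
```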

> grep has a very powerful expression language which is not the same as
> the shell patterns. It puts out to stderr what it finds. Here we dump
> those messages to /dev/null.

You are redirecting stdout, not stderr.


On Tue, Sep 27, 2016 at 10:45:13AM +0200, Thomas Schmitt wrote:
> There are operators like "-o" for logical "or":
> 
>   "`hostname`" = bob -o "`hostname`" = mary -o "`hostname`" = sam

Do not use these.  Of course, in this specific example, case is the
better choice:

case $(hostname) in
  bob|mary|sam) ... ;;
esac

However, even if case were not clearly superior for this example,
test ... -o ... is still non-portable as described above.

-------------------------------------------------------------------

I don't have any specific advice for learning POSIX sh.  The wiki that
I host has lots of pages aimed at bash, and some of the pages have sh
alternative examples, but that's quite a different thing from an actual
sh guide.

The main problem with writing for sh is that nothing works the way
you expect.  Even the common techniques you'd use in bash, hackish as
those are, still don't work in sh.  You have to throw them all away and
learn *even worse* hacks.

Take arrays, for example. In bash 2.0 through 3.2, there are sparse
integer-indexed arrays.  You can use these as lists, sort of, if you
ignore the indices.  They can store file pathnames safely (spaces,
newlines, everything but NULs).  In bash 4.0 and higher, there are also
string-indexed associative arrays.

sh has *no* arrays at all.  There are precisely two ways you can store
a list of file pathnames in sh: in the @ pseudo-array (the positional
parameters), or in a file.  The @ pseudo-array has one major drawback:
there's only one of it.  That means your sh script can only have one
"array" stored in memory at a time.  If you want a second "array", you
have to destroy the first one.
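
A short sketch of treating the positional parameters as that one list
(the filenames are invented for the demonstration):

```shell
# Replace the positional parameters with a list of three pathnames.
set -- "first file.txt" "second*file.txt" "third file.txt"

# Append an element: re-set the list to its old contents plus one more.
set -- "$@" "fourth file.txt"

echo "the list has $# elements"

# "for f" with no "in" iterates over "$@", one element per pass.
for f
do
    printf '<%s>\n' "$f"
done

# Drop the first element.
shift
echo "now the first element is: $1"
```

Whatever was in the positional parameters before the first set -- is
gone, which is exactly the drawback described above.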

Except, that's not quite correct.  You can actually have one @ "array"
per function, since each function has its own local parameters.  So
you could have a second "array" as long as it's local to a function.
You still can't ever see two "arrays" at the same time, because a
function can only see its own @, not its parent's.
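
A sketch of that behaviour; the function name and the list contents
are made up:

```shell
# list_in_function (hypothetical) replaces its own positional parameters.
# The caller's list is untouched and comes back into view on return.
list_in_function() {
    set -- "local one" "local two"
    echo "inside the function: $# elements, first is '$1'"
}

set -- a b c
list_in_function
echo "back outside: $# elements, first is '$1'"
```

The function reports 2 elements, and the caller still has its original
3 afterwards.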

For an example of using @ as an "array" (well, really a list) inside a
function, see http://mywiki.wooledge.org/BashFAQ/050 (section 5).

Now, take that level of hackish complexity and apply it to *every*
single language feature you try to use.

That is sh.

