[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Why grep still exists?



    "Zeng" == jqdkf  <jqdkf@zju.edu.cn> writes:

    Zeng> Hi, I'm just wondering why grep still exists in every
    Zeng> distribution of UNIX or UNIX like system. As egrep can
    Zeng> support a full featured regular expression, it is much more
    Zeng> powerful than grep. Is it only for backport support?

In general, yes, it is all about backport support.

The original grep and egrep (and fgrep) commands used different
strategies to search for matches. The original grep use a non
deterministic finite state automaton (NDFSA) while egrep used a
deterministic one (DFSA). The "languages" that each command could
accept were thus subtly different if I recall correctly. 

In addition, performance varied based on the complexity of the regular
expression being used. A NDFSA is easier to build, but takes longer to
search (it requires backtracking algorithms). A DFSA is very fast when
searching (you never make a mistake and so there is no need to
backtrack), but building a DFSA takes more time and memory when there
is non-determinism involved. (a DFSA grows in a non-polynomial manner
if I recall my graduate classes correctly).

GNU grep combined all this, and used a hybrid approach that uses a
sophisticated string matching algorithm (Boyer - Moore) before hitting
the regexp parser. As a result egrep/fgrep are simply shell scripts
that hit the grep command with options.

    Zeng> Simple is Beautiful.

No kidding. Many UNIX tools (grep, diff, yacc and lex come to mind)
have deep theoretical underpinnings that say a lot about the
simplicity and beauty of The Universe. I can recommend
Aho/Ullman/Sethi's book on Finite State Automata and Theory if you
find any of this interesting.

Hmmmm....I almost wish I was back in grad. school learning cool stuff
like this rather than hacking "cutting edge" code for my employer....

Cheers!
Shyamal



Reply to: