[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: OT: regular expression question



>>>>> "Peter" == Peter Jay Salzman <p@dirac.org> writes:

    Peter> i know how to search for palindromes, for instance, a 3 letter
    Peter> palindrome: \(.\)\(.\)\(.\)\3\2\1

Well, strictly speaking, that's not really a regular expression.  (And I would
also count that as a 6 letter palindrome.)  It uses extensions to regular
expressions that are implemented by many UN*X programs.  In pure regular
expressions, the only special characters are +, *, (, ), and |.  (Did I miss
any?)  The \1, \2, etc. are extensions.

    Peter> recently, i started wondering if there was a way, using regex only,
    Peter> to express a palindrome of arbitrary letter length?

No.  If you want to get more in depth into regular expressions, I would
recommend a good compiler course, computational theory course, or a good
compiler text (the one by Aho is supposedly pretty good).  The compiler course
that I took (University of Alberta) was excellent, and we discussed many
questions such as this.  I don't know if other Universities are the same.

    Peter> i've used some grey matter, and the answer seems to be no, but
    Peter> there's nice symmetry here.  maybe i'm missing something...

Well, if you want a proof, here's a rough outline.

There is a duality between regular expressions and Deterministic Finite
Automata (DFA)- any language (i.e. collection of words) which can be recognized
by one can be recognized by the other.

What I mean by DFA is this: (it's much like a Turing machine) the machine has a
one-way read-only tape, which contains the word to be recognized.  It has a
read head, a finite number of states, and a transition function.  At any given
step, the read head reads a letter, and the machine switches states according
to the transition function (given current state, and the letter).  If, after
reading the word, the DFA ends up in one of the pre-determined "accept" states,
then the DFA accepts the word.  Otherwise, it rejects the word.

One can think of the states roughly as some sort of memory for the DFA.

Now suppose that a DFA has N states.  Then after some thinking, you should be
able to convince yourself that it would not be able to recognize all
palindromes of length 2N+1, because the DFA does not have enough "memory" to
handle all possible cases.

So since there is no DFA to recognize palindromes, there is no regular
expression.  There are many other languages that pure regular expressions
cannot handle, such as \(.*\)\1 (which, I guess, is one of reasons that
extension was introduced), \(a^N\)\(b^N\) (where a^N means a repeated N
times. parens added just for clarity) - this is the problem of matching
parentheses.

HTH

If there's anything I didn't explain clearly, let me know and I'll try it
again.

Hubert

-- 
____     |     -----------------------------------------------------------
|  /   --+--
| /   ___|___    Hubert Chan <hackerhue@crosswinds.net>
| \   | _|_ |
|__|  |__|__|    GCS/M d- s:- a-- C++ UL+(++++) P++ L++ E++ W++ N++ o?
|        |       K? w--- O++ M- V- PS-- PE+++ Y+ PGP+ t+ 5 X R- tv+ b+
|      / | \     DI++++ D G e++ h! !r !y
|     /  |  \
|        |     <><------------------ http://www.crosswinds.net/~hackerhue/

PGP/GnuPG fingerprint: 6CC5 822D 2E55 494C 81DD  6F2C 6518 54DF 71FD A37F
Key can be found at http://www.crosswinds.net/~hackerhue/hackerhue.asc



Reply to: