
Re: C comment extraction, or a bag .deb of small commands



On Sun, Jan 19, 2003 at 08:41:01AM -0500, H. S. Teoh wrote:
> But as someone pointed out, this totally doesn't handle /*'s and //'s
> appearing inside quoted strings. I overlooked that aspect of it.
> Nevertheless, it *must* be possible to write a regex for it, since
> mathematically speaking, a finite state machine is powerful enough to
> tokenize C. Of course, that doesn't say anything about how complex the
> regex might have to be to cover all cases. :-)

This logic is flawed. While it is possible to tokenize C with a finite
automaton, that says little about your actual objective, which is to
select a subset of those tokens (the comments) and discard the rest. You
will most likely need a stateful lexer (flex can do this using start
conditions), or postprocessing with something more powerful.
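
To illustrate what a stateful approach might look like, here is a minimal
sketch in Python of a hand-written scanner that tracks whether it is in
ordinary code, a string literal, a character literal, or a comment. The
function name extract_c_comments and the state names are my own choices
for illustration, not taken from any existing tool. It assumes no
trigraphs and no backslash-newline line splicing, and it silently drops
an unterminated block comment.

```python
def extract_c_comments(source):
    """Return the body of each C comment in `source`, in order of appearance.

    Simplifying assumptions: no trigraphs, no backslash-newline splicing,
    and an unterminated /* ... comment is discarded.
    """
    NORMAL, STRING, CHAR, BLOCK, LINE = range(5)
    state = NORMAL
    comments = []
    current = []
    i = 0
    n = len(source)
    while i < n:
        c = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if state == NORMAL:
            if c == "/" and nxt == "*":
                state, current, i = BLOCK, [], i + 2
                continue
            if c == "/" and nxt == "/":
                state, current, i = LINE, [], i + 2
                continue
            if c == '"':
                state = STRING
            elif c == "'":
                state = CHAR
        elif state in (STRING, CHAR):
            if c == "\\":
                i += 2  # skip the backslash and the character it escapes
                continue
            if (state == STRING and c == '"') or (state == CHAR and c == "'"):
                state = NORMAL
        elif state == BLOCK:
            if c == "*" and nxt == "/":
                comments.append("".join(current))
                state, i = NORMAL, i + 2
                continue
            current.append(c)
        elif state == LINE:
            if c == "\n":
                comments.append("".join(current))
                state = NORMAL
            else:
                current.append(c)
        i += 1
    if state == LINE:
        # A // comment that runs to end of input without a trailing newline.
        comments.append("".join(current))
    return comments
```

The point is that the comment openers are only recognised in the NORMAL
state, so a /* or // inside a quoted string is never mistaken for a
comment. The same structure maps fairly directly onto flex start
conditions, with one condition per state.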

-- 
  .''`.  ** Debian GNU/Linux ** | Andrew Suffield
 : :' :  http://www.debian.org/ | Dept. of Computing,
 `. `'                          | Imperial College,
   `-             -><-          | London, UK
