On Sun, Jan 19, 2003 at 08:41:01AM -0500, H. S. Teoh wrote:
> But as someone pointed out, this totally doesn't handle /*'s and //'s
> appearing inside quoted strings. I overlooked that aspect of it.
> Nevertheless, it *must* be possible to write a regex for it, since
> mathematically speaking, a finite state machine is powerful enough to
> tokenize C. Of course, that doesn't say anything about how complex the
> regex might have to be to cover all cases. :-)

This logic is flawed. While it is possible to tokenize C with a finite
automaton, this doesn't really relate to your objective of selecting a
subset of those tokens. It is likely that a stateful lexer is required
(flex can do this using start states), or postprocessing with something
more powerful.

-- 
  .''`.  ** Debian GNU/Linux ** | Andrew Suffield
 : :' :  http://www.debian.org/ | Dept. of Computing,
 `. `'                          | Imperial College,
   `-             -><-          | London, UK
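
To illustrate what a stateful lexer means here, below is a rough sketch in Python rather than flex. It is only an illustration, not the approach anyone in this thread actually used. The function name strip_c_comments and the state names are made up for the example. It assumes plain C source and ignores trigraphs and backslash-newline line splicing, so it is not a complete C lexer. Each state plays roughly the role that a start state would play in a flex scanner.

```python
def strip_c_comments(src: str) -> str:
    """Drop /* ... */ and // comments, leaving string and char literals alone.

    Hypothetical helper for illustration; ignores trigraphs and
    backslash-newline splicing.
    """
    # One state per lexical context.
    CODE, STRING, CHAR, LINE_COMMENT, BLOCK_COMMENT = range(5)
    state = CODE
    out = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        nxt = src[i + 1] if i + 1 < n else ""
        if state == CODE:
            if c == "/" and nxt == "*":
                state = BLOCK_COMMENT
                i += 2
                continue
            if c == "/" and nxt == "/":
                state = LINE_COMMENT
                i += 2
                continue
            if c == '"':
                state = STRING
            elif c == "'":
                state = CHAR
            out.append(c)
        elif state in (STRING, CHAR):
            if c == "\\" and nxt:
                # Copy the escape pair whole so that \" does not end the literal.
                out.append(c + nxt)
                i += 2
                continue
            if (state == STRING and c == '"') or (state == CHAR and c == "'"):
                state = CODE
            out.append(c)
        elif state == LINE_COMMENT:
            if c == "\n":
                # The newline ends the comment and is kept.
                state = CODE
                out.append(c)
        else:  # BLOCK_COMMENT
            if c == "*" and nxt == "/":
                # A comment counts as whitespace between tokens.
                state = CODE
                out.append(" ")
                i += 2
                continue
        i += 1
    return "".join(out)


if __name__ == "__main__":
    sample = 'char *s = "/* not a comment */"; /* real */ int x; // tail\n'
    print(strip_c_comments(sample))
```

The point of the sketch is that the decision "is this /* a comment opener?" depends on which state the scanner is in, which is exactly the context a single regex applied to the raw text does not have.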