Re: C comment extraction, or a bag .deb of small commands
On Sun, Jan 19, 2003 at 05:41:24PM +0000, Andrew Suffield wrote:
> On Sun, Jan 19, 2003 at 08:41:01AM -0500, H. S. Teoh wrote:
> > But as someone pointed out, this totally doesn't handle /*'s and //'s
> > appearing inside quoted strings. I overlooked that aspect of it.
> > Nevertheless, it *must* be possible to write a regex for it, since
> > mathematically speaking, a finite state machine is powerful enough to
> > tokenize C. Of course, that doesn't say anything about how complex the
> > regex might have to be to cover all cases. :-)
> This logic is flawed. While it is possible to tokenize C with a finite
> automaton, this doesn't really relate to your objective of selecting a
> subset of those tokens. It is likely that a stateful lexer is required
> (flex can do this using start states), or postprocessing with
> something more powerful.
Well, it might take something like a Perl regex that returns
sub-matches to accomplish this. But theoretically speaking, you can just
have an automaton that scans all C tokens but outputs only the comments.
That doesn't require a stateful lexer. Regex implementations that force
you to output everything you match are a different story, of course.
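A rough sketch of the sub-match idea, in Python rather than Perl purely for illustration. The names TOKEN and extract_comments_re are made up for this example. It assumes no trigraphs, no backslash-newline splices inside // comments, and no stray apostrophes outside literals (say, in an #error line), so treat it as a sketch and not a complete C tokenizer. The single pattern matches string and character literals as well as comments, but only the comment alternatives sit inside the capturing group:

```python
import re

# One alternation covering every construct that can "hide" a comment opener.
# Literals are matched and thrown away; only comments land in the named group.
TOKEN = re.compile(
    r'''
      "(?:\\.|[^"\\\n])*"          # string literal: matched, not captured
    | '(?:\\.|[^'\\\n])*'          # character literal: matched, not captured
    | (?P<comment>
          /\*.*?\*/                # block comment (non-greedy, may span lines)
        | //[^\n]*                 # line comment, up to end of line
      )
    ''',
    re.VERBOSE | re.DOTALL,
)


def extract_comments_re(src):
    """Return the comments in src, skipping anything inside literals."""
    return [
        m.group("comment")
        for m in TOKEN.finditer(src)
        if m.group("comment") is not None
    ]
```

Because a literal is consumed as a whole match before the scanner can look inside it, a /* or // between quotes never gets the chance to start a comment.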
(Theoretically, flex's start states do not add any more power to the
finite automaton, so a lexer that uses them is still equivalent to a
finite automaton, unless you use recursive start states, which are not
required here.)
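To make the equivalence concrete, here is a hand-rolled sketch in Python with every "start state" folded into one flat set of states. The state names and the function extract_comments_dfa are invented for this example, and it assumes no trigraphs and no backslash-newline splices. The control is a plain finite automaton; the buffer only holds the output and never influences a transition:

```python
def extract_comments_dfa(src):
    """Scan C source one character at a time and return only its comments."""
    (CODE, SLASH, LINE, BLOCK, BLOCK_STAR,
     STR, STR_ESC, CHR, CHR_ESC) = range(9)
    state = CODE
    out = []   # finished comments
    buf = []   # characters of the comment being read (output only)
    for ch in src:
        if state in (CODE, SLASH):
            if state == SLASH and ch == "/":
                state, buf = LINE, ["//"]
            elif state == SLASH and ch == "*":
                state, buf = BLOCK, ["/*"]
            elif ch == "/":
                state = SLASH
            elif ch == '"':
                state = STR
            elif ch == "'":
                state = CHR
            else:
                state = CODE
        elif state == LINE:
            if ch == "\n":
                out.append("".join(buf))
                state = CODE
            else:
                buf.append(ch)
        elif state in (BLOCK, BLOCK_STAR):
            buf.append(ch)
            if state == BLOCK_STAR and ch == "/":
                out.append("".join(buf))
                state = CODE
            elif ch == "*":
                state = BLOCK_STAR
            else:
                state = BLOCK
        elif state == STR:
            if ch == "\\":
                state = STR_ESC
            elif ch == '"':
                state = CODE
        elif state == STR_ESC:
            state = STR        # escaped character, whatever it is
        elif state == CHR:
            if ch == "\\":
                state = CHR_ESC
            elif ch == "'":
                state = CODE
        elif state == CHR_ESC:
            state = CHR
    if state == LINE:          # line comment ended by end of input
        out.append("".join(buf))
    return out
```

Nine states and no stack: STR and CHR play the role that exclusive start states would play in a flex specification, and nothing here needs more than a fixed, finite amount of control state.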
The two rules of success: 1. Don't tell everything you know. -- YHL