Java's standard regular _expression_ package, java.util.regex
,
and many other
widely used regular _expression_ packages such as PCRE, Perl and
Python use a
backtracking implementation strategy: when a pattern presents
two alternatives
such as a|b
, the engine will try to match
subpattern a
first, and if that
yields no match, it will reset the input stream and try to match
b
instead.
If such choices are deeply nested, this strategy requires an exponential number of passes over the input data before it can detect whether the input matches. If the input is large, it is easy to construct a pattern whose running time would exceed the lifetime of the universe. This creates a security risk when accepting regular _expression_ patterns from untrusted sources, such as users of a web application.
In contrast, the RE2 algorithm explores all matches simultaneously in a single pass over the input data by using a nondeterministic finite automaton.
There are certain features of PCRE or Perl regular expressions that cannot be implemented in linear time, for example, backreferences, but the vast majority of regular expressions patterns in practice avoid such features.