Re: POSIX regular expressions (was: autodetecting MBR location)



Tollef Fog Heen <tollef@add.no> writes:

> *  (Colin Watson)
>
> | >For a non-POSIX regex, that is.
> |
> | Could you point me to some documentation about this? regex(7) claims to
> | describe POSIX 1003.2 regular expressions, and describes leftmost-first
> | behaviour.
>
> Hmm.  Strange.  Mastering Regular Expressions by O'Reilly has
> something about this, where they claim otherwise.  I don't have the
> POSIX specification, so I can't check myself, though.
>
> | So is there no correct POSIX regex library in Debian?
>
> No, not if MRE is right.  Which I suppose it is, but am not 100% sure
> of, as I haven't read the specs.

Well, draft 4 (not the latest, but the latest I have around here)
for the next POSIX revision (http://www.opengroup.org/austin/)
says, in part (and there is no reason to assume this has changed
since the existing POSIX standard; I think all RE changes are in the
area of character classes, because no two implementations handled
those the same way):

6246 The search for a matching sequence starts at the beginning of a string and stops when the
6247 first sequence matching the expression is found, where first is defined to mean "begins
6248 earliest in the string". If the pattern permits a variable number of matching characters and
6249 thus there is more than one such sequence starting at that point, the longest such sequence
6250 is matched. For example: the BRE "bb*" matches the second to fourth characters of abbbc,
6251 and the ERE (wee|week)(knights|night) matches all ten characters of weeknights.

That does not sound as if MRE were right.
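
To make the difference concrete, here is a small Python sketch. It is
not from the draft: leftmost_longest is a made-up helper that only
handles a concatenation of literal alternatives, which is enough for
the ERE example above. Python's re module is used as the contrast
because it takes the first alternative that leads to a match rather
than the longest overall match. The alternatives are reordered on
purpose, since with the order given in the draft both rules happen to
match all ten characters.

```python
import re
from itertools import product


def leftmost_longest(groups, text):
    """Return (start, end) of the leftmost-longest match, or None.

    `groups` is a list of lists of literal alternatives, e.g.
    [["week", "wee"], ["night", "knights"]] for (week|wee)(night|knights).
    Hypothetical helper for illustration only; not a regex engine.
    """
    # Expand the concatenated alternations into every string they can match.
    candidates = {"".join(parts) for parts in product(*groups)}
    for start in range(len(text) + 1):
        # Leftmost: stop at the first position where anything matches.
        lengths = [len(c) for c in candidates if text.startswith(c, start)]
        if lengths:
            # Longest: of the matches starting here, take the longest.
            return start, start + max(lengths)
    return None


text = "weeknights"

# First-alternative semantics: "week" + "night" succeeds, so the match stops there.
print(re.search(r"(week|wee)(night|knights)", text).span())          # (0, 9)

# The rule quoted from the draft: the longest sequence at that start wins.
print(leftmost_longest([["week", "wee"], ["night", "knights"]], text))  # (0, 10)
```

The second result corresponds to "wee" followed by "knights", i.e. the
overall match is the longest one even though the first subexpression
is not.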


Regards - Kai Henningsen

--
http://www.cats.ms
Spuentrup CTI       Fon: +49 700 CALL CATS (=22 55 22 87)
Windbreede 12       Fax: +49 251 322312 99
D-48157 Muenster    Mob: +49 161 322312 1
Germany             GSM: +49 171 7755060


