Hi... for mentors.debian.net I would like to find a perfect (TM) regular expression to split the "Version:" line of a control file into: - epoch - upstream version - Debian package revision My current attempt is: ^(?:(\d+):)?(\d[\w\.\+-:]*?)(?:-(.+))?$ I have extracted the version of my currently installed packages to try them on this regular expression using this shell line: dpkg -l | awk '{print $2}' | xargs apt-cache show | \ grep Version: | awk {'print $2'} So far every package's version has matched correctly. However some cases that are probably valid according to the Policy[1] are not handled correctly. Example: 42:1.0-23-2352abceasdfas-2 ...is split into epoch "42", upstream version "1.0" and package revision "23-2352abceasdfas-2". This is obviously wrong because the last "-2" is the package revision. According to the policy version strings with multiple hyphens ("-") are allowed and all but the last hyphen are part of the upstream version. So before diving into into regular expressions any further (I have no practical experience with lookahead patterns for example) I would like to know if anyone else has worked on this. Perhaps even some Debian tool does this parsing. I need an implementation in Python but if someone shows me Perl code of a non-regexp algorithm I'd be happy, too. Thanks in advance. Kindly Christoph [1] http://www.us.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Version -- ~ ~ ".signature" [Modified] 1 line --100%-- 1,48 All
Attachment:
signature.asc
Description: Digital signature