
Regexp to parse "Version:" fields



Hi...

For mentors.debian.net I would like to find a perfect (TM) regular
expression to split the "Version:" line of a control file into:

 - epoch
 - upstream version
 - Debian package revision

My current attempt is:

   ^(?:(\d+):)?(\d[\w.+:-]*?)(?:-(.+))?$

I have extracted the versions of my currently installed packages to try
them against this regular expression, using this shell line:

   dpkg -l | awk '{print $2}' | xargs apt-cache show | \
      grep Version: | awk '{print $2}'

So far every package's version has matched correctly. However, some cases
that are probably valid according to the Policy[1] are not handled
correctly. Example:

 42:1.0-23-2352abceasdfas-2

...is split into epoch "42", upstream version "1.0" and package revision
"23-2352abceasdfas-2". This is obviously wrong because only the last "-2"
is the package revision. According to the Policy, version strings with
multiple hyphens ("-") are allowed, and everything before the last hyphen
is part of the upstream version.
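
The rule above (epoch before the first colon, revision after the last
hyphen) can be sketched without any regexp. This is only a minimal
illustration: the function name split_version is made up here, and it
assumes the input is already a well-formed version string, so it does no
validation of the allowed characters.

```python
def split_version(version):
    """Split a Debian version string into (epoch, upstream, revision).

    Hypothetical helper: assumes a well-formed string, validates nothing.
    Missing epoch or revision is reported as None.
    """
    # Epoch: everything before the first colon, if there is one.
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = None, version

    # Revision: everything after the LAST hyphen, if there is one.
    # All earlier hyphens stay in the upstream version.
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, None

    return epoch, upstream, revision
```

With the example above, split_version("42:1.0-23-2352abceasdfas-2") gives
epoch "42", upstream version "1.0-23-2352abceasdfas" and revision "2".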

So before diving into regular expressions any further (I have no
practical experience with lookahead patterns, for example), I would like
to know whether anyone else has worked on this. Perhaps some Debian tool
even does this parsing already. I need an implementation in Python, but if
someone shows me Perl code or a non-regexp algorithm I'd be happy, too.
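
One possible regexp that seems to get by without lookahead, as a sketch
only: since a revision may not contain a hyphen, restricting the last
group to non-hyphen characters forces the non-greedy upstream group to
grow up to the last hyphen. The names VERSION_RE and parse_version are
made up for this example, and the character classes (including "~") are
an assumption based on a reading of the Policy, not a verified grammar.

```python
import re

# Assumed character sets; adjust them to whatever the Policy allows.
VERSION_RE = re.compile(
    r"^(?:(\d+):)?"               # optional epoch, followed by ':'
    r"(\d[A-Za-z0-9.+:~-]*?)"     # upstream version, may contain hyphens
    r"(?:-([A-Za-z0-9.+~]+))?$"   # optional revision, must not contain '-'
)


def parse_version(version):
    """Return (epoch, upstream, revision); missing parts are None."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError("not a recognised version string: %r" % version)
    return match.groups()
```

Because the revision group cannot match a hyphen and is anchored at the
end of the string, "42:1.0-23-2352abceasdfas-2" comes out as epoch "42",
upstream version "1.0-23-2352abceasdfas" and revision "2".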

Thanks in advance.

Kindly
 Christoph

[1] http://www.us.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Version
-- 
~
~
".signature" [Modified] 1 line --100%--                1,48         All
