Given that DEP-5 is supposed to be about machine-
readability, I thought it would be worthwhile trying to
write something to parse the proposed format. Please find
attached a short python script that I have written based on
the current text of DEP-5 at dep.debian.net[1].
It's designed to be run from an unpacked and patched source
package (or at least a source tree containing
debian/copyright, which it attempts to parse). It will
print out a list of each Files: stanza found in the
copyright, followed by the list of files which it believes
are matched by the stanza.
It has proven useful to me: I found several bugs in a
copyright file I'd written for a real live package, based
on my misinterpretation of the current wording.
Whilst writing this, I found the syntax chosen for the
Files: field to be very awkward. Indeed my crude parser
only handles a subset of the syntax so far (no escapes, no
handling of quoted strings).
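For illustration only, here is a minimal Python 3 sketch of what
handling escapes and quoted strings could look like. It is not part
of the attached script. The function name split_files_field is
hypothetical, and the rules it applies (a backslash makes the next
character literal, double quotes protect commas) are assumptions
rather than wording taken from DEP-5.

```python
def split_files_field(value):
    """Split a Files: value on unquoted, unescaped commas.

    Assumed rules (not taken verbatim from DEP-5): a backslash makes the
    next character literal, and double quotes protect everything between
    them.
    """
    patterns, current = [], []
    in_quotes = False
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            # keep the escaped character literally, dropping the backslash
            current.append(next(chars, ""))
        elif ch == '"':
            # toggle quoting; the quote characters themselves are dropped
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            # an unprotected comma ends the current pattern
            patterns.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    patterns.append("".join(current).strip())
    # discard empty entries such as those left by a trailing comma
    return [p for p in patterns if p]


print(split_files_field('*.c, "odd, name.h", foo\\,bar'))
# → ['*.c', 'odd, name.h', 'foo,bar']
```

Note that this drops the backslash, so an escaped glob character
such as \* would need separate treatment before matching.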
Most of the examples given in DEP-5 that contain the path
separator will not work, either, e.g.
Files: debian/*
Assuming they are passed into a find(1) invocation like so
find . -path 'debian/*'
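(note the presence of the path separator and the wording
about that in the text),
they need to be prefixed with './', because find(1) tests
each path in the form it prints it, and with '.' as the
starting point every path begins with './'. This holds even
if you omit '.' from the find invocation (which is itself a
GNUism, iirc). Patch attached.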
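The effect can be reproduced without running find(1). The Python 3
sketch below uses fnmatch as a stand-in for find's -path test; that
substitution is an assumption, since the two are similar but not
guaranteed to be identical. The helper name matches and the sample
paths are hypothetical.

```python
from fnmatch import fnmatchcase

# Paths as printed by `find . -type f`: every one starts with "./".
found = ["./debian/copyright", "./debian/rules", "./src/main.c"]


def matches(pattern, paths):
    """Return the paths matched by a whole-path glob, as -path would test them."""
    return [p for p in paths if fnmatchcase(p, pattern)]


print(matches("debian/*", found))    # → []
print(matches("./debian/*", found))  # → ['./debian/copyright', './debian/rules']
```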
I think I would much prefer using regular expressions here.
For one thing I'm worried about variations in find(1)
behaviours across platforms. For another, unless a parser
calls find(1) (as mine does, and it's expensive), trying to
match its behaviour will imho be a lot more error-prone
than using your language's built-in regular expression
library, or pcre, or whatever. I will try to cook up a patch
for comment.
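A minimal Python 3 sketch of the regular-expression approach, under
stated assumptions: each stanza carries a regular expression that
must match the whole path relative to the source root (no leading
'./'), and the last matching stanza wins, as DEP-5 requires for
Files declarations. The names list_files and assign, and the stanza
labels, are hypothetical. This is not the promised patch.

```python
import os
import re


def list_files(root="."):
    """Yield file paths relative to root, without a leading './'."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            yield os.path.relpath(os.path.join(dirpath, name), root)


def assign(paths, stanzas):
    """Map each path to the last stanza whose regex matches it, or None."""
    compiled = [(label, re.compile(rx)) for label, rx in stanzas]
    result = {}
    for path in paths:
        result[path] = None
        for label, rx in compiled:
            if rx.fullmatch(path):
                result[path] = label  # later stanzas override earlier ones
    return result


stanzas = [("upstream", r".*"), ("packaging", r"debian/.*")]
print(assign(["debian/rules", "src/main.c"], stanzas))
# → {'debian/rules': 'packaging', 'src/main.c': 'upstream'}
```

On a real tree, assign(list_files("."), stanzas) would do the same
job in a single directory walk, with no external process per stanza.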
[1] (I need to re-read the older DEP-5 messages to
understand the current maintainership situation: I see
Steve remove the other drivers in that version, and
Charles do the same in his git repo...)
--
Jon Dowland
#!/usr/bin/python
# a crude DEP-5 parser
# Copyright (c) 2009 Jon Dowland <jmtd@debian.org>
# Copying and distribution of this file, with or without modification, are
# permitted in any medium without royalty provided the copyright notice and this
# notice are preserved.
# usage: run the script from within an unpacked source tarball with the debian
# diff.gz applied on top (or at least, a DEP-5-syntax debian/copyright file
# available)
from email import parser
from sys import exit
from os import popen
##############################################################################
## step 1: handle/parse RFC822 superset
# remove blank lines so the parser treats it all as an email header
copyright = parser.Parser().parsestr(
    ''.join(
        filter(lambda x: "\n" != x,
            open("debian/copyright").readlines()
    )))
if len(copyright.keys()) < 1:
    print "parser didn't get any headers from the copyright file"
    exit(1)
##############################################################################
## step 2: interpret the headers and build a list of tuples
## (files, license, copyright)
# DEP5 header. Format-Specification is required. Others are optional.
valid = "Format-Specification Name Maintainer Source Disclaimer".split()
header = dict([ [x,''] for x in valid])
files = "Files Copyright License".split()
# first loop: handle the header
for i in range(0,len(copyright.items())):
    key = copyright.keys()[i]
    # skip over x-Arbitrary: headers
    if key[0] == 'x':
        continue
    if key in valid:
        if header[key]:
            print "error: redefinition of '%s'." % key
            exit(1)
        header[key] = copyright.values()[i]
        continue
    # this marks the transition from the header onwards
    if key in files:
        if not header['Format-Specification']:
            print "error: Format-Specification must be defined " +\
                "before the Files section"
            exit(1)
        break
    print "unrecognised key '%s'" % key
    exit(1)
# second loop: looping through the main parts
current = dict([ [x,''] for x in files])
tuples = []
# take a hash of Files/Copyright/License and split it up
# into multiple ones based on the Files key
# first rule: multiple items separated by commas
# XXX: unhandled: escaped commas; quoted-strings
# containing commas
def append(tuples, current):
    for t in current['Files'].split(","):
        c = current.copy()
        c['Files'] = t.strip()
        tuples.append(c)
for i in range(i,len(copyright.items())):
    key = copyright.keys()[i]
    # skip over x-Arbitrary: headers
    if key[0] == 'x':
        continue
    if key in files:
        # handle implicit 'Files: *'
        if 'Files' != key and not current['Files']:
            current['Files'] = '*'
        # new Files: stanza ends the last one
        elif 'Files' == key and current['Files']:
            for defn in ['License', 'Copyright']:
                if not current[defn]:
                    print "error: missing %s line for Files: %s" \
                        % (defn, current['Files'])
                    exit(1)
            append(tuples,current)
            current = dict([ [x,''] for x in files])
        # new License or Copyright for existing Files:
        if current[key]:
            print "error: redefinition of '%s'. Missing 'Files' item?" % key
            print "line is %d, value is '%s'" % (i,copyright.values()[i])
            exit(1)
        current[key] = copyright.values()[i]
        continue
    print "unrecognised key '%s'" % key
    exit(1)
# split the final stanza on commas too, like every earlier one
append(tuples, current)
# DEP-5 states "If multiple Files declarations match the same file, then only
# the last match counts.". This suggests no inheritance is possible between
# stanzas. Thus, reversing the list means we can look for the *first* matching
# stanza.
tuples.reverse()
##############################################################################
## step 3: indicate mapping of stanzas to source files
## we run find(1) for each tuple to build up a list of files which match
## the Files: definition. We then run find(1) again on the source directory
## to obtain a list of all files, then compare results.
# a list of [ (Files:, [matching files]) ] for each Files
# populated with the list of files which match each Files: key
matching = []
for t in tuples:
    nameorpath = 'name'
    if t['Files'].count('/') > 0:
        nameorpath = 'path'
    runme = "find . -type f -%s \"%s\" 2>/dev/null" % (nameorpath, t['Files'])
    matching.append( (t['Files'], [ x.strip() for x in popen(runme).readlines() ]) )
# { Files: => [matching files] }, this time populated by
# comparing every file against each stanza in turn
results = dict([ [x['Files'],[]] for x in tuples ])
results['no match'] = []
for fname in [x.strip() for x in popen('find . -type f').readlines()]:
    res = 'no match'
    for pair in matching:
        if fname in pair[1]:
            res = pair[0]
            break
    results[res].append(fname)
for hash in tuples:
    print "%s:" % hash['Files']
    for value in results[hash['Files']]:
        print "\tmatches %s" % value
Index: dep5.mdwn
===================================================================
--- dep5.mdwn (revision 105)
+++ dep5.mdwn (working copy)
@@ -144,7 +144,7 @@
Example 1 (tri-licensed files).
- Files: src/js/editline/*
+ Files: ./src/js/editline/*
Copyright: 1993, John Doe
1993, Joe Average
License: MPL-1.1 or GPL-2 or LGPL-2.1
@@ -161,12 +161,12 @@
Example 2 (recurrent license).
- Files: src/js/editline/*
+ Files: ./src/js/editline/*
Copyright: 1993, John Doe
1993, Joe Average
License: MPL-1.1
- Files: src/js/fdlibm/*
+ Files: ./src/js/fdlibm/*
Copyright: 1993, J-Random Corporation
License: MPL-1.1
@@ -365,7 +365,7 @@
License can be found in the `/usr/share/common-licenses/GPL-2'
file.
- Files: debian/*
+ Files: ./debian/*
Copyright: 1998, Jane Smith <jsmith@example.net>
License:
[LICENSE TEXT]
@@ -384,7 +384,7 @@
License: PSF-2
[LICENSE TEXT]
- Files: debian/*
+ Files: ./debian/*
Copyright: 2008, Dan Developer <dan@debian.example.com>
License:
Copying and distribution of this package, with or without
@@ -392,27 +392,27 @@
provided the copyright notice and this notice are
preserved.
- Files: debian/patches/theme-diveintomark.patch
+ Files: ./debian/patches/theme-diveintomark.patch
Copyright: 2008, Joe Hacker <hack@example.org>
License: GPL-2+
[LICENSE TEXT]
- Files: planet/vendor/compat_logging/*
+ Files: ./planet/vendor/compat_logging/*
Copyright: 2002, Mark Smith <msmith@example.org>
License: MIT
[LICENSE TEXT]
- Files: planet/vendor/httplib2/*
+ Files: ./planet/vendor/httplib2/*
Copyright: 2006, John Brown <brown@example.org>
License:
Unspecified MIT style license.
- Files: planet/vendor/feedparser.py
+ Files: ./planet/vendor/feedparser.py
Copyright: 2007, Mike Smith <mike@example.org>
License: PSF-2
[LICENSE TEXT]
- Files: planet/vendor/htmltmpl.py
+ Files: ./planet/vendor/htmltmpl.py
Copyright: 2004, Thomas Brown <coder@example.org>
License: GPL-2+
On Debian systems the full text of the GNU General Public