
git-copyright-scan: find authors missing from DEP-5 debian/copyright



Hi,

while updating debian/copyright of a package to match a new upstream
version, I noticed that upstream is already generating AUTHORS and
license headers from git log. It felt odd to copy these headers back
into debian/copyright entirely by hand.

Could we compare git log against debian/copyright? git-copyright-scan is
a proof-of-concept that tries to locate authors who are listed in git
log but are missing from DEP-5 debian/copyright. I ran

git-copyright-scan --git-opt --before=2010-01-01 --min-commits 20 \
--min-lines 20

against all source packages that have a Vcs-Git field and use DEP-5, and
got the following list of potentially forgotten authors (BEWARE: it
contains many false positives because of issue 3 below):

http://lindi.iki.fi/lindi/dep5/scan6.txt

I ignored authors who have made only minor contributions, as well as
very new authors (since debian/copyright might be slightly out of date,
which I guess is the trend...)
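
To make the comparison concrete, here is a minimal sketch of the idea. It
is not the actual git-copyright-scan code: the function names, the sample
data and the plain substring check are assumptions made for illustration.
The author list is assumed to come from something like
"git log --format=%aN", one name per commit.

```python
from collections import Counter


def count_commits(author_lines):
    """Count commits per author, given one author name per commit."""
    return Counter(line.strip() for line in author_lines if line.strip())


def missing_authors(author_lines, copyright_text, min_commits=20):
    """Return authors with at least min_commits commits whose name does
    not occur anywhere in the debian/copyright text.

    Hypothetical helper: a case-insensitive substring check stands in
    for real DEP-5 parsing.
    """
    counts = count_commits(author_lines)
    haystack = copyright_text.casefold()
    return sorted(
        name
        for name, commits in counts.items()
        if commits >= min_commits and name.casefold() not in haystack
    )


# Made-up sample data: three authors, one already listed in debian/copyright.
log = ["Alice Example"] * 25 + ["Bob Example"] * 30 + ["Carol Example"] * 2
copyright_text = "Files: *\nCopyright: 2008 Alice Example <alice@example.org>\n"

print(missing_authors(log, copyright_text))
```

With the sample data, only Bob Example is reported: Alice Example is
already listed and Carol Example falls below the commit threshold.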

I hit the following issues:

1) Not everyone uses UTF-8 in git log. Fuzzy matching should help here.

2) DEP-5 does not specify the format of Copyright: lines, only that each
copyright holder should be on its own line. It would be nice if at least
the simplest cases used a canonical "Copyright X, Y, Z Foo Bar
<foo@bar.example.com>" format.

3) Vcs-Git often points to a repository that does not contain the
upstream commit history. Could we consider something like
"Vcs-Upstream-Git" in debian/control? The non-"debian/*" hits of the
above scan are meaningful only for packages that have upstream history
in the Vcs-Git repository.
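
For issues 1) and 2), a rough sketch of what fuzzy name matching and
parsing of the simplest Copyright: lines could look like. This is an
illustration only, not what git-copyright-scan does: the helper names,
the 0.85 threshold and the regular expression are assumptions, and the
X, Y, Z in the proposed canonical format are assumed to be years.

```python
import difflib
import re
import unicodedata


def normalize(name):
    """Drop combining accents, case and extra whitespace from a name."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.casefold().split())


def same_person(a, b, threshold=0.85):
    """Fuzzy name comparison; the threshold is an arbitrary example value."""
    ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


# Matches only the simple form: [Copyright[:]] YEARS NAME [<EMAIL>]
HOLDER_RE = re.compile(
    r"^\s*(?:Copyright:?\s*)?(?:\(c\)\s*|©\s*)?"
    r"(?P<years>\d{4}(?:\s*[-,]\s*\d{4})*),?\s+"
    r"(?P<name>[^<]+?)\s*(?:<(?P<email>[^>]+)>)?\s*$",
    re.IGNORECASE,
)


def parse_holder(line):
    """Return (years, name, email) for a simple line, else None."""
    match = HOLDER_RE.match(line)
    if match is None:
        return None
    return match.group("years"), match.group("name"), match.group("email")


print(same_person("Sébastien Müller", "Sebastien Muller"))
print(parse_holder("Copyright: 2008, 2010 Foo Bar <foo@bar.example.com>"))
```

Lines that do not fit the simple pattern return None, so they can be
flagged for manual review instead of being guessed at.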

Finally, if you are not afraid of hacky Python code, the sources are
available at

http://iki.fi/lindi/git/git-copyright-scan.git/

-Timo

