Bug#757551: lintian: check if DEP-5 debian/copyright covers all files in the unpacked sources
tags 757551 patch
thanks
Hi,
Quoting Jakub Wilk (2014-08-09 23:21:54)
> I wrote it in 2012, so it likely no longer applies cleanly. I probably tested
> it only lightly, if at all. The tag descriptions are missing. There are no
> tests.
>
> I don't intend to work on improving the patch, but maybe somebody else
> will find it useful.
that would be me :)
Please find attached Jakub's patch rebased onto current Lintian and changed a
little here and there. In addition it includes tag description and tests.
Jakub: what I do not understand is what the rationale behind the
unused-file-paragraph-in-dep5-copyright tag is. I do not see a case where this
tag would be emitted but the wildcard-matches-nothing-in-dep5-copyright would
not.
cheers, josch
From 57fd9978cf072d408d9a9fb8d9f58b12802f59ea Mon Sep 17 00:00:00 2001
From: Johannes Schauer <j.schauer@email.de>
Date: Fri, 8 Aug 2014 12:00:39 +0200
Subject: [PATCH] check whether the dep-5 debian/copyright wildcards match all
files
- based on patch by Jakub Wilk - thanks!
---
checks/source-copyright.desc | 35 ++++++++
checks/source-copyright.pm | 98 ++++++++++++++++++++--
.../debian/debian/copyright | 24 ++++++
.../debian/file,with,commas | 0
.../debian/i-have-no-copyright-information | 0
t/tests/source-copyright-wildcard-matching/desc | 9 ++
t/tests/source-copyright-wildcard-matching/tags | 4 +
7 files changed, 165 insertions(+), 5 deletions(-)
create mode 100644 t/tests/source-copyright-wildcard-matching/debian/debian/copyright
create mode 100644 t/tests/source-copyright-wildcard-matching/debian/file,with,commas
create mode 100644 t/tests/source-copyright-wildcard-matching/debian/i-have-no-copyright-information
create mode 100644 t/tests/source-copyright-wildcard-matching/desc
create mode 100644 t/tests/source-copyright-wildcard-matching/tags
diff --git a/checks/source-copyright.desc b/checks/source-copyright.desc
index 921e7b9..b912b2a 100644
--- a/checks/source-copyright.desc
+++ b/checks/source-copyright.desc
@@ -205,3 +205,38 @@ Info: The paragraph has a "License" and a "Copyright" field, but no
Lintian will attempt to guess what you intended and continue based on
its guess. If the guess is wrong, you may see spurious tags related
to this paragraph.
+
+Tag: invalid-escape-sequence-in-dep5-copyright
+Severity: normal
+Certainty: possible
+Ref: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+Info: The only allowed escape sequences are "\*", "\?" and "\\" (without
+ quotes) to produce a literal star, question mark and backslash, respectively.
+ Without the escaping backslash, the star and question mark take the role of
+ globbing operators similar to shell globs which is why they have to be
+ escaped. No other escapable characters than "*", "?" and "\" exist.
+
+Tag: wildcard-matches-nothing-in-dep5-copyright
+Severity: minor
+Certainty: possible
+Ref: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+Info: The wildcard that was specified matches no file in the source tree.
+ This either indicates that you should fix the wildcard so that it matches
+ the intended file or that you can remove the wildcard. Notice that in
+ contrast to shell globs, the "*" (star or asterisk) matches slashes and
+ leading dots.
+
+Tag: file-without-copyright-information
+Severity: normal
+Certainty: possible
+Ref: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+Info: The source tree contains a file which was not matched by any of
+ the <tt>Files</tt> paragraphs in debian/copyright. Either adjust existing
+ wildcards to match that file or add a new <tt>Files</tt> paragraph.
+
+Tag: unused-file-paragraph-in-dep5-copyright
+Severity: minor
+Certainty: possible
+Ref: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+Info: The <tt>Files</tt> paragraph in debian/copyright is superfluous as it is
+ never used to match any files. You should be able to safely remove it.
diff --git a/checks/source-copyright.pm b/checks/source-copyright.pm
index 6b9a0c5..cfdaef7 100644
--- a/checks/source-copyright.pm
+++ b/checks/source-copyright.pm
@@ -24,6 +24,9 @@ use strict;
use warnings;
use autodie;
+use Cwd qw(getcwd);
+use File::Find qw();
+
use List::MoreUtils qw(any);
use Text::Levenshtein qw(distance);
@@ -252,6 +255,8 @@ sub _parse_dep5 {
}
my @commas_in_files;
+ my %file_coverage = map { $_ => 0 } get_all_files($info);
+ my %file_para_coverage = ();
my $i = 0;
my $current_line = 0;
for my $para (@dep5) {
@@ -297,6 +302,31 @@ sub _parse_dep5 {
@commas_in_files = ($i, $files_fname);
}
+ # only attempt to evaluate globbing if commas could be legal
+ if (not @commas_in_files or any { m/,/xsm } $info->sorted_index) {
+ my @wildcards = split /[\n\t ]+/, $files;
+ for my $wildcard (@wildcards) {
+ my ($regex, $wildcard_error) = wildcard_to_regex($wildcard);
+ if (defined $wildcard_error) {
+ tag 'invalid-escape-sequence-in-dep5-copyright', substr($wildcard_error, 0, 2) . " (paragraph at line $current_line)";
+ next;
+ }
+
+ my $used = 0;
+ $file_para_coverage{$current_line} = 0;
+ for my $srcfile (keys %file_coverage) {
+ if ($srcfile =~ $regex) {
+ $used = 1;
+ $file_coverage{$srcfile} = $current_line;
+ $file_para_coverage{$current_line} = 1;
+ }
+ }
+ if (not $used) {
+ tag 'wildcard-matches-nothing-in-dep5-copyright', "$wildcard (paragraph at line $current_line)";
+ }
+ }
+ }
+
my ($found_license, $full_license, $short_license, @short_licenses)
= parse_license($license,$current_line);
if ($found_license) {
@@ -325,12 +355,21 @@ sub _parse_dep5 {
$current_line;
}
}
- if (@commas_in_files) {
+ if (@commas_in_files and not any { m/,/xsm } $info->sorted_index) {
my ($paragraph_no, $field_name) = @commas_in_files;
- if (not any { m/,/xsm } $info->sorted_index) {
- tag 'comma-separated-files-in-dep5-copyright',
- 'paragraph at line',
- $lines[$paragraph_no]{$field_name};
+ tag 'comma-separated-files-in-dep5-copyright',
+ 'paragraph at line',
+ $lines[$paragraph_no]{$field_name};
+ } else {
+ foreach my $srcfile (sort keys %file_coverage) {
+ my $i = $file_coverage{$srcfile};
+ if (not $i) {
+ tag 'file-without-copyright-information', $srcfile;
+ }
+ delete $file_para_coverage{$i};
+ }
+ foreach my $i (sort keys %file_para_coverage) {
+ tag 'unused-file-paragraph-in-dep5-copyright', "paragraph at line $i";
}
}
while ((my $license, $i) = each %required_standalone_licenses) {
@@ -400,6 +439,55 @@ sub get_field {
return;
}
+sub wildcard_to_regex {
+ my ($regex) = @_;
+ $regex =~ s,^\./+,,;
+ $regex =~ s,//+,/,g;
+ my $error;
+ eval {
+ $regex =~ s{
+ (\*) |
+ (\?) |
+ ([^*?\\]+) |
+ (\\[\\*?]) |
+ (.+)
+ }{
+ if (defined $1) {
+ '.*';
+ } elsif (defined $2) {
+ '.'
+ } elsif (defined $3) {
+ quotemeta($3);
+ } elsif (defined $4) {
+ $4;
+ } else {
+ $error = $5;
+ die;
+ }
+ }egx;
+ };
+ if ($@) {
+ return (undef, $error);
+ } else {
+ return (qr/^(?:$regex)$/, undef);
+ }
+}
+
+sub get_all_files {
+ my ($info) = @_;
+ my @all_files = @{$info->{sorted_index}};
+ my $debfiles_root = $info->debfiles;
+ File::Find::find({
+ wanted => sub {
+ return unless -f $_;
+ my $dir = $File::Find::dir;
+ $dir =~ s,^\Q$debfiles_root\E(?:(?=/)|$),debian,;
+ push @all_files, "$dir/$_";
+ },
+ }, $debfiles_root);
+ return @all_files;
+}
+
1;
# Local Variables:
diff --git a/t/tests/source-copyright-wildcard-matching/debian/debian/copyright b/t/tests/source-copyright-wildcard-matching/debian/debian/copyright
new file mode 100644
index 0000000..cb26df9
--- /dev/null
+++ b/t/tests/source-copyright-wildcard-matching/debian/debian/copyright
@@ -0,0 +1,24 @@
+Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+Upstream-Name: Doohickey
+Upstream-Contact: J. Random Hacker <j.r.hacker@example.com>
+Source: http://examples.com/doohickey/source/
+
+Files: i-do-not-exist
+Copyright: 2014, somebody1
+License: mylicense
+ Fixme
+
+Files: invalid-escape\n
+Copyright: 2014, somebody1
+License: mylicense
+ Fixme
+
+Files: debian/*
+Copyright: 2014, somebody1
+License: mylicense
+ Fixme
+
+Files: file,with,commas
+Copyright: 2014, somebody1
+License: mylicense
+ Fixme
diff --git a/t/tests/source-copyright-wildcard-matching/debian/file,with,commas b/t/tests/source-copyright-wildcard-matching/debian/file,with,commas
new file mode 100644
index 0000000..e69de29
diff --git a/t/tests/source-copyright-wildcard-matching/debian/i-have-no-copyright-information b/t/tests/source-copyright-wildcard-matching/debian/i-have-no-copyright-information
new file mode 100644
index 0000000..e69de29
diff --git a/t/tests/source-copyright-wildcard-matching/desc b/t/tests/source-copyright-wildcard-matching/desc
new file mode 100644
index 0000000..979394e
--- /dev/null
+++ b/t/tests/source-copyright-wildcard-matching/desc
@@ -0,0 +1,9 @@
+Testname: source-copyright-wildcard-matching
+Sequence: 6000
+Version: 1.0
+Description: Test whether the Files wildcards match all files in the source
+Test-For:
+ invalid-escape-sequence-in-dep5-copyright
+ wildcard-matches-nothing-in-dep5-copyright
+ file-without-copyright-information
+ unused-file-paragraph-in-dep5-copyright
diff --git a/t/tests/source-copyright-wildcard-matching/tags b/t/tests/source-copyright-wildcard-matching/tags
new file mode 100644
index 0000000..760bdfd
--- /dev/null
+++ b/t/tests/source-copyright-wildcard-matching/tags
@@ -0,0 +1,4 @@
+I: source-copyright-wildcard-matching source: unused-file-paragraph-in-dep5-copyright paragraph at line 6
+I: source-copyright-wildcard-matching source: wildcard-matches-nothing-in-dep5-copyright i-do-not-exist (paragraph at line 6)
+W: source-copyright-wildcard-matching source: file-without-copyright-information i-have-no-copyright-information
+W: source-copyright-wildcard-matching source: invalid-escape-sequence-in-dep5-copyright \n (paragraph at line 11)
--
2.0.1
Reply to: