[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#757551: lintian: check if DEP-5 debian/copyright covers all files in the unpacked sources



Hi,

Quoting Niels Thykier (2014-08-10 10:39:48)
> > +use Cwd qw(getcwd);
> 
> This does not seem to be used anywhere in the diff?

correct. Removed.

> >      my @commas_in_files;
> > +    my %file_coverage = map { $_ => 0 } get_all_files($info);
> > +    my %file_para_coverage = ();
> 
> Minor nit pick: Consider merging this with the @commas_in_files
> declaration, a la:
> 
>   my (@commas_in_files, %file_para_coverage);

Done.

> > +            # only attempt to evaluate globbing if commas could be legal
> > +            if (not @commas_in_files or any { m/,/xsm } $info->sorted_index) {
> 
> The "any { m/,/xsm } $info->sorted_index" part is an invariant in the
> "for my $para (@dep5)" loop.  I think we should move it out of the loop,
> so we don't loop over all files in the index repeatedly just to conclude
> that none of them have any commas.

I calculated and stored the result of "any { m/,/xsm } $info->sorted_index"
outside of the loop...

> > -    if (@commas_in_files) {
> > +    if (@commas_in_files and not any { m/,/xsm } $info->sorted_index) {
> 
> The "any {...} $info->sorted_index" here could be optimised away as well
> if the change above is implemented.

... and used the result back here as well.

> > @@ -400,6 +439,55 @@ sub get_field {
> >      return;
> >  }
> >  
> > +sub wildcard_to_regex {
> > +    my ($regex) = @_;
> > +    $regex =~ s,^\./+,,;
> > +    $regex =~ s,//+,/,g;
> > +    my $error;
> > +    eval {
> > +        $regex =~ s{
> > +            (\*) |
> > +            (\?) |
> > +            ([^*?\\]+) |
> > +            (\\[\\*?]) |
> > +            (.+)
> > +        }{
> > +            if (defined $1) {
> > +                '.*';
> > +            } elsif (defined $2) {
> > +                '.'
> > +            } elsif (defined $3) {
> > +                quotemeta($3);
> > +            } elsif (defined $4) {
> > +                $4;
> 
> Shouldn't this be quotemeta'ed as well? Otherwise "file.png" would
> become eqv. to "file?png".

No, because the "." in file.png would match the third group and thus be
quotemeta'ed. The fourth grouph which is not quotemeta'ed matches the escaped
strings "\?", "\*" and "\\" which happen to be escaped the same way as a regex
and thus do not need any more quoting.

Thanks a lot for your review! The fixed version is attached :)

cheers, josch
From 26a4e2ffbd8ba13a038d33fe64630f4f44d30341 Mon Sep 17 00:00:00 2001
From: Johannes Schauer <j.schauer@email.de>
Date: Fri, 8 Aug 2014 12:00:39 +0200
Subject: [PATCH] check whether the dep-5 debian/copyright wildcards match all
 files

 - based on patch by Jakub Wilk - thanks!
---
 checks/source-copyright.desc                       | 35 ++++++++
 checks/source-copyright.pm                         | 99 ++++++++++++++++++++--
 .../debian/debian/copyright                        | 32 +++++++
 .../debian/file,with,commas                        |  0
 .../debian/i-have-no-copyright-information         |  0
 t/tests/source-copyright-wildcard-matching/desc    |  9 ++
 t/tests/source-copyright-wildcard-matching/tags    |  5 ++
 7 files changed, 174 insertions(+), 6 deletions(-)
 create mode 100644 t/tests/source-copyright-wildcard-matching/debian/debian/copyright
 create mode 100644 t/tests/source-copyright-wildcard-matching/debian/file,with,commas
 create mode 100644 t/tests/source-copyright-wildcard-matching/debian/i-have-no-copyright-information
 create mode 100644 t/tests/source-copyright-wildcard-matching/desc
 create mode 100644 t/tests/source-copyright-wildcard-matching/tags

diff --git a/checks/source-copyright.desc b/checks/source-copyright.desc
index 921e7b9..b912b2a 100644
--- a/checks/source-copyright.desc
+++ b/checks/source-copyright.desc
@@ -205,3 +205,38 @@ Info: The paragraph has a "License" and a "Copyright" field, but no
  Lintian will attempt to guess what you intended and continue based on
  its guess.  If the guess is wrong, you may see spurious tags related
  to this paragraph.
+
+Tag: invalid-escape-sequence-in-dep5-copyright
+Severity: normal
+Certainty: possible
+Ref: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+Info: The only allowed escape sequences are "\*", "\?" and "\\" (without
+ quotes) to produce a literal star, question mark and backslash, respectively.
+ Without the escaping backslash, the star and question mark take the role of
+ globbing operators similar to shell globs which is why they have to be
+ escaped. No other escapable characters than "*", "?" and "\" exist.
+
+Tag: wildcard-matches-nothing-in-dep5-copyright
+Severity: minor
+Certainty: possible
+Ref: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+Info: The wildcard that was specified matches no file in the source tree.
+ This either indicates that you should fix the wildcard so that it matches
+ the intended file or that you can remove the wildcard. Notice that in
+ contrast to shell globs, the "*" (star or asterisk) matches slashes and
+ leading dots.
+
+Tag: file-without-copyright-information
+Severity: normal
+Certainty: possible
+Ref: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+Info: The source tree contains a file which was not matched by any of
+ the <tt>Files</tt> paragraphs in debian/copyright. Either adjust existing
+ wildcards to match that file or add a new <tt>Files</tt> paragraph.
+
+Tag: unused-file-paragraph-in-dep5-copyright
+Severity: minor
+Certainty: possible
+Ref: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+Info: The <tt>Files</tt> paragraph in debian/copyright is superfluous as it is
+ never used to match any files. You should be able to safely remove it.
diff --git a/checks/source-copyright.pm b/checks/source-copyright.pm
index 6b9a0c5..a8d7a65 100644
--- a/checks/source-copyright.pm
+++ b/checks/source-copyright.pm
@@ -24,6 +24,8 @@ use strict;
 use warnings;
 use autodie;
 
+use File::Find qw();
+
 use List::MoreUtils qw(any);
 use Text::Levenshtein qw(distance);
 
@@ -251,9 +253,11 @@ sub _parse_dep5 {
         $short_licenses_seen{$short_license} = 1;
     }
 
-    my @commas_in_files;
+    my (@commas_in_files, %file_para_coverage);
+    my %file_coverage = map { $_ => 0 } get_all_files($info);
     my $i = 0;
     my $current_line = 0;
+    my $commas_in_files = any { m/,/xsm } $info->sorted_index;
     for my $para (@dep5) {
         $i++;
         $current_line = $lines[$i]{'START-OF-PARAGRAPH'};
@@ -297,6 +301,31 @@ sub _parse_dep5 {
                 @commas_in_files = ($i, $files_fname);
             }
 
+            # only attempt to evaluate globbing if commas could be legal
+            if (not @commas_in_files or $commas_in_files) {
+                my @wildcards = split /[\n\t ]+/, $files;
+                for my $wildcard (@wildcards) {
+                    my ($regex, $wildcard_error) = wildcard_to_regex($wildcard);
+                    if (defined $wildcard_error) {
+                        tag 'invalid-escape-sequence-in-dep5-copyright', substr($wildcard_error, 0, 2) . " (paragraph at line $current_line)";
+                        next;
+                    }
+
+                    my $used = 0;
+                    $file_para_coverage{$current_line} = 0;
+                    for my $srcfile (keys %file_coverage) {
+                        if ($srcfile =~ $regex) {
+                            $used = 1;
+                            $file_coverage{$srcfile} = $current_line;
+                            $file_para_coverage{$current_line} = 1;
+                        }
+                    }
+                    if (not $used) {
+                        tag 'wildcard-matches-nothing-in-dep5-copyright', "$wildcard (paragraph at line $current_line)";
+                    }
+                }
+            }
+
             my ($found_license, $full_license, $short_license, @short_licenses)
               = parse_license($license,$current_line);
             if ($found_license) {
@@ -325,12 +354,21 @@ sub _parse_dep5 {
               $current_line;
         }
     }
-    if (@commas_in_files) {
+    if (@commas_in_files and not $commas_in_files) {
         my ($paragraph_no, $field_name) = @commas_in_files;
-        if (not any { m/,/xsm } $info->sorted_index) {
-            tag 'comma-separated-files-in-dep5-copyright',
-              'paragraph at line',
-              $lines[$paragraph_no]{$field_name};
+        tag 'comma-separated-files-in-dep5-copyright',
+          'paragraph at line',
+          $lines[$paragraph_no]{$field_name};
+    } else {
+        foreach my $srcfile (sort keys %file_coverage) {
+            my $i = $file_coverage{$srcfile};
+            if (not $i) {
+                tag 'file-without-copyright-information', $srcfile;
+            }
+            delete $file_para_coverage{$i};
+        }
+        foreach my $i (sort keys %file_para_coverage) {
+            tag 'unused-file-paragraph-in-dep5-copyright', "paragraph at line $i";
         }
     }
     while ((my $license, $i) = each %required_standalone_licenses) {
@@ -400,6 +438,55 @@ sub get_field {
     return;
 }
 
+sub wildcard_to_regex {
+    my ($regex) = @_;
+    $regex =~ s,^\./+,,;
+    $regex =~ s,//+,/,g;
+    my $error;
+    eval {
+        $regex =~ s{
+            (\*) |
+            (\?) |
+            ([^*?\\]+) |
+            (\\[\\*?]) |
+            (.+)
+        }{
+            if (defined $1) {
+                '.*';
+            } elsif (defined $2) {
+                '.'
+            } elsif (defined $3) {
+                quotemeta($3);
+            } elsif (defined $4) {
+                $4;
+            } else {
+                $error = $5;
+                die;
+            }
+        }egx;
+    };
+    if ($@) {
+        return (undef, $error);
+    } else {
+        return (qr/^(?:$regex)$/, undef);
+    }
+}
+
+sub get_all_files {
+    my ($info) = @_;
+    my @all_files = @{$info->{sorted_index}};
+    my $debfiles_root = $info->debfiles;
+    File::Find::find({
+        wanted => sub {
+            return unless -f $_;
+            my $dir = $File::Find::dir;
+            $dir =~ s,^\Q$debfiles_root\E(?:(?=/)|$),debian,;
+            push @all_files, "$dir/$_";
+        },
+    }, $debfiles_root);
+    return @all_files;
+}
+
 1;
 
 # Local Variables:
diff --git a/t/tests/source-copyright-wildcard-matching/debian/debian/copyright b/t/tests/source-copyright-wildcard-matching/debian/debian/copyright
new file mode 100644
index 0000000..4e09d5a
--- /dev/null
+++ b/t/tests/source-copyright-wildcard-matching/debian/debian/copyright
@@ -0,0 +1,32 @@
+Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+Upstream-Name: Doohickey
+Upstream-Contact: J. Random Hacker <j.r.hacker@example.com>
+Source: http://examples.com/doohickey/source/
+
+Files: file?with?commas
+       deb*
+Copyright: 2014, somebody1
+Comment: this paragraph is superfluous because all files are matched
+ by the following paragraphs
+License: mylicense
+ Fixme
+
+Files: i-do-not-exist
+Copyright: 2014, somebody1
+License: mylicense
+ Fixme
+
+Files: invalid-escape\n
+Copyright: 2014, somebody1
+License: mylicense
+ Fixme
+
+Files: debian/*
+Copyright: 2014, somebody1
+License: mylicense
+ Fixme
+
+Files: file,with,commas
+Copyright: 2014, somebody1
+License: mylicense
+ Fixme
diff --git a/t/tests/source-copyright-wildcard-matching/debian/file,with,commas b/t/tests/source-copyright-wildcard-matching/debian/file,with,commas
new file mode 100644
index 0000000..e69de29
diff --git a/t/tests/source-copyright-wildcard-matching/debian/i-have-no-copyright-information b/t/tests/source-copyright-wildcard-matching/debian/i-have-no-copyright-information
new file mode 100644
index 0000000..e69de29
diff --git a/t/tests/source-copyright-wildcard-matching/desc b/t/tests/source-copyright-wildcard-matching/desc
new file mode 100644
index 0000000..979394e
--- /dev/null
+++ b/t/tests/source-copyright-wildcard-matching/desc
@@ -0,0 +1,9 @@
+Testname: source-copyright-wildcard-matching
+Sequence: 6000
+Version: 1.0
+Description: Test whether the Files wildcards match all files in the source
+Test-For:
+ invalid-escape-sequence-in-dep5-copyright
+ wildcard-matches-nothing-in-dep5-copyright
+ file-without-copyright-information
+ unused-file-paragraph-in-dep5-copyright
diff --git a/t/tests/source-copyright-wildcard-matching/tags b/t/tests/source-copyright-wildcard-matching/tags
new file mode 100644
index 0000000..14b5505
--- /dev/null
+++ b/t/tests/source-copyright-wildcard-matching/tags
@@ -0,0 +1,5 @@
+I: source-copyright-wildcard-matching source: unused-file-paragraph-in-dep5-copyright paragraph at line 14
+I: source-copyright-wildcard-matching source: unused-file-paragraph-in-dep5-copyright paragraph at line 6
+I: source-copyright-wildcard-matching source: wildcard-matches-nothing-in-dep5-copyright i-do-not-exist (paragraph at line 14)
+W: source-copyright-wildcard-matching source: file-without-copyright-information i-have-no-copyright-information
+W: source-copyright-wildcard-matching source: invalid-escape-sequence-in-dep5-copyright \n (paragraph at line 19)
-- 
2.0.1


Reply to: