Bug#757551: lintian: check if DEP-5 debian/copyright covers all files in the unpacked sources
On 2014-08-10 10:05, Johannes Schauer wrote:
> Quoting Johannes Schauer (2014-08-10 10:02:31)
>> > that's very useful and I added this as a test case. The updated patch is
>> > attached.
> and it would help to attach the right file...
>
>
Hi,
Thanks for working on this and providing a patch. :)
I got a few questions to it though based on a very quick review. My
comments are interleaved in the patch.
> 0001-check-whether-the-dep-5-debian-copyright-wildcards-m.patch
>
>
> From 040fdbf1b96d1b6dcfa55c67c52e72b3a58ab8ce Mon Sep 17 00:00:00 2001
> From: Johannes Schauer <j.schauer@email.de>
> Date: Fri, 8 Aug 2014 12:00:39 +0200
> Subject: [PATCH] check whether the dep-5 debian/copyright wildcards match all
> files
>
> [...]
>
> diff --git a/checks/source-copyright.desc b/checks/source-copyright.desc
> index 921e7b9..b912b2a 100644
> --- a/checks/source-copyright.desc
> +++ b/checks/source-copyright.desc
> [...]
> index 6b9a0c5..cfdaef7 100644
> --- a/checks/source-copyright.pm
> +++ b/checks/source-copyright.pm
> @@ -24,6 +24,9 @@ use strict;
> use warnings;
> use autodie;
>
> +use Cwd qw(getcwd);
This does not seem to be used anywhere in the diff?
> +use File::Find qw();
> +
> use List::MoreUtils qw(any);
> use Text::Levenshtein qw(distance);
>
> @@ -252,6 +255,8 @@ sub _parse_dep5 {
> }
>
> my @commas_in_files;
> + my %file_coverage = map { $_ => 0 } get_all_files($info);
> + my %file_para_coverage = ();
Minor nit pick: Consider merging this with the @commas_in_files
declaration, a la:
my (@commas_in_files, %file_para_coverage);
Alternatively, move it up under @commas_in_files and remove the "= ();".
Then 5.20 should be able to this optimisation for us.
> my $i = 0;
> my $current_line = 0;
> for my $para (@dep5) {
> @@ -297,6 +302,31 @@ sub _parse_dep5 {
> @commas_in_files = ($i, $files_fname);
> }
>
> + # only attempt to evaluate globbing if commas could be legal
> + if (not @commas_in_files or any { m/,/xsm } $info->sorted_index) {
The "any { m/,/xsm } $info->sorted_index" part is an invariant in the
"for my $para (@dep5)" loop. I think we should move it out of the loop,
so we don't loop over all files in the index repeatedly just to conclude
that none of them have any commas.
> + my @wildcards = split /[\n\t ]+/, $files;
> + for my $wildcard (@wildcards) {
> + my ($regex, $wildcard_error) = wildcard_to_regex($wildcard);
> + if (defined $wildcard_error) {
> + tag 'invalid-escape-sequence-in-dep5-copyright', substr($wildcard_error, 0, 2) . " (paragraph at line $current_line)";
> + next;
> + }
> +
> + my $used = 0;
> + $file_para_coverage{$current_line} = 0;
> + for my $srcfile (keys %file_coverage) {
> + if ($srcfile =~ $regex) {
> + $used = 1;
> + $file_coverage{$srcfile} = $current_line;
> + $file_para_coverage{$current_line} = 1;
> + }
> + }
> + if (not $used) {
> + tag 'wildcard-matches-nothing-in-dep5-copyright', "$wildcard (paragraph at line $current_line)";
> + }
> + }
> + }
> +
> my ($found_license, $full_license, $short_license, @short_licenses)
> = parse_license($license,$current_line);
> if ($found_license) {
> @@ -325,12 +355,21 @@ sub _parse_dep5 {
> $current_line;
> }
> }
> - if (@commas_in_files) {
> + if (@commas_in_files and not any { m/,/xsm } $info->sorted_index) {
The "any {...} $info->sorted_index" here could be optimised away as well
if the change above is implemented.
> my ($paragraph_no, $field_name) = @commas_in_files;
> [...]
> }
> while ((my $license, $i) = each %required_standalone_licenses) {
> @@ -400,6 +439,55 @@ sub get_field {
> return;
> }
>
> +sub wildcard_to_regex {
> + my ($regex) = @_;
> + $regex =~ s,^\./+,,;
> + $regex =~ s,//+,/,g;
> + my $error;
> + eval {
> + $regex =~ s{
> + (\*) |
> + (\?) |
> + ([^*?\\]+) |
> + (\\[\\*?]) |
> + (.+)
> + }{
> + if (defined $1) {
> + '.*';
> + } elsif (defined $2) {
> + '.'
> + } elsif (defined $3) {
> + quotemeta($3);
> + } elsif (defined $4) {
> + $4;
Shouldn't this be quotemeta'ed as well? Otherwise "file.png" would
become eqv. to "file?png".
> + } else {
> + $error = $5;
> + die;
> + }
> + }egx;
> + };
> + if ($@) {
> + return (undef, $error);
> + } else {
> + return (qr/^(?:$regex)$/, undef);
> + }
> +}
> +
> [...]
Reply to: