Re: licensecheck and debian/copyright
On Thu, Dec 10, 2009 at 01:56:20AM +0000, Dmitrijs Ledkovs wrote:
>
> There isn't DEB-5 debian/copyright parser available. So this cannot be
> implemented in licensecheck yet.
Dear Dmitrijs,
Jon Dowland has published an example parser on this list
(http://lists.debian.org/msgid-search/20090913225846.GB16109@tchicaya.lan).
However, it is written in Python and is therefore of little help for
licensecheck, which is written in Perl.
On my side, I have started to work on a parser for the relaxed syntax I propose
on my experimental git branch of the DEP
(http://git.debian.org/?p=users/plessy/license-summary.git;a=blob_plain;f=dep5.mdwn).
In that case, it is as simple as:
- Process paragraphs, separated by an empty line, one by one.
- Collapse each paragraph into a hash whose keys are field names, ignoring
  paragraphs that do not contain fields.
This results in an array of hashes or, in YAML terms, a sequence of mappings.
  use strict;
  use warnings;

  local $/ = undef;                        # Slurp the whole input at once.
  my @paragraphs = split /\n\s*\n/, <>;    # Split on empty lines.
  my @parsed;

  foreach my $paragraph (@paragraphs) {
      # Collapse each paragraph into a hash; skip paragraphs without fields.
      if (my $collapsed = collapse($paragraph)) {
          push @parsed, $collapsed;
      }
  }

  sub collapse {
      my $paragraph = shift;
      my %hash;
      my $current_field;                   # Field that continuation lines extend.
      foreach (split /\n/, $paragraph) {
          if ( /^([\w-]+)\s*:\s*(.*)$/ ) { # A new field terminates the previous one.
              $current_field = $1;         # [\w-] also accepts hyphenated field names.
              $hash{$current_field} .= $2;
          } elsif ( /^\s(.*)$/ ) {         # Indented line: continuation of the field.
              $hash{$current_field} .= "\n$1" if defined $current_field;
          } else {
              undef $current_field;        # Lack of indentation also terminates the field.
          }
      }
      return %hash ? \%hash : undef;       # undef when the paragraph had no fields.
  }
The above script still has bugs, but I hope it summarises how easy it could be
to write a parser if the DEP is constructed with this as a goal.
I originally proposed a syntax that differs from that of Debian control files,
but for now I remain dissatisfied even with my own proposal. Whichever format
is used, it is easy to break the syntax, in particular by forgetting the white
space for indentation, or the ‘space-dot’ escape sequence for empty lines in the
‘Debian control’ syntax. From my frustrating experience of adding by hand the
text of the Artistic License 2.0 to the debian/copyright file of one of
the packages I maintain, I concluded that this fragility can significantly
impair the adoption of DEP-5. So on this list or elsewhere, I think that there
is still some experimentation and consultation to do.
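To illustrate the kind of escaping I mean, here is a minimal sketch of a helper that indents a license text by one space and replaces empty lines with the ‘space-dot’ sequence. The name escape_field_body is hypothetical and is not part of licensecheck or of any existing tool; the "License: Artistic-2.0" first line and the two-paragraph sample text are only placeholders for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: turn free-form text into the body of a multi-line
# field in the 'Debian control' syntax.
sub escape_field_body {
    my ($text) = @_;
    my @lines = split /\n/, $text, -1;                  # -1 keeps empty lines
    pop @lines while @lines && $lines[-1] =~ /^\s*$/;   # drop trailing blank lines
    # Blank lines become " ."; every other line gets one leading space.
    return join "\n", map { /^\s*$/ ? ' .' : " $_" } @lines;
}

# Placeholder input, not a real license text.
my $license = "First paragraph.\n\nSecond paragraph.\n";
print "License: Artistic-2.0\n", escape_field_body($license), "\n";
```

Doing this mechanically avoids the two mistakes mentioned above, but it does not help someone who edits the file by hand afterwards.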
Have a nice day,
--
Charles Plessy
Tsurumi, Kanagawa, Japan