
Re: licensecheck and debian/copyright



On Thu, Dec 10, 2009 at 01:56:20AM +0000, Dmitrijs Ledkovs wrote:
> 
> There isn't DEB-5 debian/copyright parser available. So this cannot be
> implemented in licensecheck yet.

Dear Dmitrijs,

Jon Dowland has published an example parser on this list
(http://lists.debian.org/msgid-search/20090913225846.GB16109@tchicaya.lan).
However, it is written in Python and is therefore of little help for
licensecheck, which is written in Perl.

On my side, I have started to work on a parser for the relaxed syntax that I
propose in my experimental git branch of the DEP
(http://git.debian.org/?p=users/plessy/license-summary.git;a=blob_plain;f=dep5.mdwn).

With that relaxed syntax, parsing is as simple as:

 - Process the paragraphs (separated by an empty line) one by one.
 - Collapse each paragraph into a hash where the keys are field names,
   ignoring paragraphs that do not contain fields.

This results in an array of hashes or, in YAML terms, a sequence of mappings.

use strict;
use warnings;

$/ = undef;                                # Slurp the whole input at once.
my $input = <>;
$input = '' unless defined $input;         # Empty input yields no paragraphs.
my @paragraphs = split /\n{2,}/, $input;   # Split on empty lines.
my @parsed;

foreach my $paragraph (@paragraphs) {
    if (my $collapsed = collapse($paragraph)) {   # Collapse each paragraph into a hash.
        push @parsed, $collapsed;
    }
}

# Turn one paragraph into a hash reference keyed by field name.
# Returns undef if the paragraph contains no field.
sub collapse {
    my $paragraph = shift;
    my %hash;
    my $current_field;                     # Field that a continuation line would extend.
    foreach (split /\n/, $paragraph) {
        if ( /^(\w+)\s*:\s*(.*)$/ ) {      # A new field terminates the previous one.
            $current_field = $1;
            $hash{$current_field} .= $2;
        } elsif ( /^\s(.*)$/ ) {           # An indented line continues the current field.
            $hash{$current_field} .= "\n$1" if defined $current_field;
        } else {
            undef $current_field;          # Lack of indentation also terminates the field.
        }
    }
    return keys %hash ? \%hash : undef;
}

The above script still has bugs, but I hope it summarises how easy it could be
to write a parser if the DEP is constructed with this as a goal.
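
To illustrate what a tool such as licensecheck could then do with the result,
here is a minimal sketch. It assumes that the paragraphs carry "Files",
"Copyright" and "License" fields and that the first line of "License" is a
short name. The sample data and the license_summary function are made up for
this example and are not part of any existing tool.

```perl
use strict;
use warnings;

# Hypothetical result of parsing a small debian/copyright file:
# one hash reference per paragraph, keyed by field name.
my @parsed = (
    { Files => '*',        Copyright => '2009 Jane Doe', License => 'GPL-2+' },
    { Files => 'debian/*', Copyright => '2009 John Roe',
      License => "MIT\nPermission is hereby granted..." },
);

# Map each Files pattern to the short license name, taken here to be
# the first line of the License field (an assumption of this sketch).
sub license_summary {
    my @paragraphs = @_;
    my %summary;
    foreach my $p (@paragraphs) {
        next unless defined $p->{Files} && defined $p->{License};
        my ($short_name) = split /\n/, $p->{License};
        $summary{ $p->{Files} } = $short_name;
    }
    return \%summary;
}

my $summary = license_summary(@parsed);
print "$_: $summary->{$_}\n" for sort keys %$summary;
```

Paragraphs that lack one of the two fields, such as a header paragraph, are
simply skipped.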


I originally proposed a syntax that differs from that of Debian control files,
but I am still dissatisfied even with my own proposal. Whichever format is
used, it is easy to break the syntax, in particular by forgetting the white
space for indentation, or the ‘space-dot’ escape sequence for empty lines in
the ‘Debian control’ syntax. From my frustrating experience of adding by hand
the text of the Artistic License 2.0 to the debian/copyright file of one of
the packages I maintain, I concluded that this fragility can significantly
impair the adoption of DEP-5. So, on this list or elsewhere, I think that there
is still some experimentation and coordination to do.
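
As an illustration of the escaping that is so easy to get wrong by hand, here
is a rough sketch of how a license text could be converted mechanically into
the body of a multi-line field in the ‘Debian control’ syntax: every line is
indented by one space, and every empty line becomes a space followed by a dot.
The function name and the sample text are invented for this example.

```perl
use strict;
use warnings;

# Convert free text into the continuation lines of a control-style field:
# indent each line with one space, and write empty lines as " .".
sub escape_for_control_field {
    my ($text) = @_;
    my @lines = split /\n/, $text, -1;          # -1 keeps trailing empty strings.
    pop @lines if @lines && $lines[-1] eq '';   # Drop the piece after a final newline.
    return join "\n", map { /^\s*$/ ? ' .' : " $_" } @lines;
}

my $license_text = "First paragraph.\n\nSecond paragraph.\n";
print "License: Example\n", escape_for_control_field($license_text), "\n";
```

Generating the field this way, instead of editing it by hand, avoids both the
missing indentation and the missing ‘space-dot’ lines.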

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan

