
Re: Several copyrights wrong in R packages



> On 2021-10-21 02:18, Charles Plessy wrote:
> > can routine-update grep the diff with the previous upstream release
> > using patterns such as license, author, copyright, (c), ©, etc, and emit
> > a warning if there is a match ?  This is what I used to do by hand when
> > I had no time or interest to inspect the whole diff for other changes.

On Thu, Oct 21, 2021 at 11:15:39AM +0300, Andrius Merkys wrote:
> 
> Maybe licensecheck could be of any use here?

Hi all,

I have a prototype.

Git allows an external diff command to be run, using the `GIT_EXTERNAL_DIFF`
environment variable.  Here is a quote from its manual page:

> GIT_EXTERNAL_DIFF
>
>    When the environment variable GIT_EXTERNAL_DIFF is set, the program
>    named by it is called to generate diffs, and Git does not use its
>    builtin diff machinery. For a path that is added, removed, or
>    modified, GIT_EXTERNAL_DIFF is called with 7 parameters:
>
>        path old-file old-hex old-mode new-file new-hex new-mode

So I wrote a small command that runs licensecheck on the old and new
file, and diffs the results.  I have to pipe them through sed because
at least one of the files is in a temporary folder, whose name is
included in the output of licensecheck.

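    $ cat diff-licensecheck
    #!/bin/bash
    # Git passes: path old-file old-hex old-mode new-file new-hex new-mode
    # Replace the temporary file names in the output with the real path ($1).
    diff -u <(licensecheck "$2" | sed "s,$2,$1,") <(licensecheck "$5" | sed "s,$5,$1,")
    # diff returns 1 when the outputs differ; always report success to Git.
    exit 0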

Then, I created a test git repository in which I committed one file
detected as under the Apache License, then replaced its contents with
something detected as under the BSD license in the next commit.

And voilà

    $ GIT_EXTERNAL_DIFF=./diff-licensecheck git diff HEAD^
    --- /dev/fd/63  2021-10-21 21:35:27.071663271 +0900
    +++ /dev/fd/62  2021-10-21 21:35:27.071663271 +0900
    @@ -1 +1 @@
    -testfile: *No copyright* Apache License 2.0
    +testfile: BSD 3-clause "New" or "Revised" License

I then tested it on the `upstream` branch of one of our source packages
and... licensecheck is really too slow...

I tested the alternative ninka and it is much faster, but might have
more false positives.  For instance in r-cran-testthat, the NEWS.md
file pops up:

    git diff 5e69a2ce025417258572a9322e4b720f8ab389a2^ 5e69a2ce025417258572a9322e4b720f8ab389a2

    --- /dev/fd/63  2021-10-21 21:44:31.377404307 +0900
    +++ /dev/fd/62  2021-10-21 21:44:31.377404307 +0900
    @@ -1 +1 @@
    -NEWS.md;SeeFile,SeeFile,SeeFile;3;3;0;1;2;1,1,IntelPart08,UNKNOWN,UNKNOWN,1
    +NEWS.md;SeeFile,SeeFile;2;2;0;1;3;UNKNOWN,1,1,IntelPart08,UNKNOWN,UNKNOWN

Another issue is that new or deleted files obviously show up in the
diff, but it should be easy to suppress that by not running the diff
command if either the old or the new file is empty or /dev/null.
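
An untested sketch of that guard, for illustration only.  The helper
name `should_compare` is made up, and I am assuming that Git passes
/dev/null as the old or new file when a path is added or deleted.  It
uses a temporary file instead of process substitution so that it also
runs under plain sh:

```shell
#!/bin/sh
# Sketch of a GIT_EXTERNAL_DIFF helper that ignores added or deleted files.
# Git passes: path old-file old-hex old-mode new-file new-hex new-mode

# Hypothetical helper: succeed only if both sides are real, non-empty files.
should_compare() {
    [ "$1" != /dev/null ] && [ "$2" != /dev/null ] &&
        [ -s "$1" ] && [ -s "$2" ]
}

# Only act when called with the 7 parameters described in the Git manual.
if [ "$#" -eq 7 ] && should_compare "$2" "$5"; then
    old_report=$(mktemp)
    # Replace the temporary file names with the real path ($1), as before.
    licensecheck "$2" | sed "s,$2,$1," > "$old_report"
    # diff returns 1 when the reports differ; do not pass that on to Git.
    licensecheck "$5" | sed "s,$5,$1," | diff -u "$old_report" - || true
    rm -f "$old_report"
fi
```

With this, added and deleted files produce no output at all, so only
files whose detected license actually changes remain in the diff.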

That is all from me today, now it is time to sleep!

Cheers,

Charles

-- 
Charles Plessy                         Nagahama, Yomitan, Okinawa, Japan
Debian Med packaging team         http://www.debian.org/devel/debian-med
Tooting from work,           https://mastodon.technology/@charles_plessy
Tooting from home,                 https://framapiaf.org/@charles_plessy

