
Bug#747248: cannot find a satisfying regex for this



Can this bug even be fixed in practice? I thought a bit about it and also
investigated the list of copyright names supplied by Clint Adams (thank you,
very useful!). I ended up with the impression that, besides things like
whitespace in the license name (bug #757615), a pipe symbol instead of 'or' (bug
#757583), or wrong usage of separating commas (bug #757579), there is not much
that can be done.

I was only able to come up with a few additions to
data/source-copyright/bad-short-licenses, and their utility is questionable:

# some licenses are misspelled by not putting a dash in front of the version
^(?:agpl|gpl|lgpl)[^-]?[123](?:\.\d)?\+?$ ~~ license-problem-invalid-short-name
# some misspellings of BSD licenses
^bsd$                                     ~~ license-problem-invalid-short-name
^bsd[^-]?[234][^-]?clause$                ~~ license-problem-invalid-short-name
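
A rough way to see what these three patterns catch is to run them over a few
sample names. The sketch below uses Python's re module rather than lintian
itself, and it assumes that names are lowercased before matching; the helper
name is_bad_short_name and the sample names are made up for illustration.

```python
import re

# Patterns copied from the proposed additions; the tag name is left out.
BAD_SHORT_NAME = [
    re.compile(r"^(?:agpl|gpl|lgpl)[^-]?[123](?:\.\d)?\+?$"),
    re.compile(r"^bsd$"),
    re.compile(r"^bsd[^-]?[234][^-]?clause$"),
]


def is_bad_short_name(name: str) -> bool:
    """Return True if the name matches any of the proposed patterns.

    Assumption: the name is trimmed and lowercased before matching.
    """
    lowered = name.strip().lower()
    return any(pattern.search(lowered) for pattern in BAD_SHORT_NAME)


# Hypothetical sample names, not taken from Clint's list.
for sample in ["GPL2+", "gpl-2+", "LGPL 2.1", "BSD", "bsd3clause", "bsd-3-clause"]:
    print(sample, is_bad_short_name(sample))
```

The dashed forms (gpl-2+, bsd-3-clause) pass, while the variants without the
dash are flagged.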

What else could be detected? The license names are pretty much free-form, and
the spec does not give any advice on how to express "license X plus this
exception" or "license Y in this variant" in a common format. Those seem to be
the two most common reasons for not using the standard license names. Ideas?

And some more invalid license names:

# a lone dash, not the dash inside a name such as gpl-2
(?<!\S)-(?!\S)    ~~ license-problem-undefined-license
\bother\b         ~~ license-problem-undefined-license
\bunspecified\b   ~~ license-problem-undefined-license
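
The same kind of quick check works for the undefined-license candidates. This
is again only a sketch in Python's re module, assuming lowercased input. The
dash pattern here uses whitespace lookarounds so that only a standalone dash is
flagged and not the dash inside a name such as gpl-2; that is my reading of the
intent. The helper name is_undefined_license is made up.

```python
import re

# Candidate patterns for license names that do not name a license at all.
UNDEFINED_LICENSE = [
    re.compile(r"(?<!\S)-(?!\S)"),   # a dash standing alone as a token
    re.compile(r"\bother\b"),
    re.compile(r"\bunspecified\b"),
]


def is_undefined_license(name: str) -> bool:
    """Return True if any undefined-license pattern occurs in the name.

    Assumption: the name is trimmed and lowercased before matching.
    """
    lowered = name.strip().lower()
    return any(pattern.search(lowered) for pattern in UNDEFINED_LICENSE)


# Hypothetical sample names for illustration.
for sample in ["-", "other", "Unspecified", "gpl-2+", "gpl-2+ or other", "another"]:
    print(repr(sample), is_undefined_license(sample))
```

Because of the \b anchors, "another" is not flagged, while "gpl-2+ or other"
is.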

But all of the above covers only a very small part of the wrong license names I
found in Clint's list. Would it be possible to obtain Clint's list with counts
of how often each value is in use? How would one obtain that data? Maybe I can
then come up with more useful regexes to fix this bug.
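
If the raw values were available as plain text with one license name per line
(an assumption, I do not know what format the list is kept in), the counting
itself would be simple. A minimal sketch with a made-up function name and
made-up sample data:

```python
from collections import Counter


def count_license_names(lines):
    """Count how often each license name occurs, ignoring case and blanks."""
    counts = Counter()
    for line in lines:
        name = line.strip().lower()
        if name:
            counts[name] += 1
    return counts


# Hypothetical input; real data would come from the archive's copyright files.
sample = ["GPL2+", "gpl2+", "BSD", "", "other", "bsd"]
for name, count in count_license_names(sample).most_common():
    print(count, name)
```

Sorting by frequency like this would show which misspellings are worth a
dedicated regex.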

cheers, josch

