[RFC] License text parser and converter
DEP-5 is a good step forward in standardizing copyright information.
While it is indeed useful for determining a license type, there's no
guarantee that stand-alone license paragraphs or license headers are
accurate (e.g. typos, or a GPL-2 license header while GPL-2+ is
declared), or canonically formatted (e.g. 72-char vs. 80-char lines,
different spacing or indentation).
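As a rough illustration of the checks meant above, here is a minimal Python sketch. It compares two texts after collapsing all whitespace, so 72- vs. 80-char wrapping and indentation don't matter. It also flags a declared "+" license whose header lacks the usual "or (at your option) any later version" clause. The function names are hypothetical, and the phrase check is only a heuristic that assumes the standard FSF notice wording.

```python
import re

# Heuristic: the standard FSF "or later" notice contains this phrase. Its
# absence in a header declared as e.g. GPL-2+ is suspicious, not proof.
LATER_VERSION = re.compile(
    r"or\s+\(at\s+your\s+option\)\s+any\s+later\s+version", re.IGNORECASE
)


def normalize(text: str) -> str:
    """Collapse every run of whitespace so wrapping and indentation don't matter."""
    return " ".join(text.split())


def same_license_text(candidate: str, canonical: str) -> bool:
    """Compare two license texts while ignoring line width, spacing and indentation."""
    return normalize(candidate) == normalize(canonical)


def header_allows_later_versions(header: str) -> bool:
    """Return True if the header contains the 'any later version' clause."""
    return LATER_VERSION.search(header) is not None


def check_declared_plus(declared: str, header: str) -> list[str]:
    """Report mismatches between a declared short name (e.g. GPL-2+) and the header."""
    problems = []
    plus = declared.endswith("+")
    later = header_allows_later_versions(header)
    if plus and not later:
        problems.append(f"{declared} declared, but header has no 'any later version' clause")
    if later and not plus:
        problems.append(f"header allows later versions, but {declared} is declared")
    return problems
```

Collapsing whitespace is deliberately blunt: it catches reformatting but still reports any real wording change, including typos.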
Moreover, some license texts are not expressed in RFC822 syntax, so
maintainers must adapt them to fit into DEP-5. Take the MPL as an
example: it is very long and full of irregular spacing, so converting
it by hand is time-consuming and error-prone.
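The mechanical part of the conversion is simple in principle. In deb822 (RFC822-style) syntax, every continuation line starts with a space and an empty line inside a field value is written as " .". The sketch below is minimal and assumes a few things: the first line of the License field holds the short name, and the raw text only needs tab expansion, trailing-whitespace removal and dedenting. The function name `to_dep5_license` is hypothetical.

```python
import textwrap


def to_dep5_license(short_name: str, raw_text: str, tabsize: int = 8) -> str:
    """Render raw license text as a DEP-5 'License:' field (hypothetical helper).

    Continuation lines are prefixed with one space; empty lines become ' .'.
    """
    # Expand tabs and drop trailing whitespace so indentation is predictable.
    lines = [line.expandtabs(tabsize).rstrip() for line in raw_text.splitlines()]
    # Remove indentation common to all lines, keeping relative indentation.
    lines = textwrap.dedent("\n".join(lines)).splitlines()
    # Leading and trailing blank lines carry no information.
    while lines and not lines[0]:
        lines.pop(0)
    while lines and not lines[-1]:
        lines.pop()
    out = [f"License: {short_name}"]
    previous_blank = False
    for line in lines:
        if not line:
            # Collapse runs of blank lines into a single ' .' separator.
            if not previous_blank:
                out.append(" .")
            previous_blank = True
        else:
            out.append(" " + line)
            previous_blank = False
    return "\n".join(out) + "\n"
```

For example, `print(to_dep5_license("MPL-1.1", text), end="")` on the raw MPL text would print a paragraph that could be pasted into debian/copyright. Reflowing overly long lines is deliberately left out, since it is a policy decision rather than a syntax requirement.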
I'd like to ask the following questions:
* Is there a tool which, given a license text in raw format, converts
it to match RFC822 syntax?
* In case it exists, is it able to perform sanity checks on the
licenses (e.g. checking that the license text matches the intended one)?
* In case it doesn't exist yet, would it be worth implementing?
* Which language would you use to implement it?
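One possible approach to the sanity-check question is fuzzy matching against a set of canonical texts, which could for instance be loaded from /usr/share/common-licenses. The sketch below uses Python's difflib and is minimal. It assumes three things: `known` is a non-empty mapping from short names to canonical texts supplied by the caller, the helper names are hypothetical, and the 0.95 threshold is an arbitrary choice.

```python
import difflib
from typing import Optional


def words(text: str) -> list[str]:
    """Lower-cased word list, so wrapping, spacing and indentation are ignored."""
    return text.lower().split()


def best_match(text: str, known: dict[str, str]) -> tuple[str, float]:
    """Return (short_name, similarity) of the closest canonical text in `known`."""
    scores = {
        name: difflib.SequenceMatcher(
            None, words(text), words(canonical), autojunk=False
        ).ratio()
        for name, canonical in known.items()
    }
    name = max(scores, key=scores.get)  # raises ValueError if `known` is empty
    return name, scores[name]


def verify(declared: str, text: str, known: dict[str, str],
           threshold: float = 0.95) -> Optional[str]:
    """Return a warning if `text` does not look like the declared license.

    The default threshold is an arbitrary assumption, not a tuned value.
    """
    name, score = best_match(text, known)
    if name != declared:
        return f"declared {declared}, but text is closest to {name} ({score:.0%})"
    if score < threshold:
        return f"text only {score:.0%} similar to canonical {declared}"
    return None
```

Comparing word lists rather than characters keeps the check insensitive to formatting. Closely related texts (e.g. GPL-2 vs. GPL-3) would still need a higher threshold or additional checks to be told apart reliably.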
I guess the answer to the last question would be Perl, as several
related tools already use Perl (think of lintian or
libconfig-model-perl), but I suspect other scripting languages could
fit as well.