Re: ML-Policy and tesseract-ocr
On 2019-08-13 01:00, Paul Wise wrote:
> On Tue, Aug 13, 2019 at 8:45 AM Sam Hartman wrote:
>
>> In cases where the model is not recreated, but where software in Debian
>> could create the model, I think a README file is better than a package
>> relationship.
>
> Personally, I'd like to see Policy specify a standard mechanism that
> people could use to indicate to debian/rules that they want to
> automatically rebuild *everything* from source, no matter what the
> cost is.
This makes sense, as the official policy doesn't cover non-standard
d/rules targets. How about this one:
* For a source package that produces binary packages containing ML models,
it's encouraged to write a "reproduce-original-model" d/rules target
that may e.g. download a dataset from the Internet and redo the same
procedure to produce the original model.
It's "encouraged" because some models involve complex manual steps
that a random developer can hardly understand, e.g.
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract
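To make the proposal concrete, here is a minimal sketch of what such a
target might look like in d/rules. It is only an illustration under
stated assumptions: the dataset URL, the output directory and the
debian/train-model.sh helper are hypothetical placeholders, not anything
that exists in tesseract or in any other package.

```makefile
#!/usr/bin/make -f
# Sketch of a debian/rules file with an opt-in, non-standard target.
# DATASET_URL, MODEL_DIR and debian/train-model.sh are hypothetical
# placeholders chosen for illustration only.
DATASET_URL ?= https://example.org/training-data.tar.gz
MODEL_DIR   ?= debian/reproduced-model

# Normal builds go through dh as usual and never touch the target below.
%:
	dh $@

# The target needs network access and possibly a lot of compute, so it
# has to be invoked explicitly and is not a dependency of any standard
# target.
.PHONY: reproduce-original-model
reproduce-original-model:
	mkdir -p $(MODEL_DIR)
	wget -O $(MODEL_DIR)/dataset.tar.gz $(DATASET_URL)
	tar -C $(MODEL_DIR) -xf $(MODEL_DIR)/dataset.tar.gz
	./debian/train-model.sh $(MODEL_DIR) $(MODEL_DIR)/model.out
```

Someone who wants to redo the training would then run
"debian/rules reproduce-original-model" by hand. Keeping the target out
of the standard build path means the buildds never download anything.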
See my other mail, sent out several minutes ago. In that mail,
maintainers are required to provide the necessary information about
the distributed ML model in README.Debian.