
Re: ML-Policy and tesseract-ocr



Hi Marvin,

On 2019-08-12 18:35, Marvin Renich wrote:
> * Mo Zhou <lumin@debian.org> [190812 10:31]:
>> To this end, I wrote the policy #5 [3]:
>>
>>    A package that includes a machine learning model, must also include
>>    the corresponding training program, or depend on the package that
>>    provides the corresponding training program.
>>
>> Does that make sense? If it looks good, then the solution
>> for this bug is already obvious enough.
> 
> Perhaps I am not interpreting what you are saying correctly, but I would
> say it is wrong.  The corresponding training program must be packaged in
> Debian, but it seems unlikely that there would be a binary package
> dependency from the model to the training program

The original "policy" imposed a rather strong restriction: the training
script must be present whenever an ML model is installed. I meant
"Depends" in the original text, but perhaps "Suggests" is better, since
"Depends" may introduce a circular dependency or the
arch-all-dep-on-arch-any problem (an Architecture: all package depending
on an Architecture: any one).

So "depend on ..." in the policy text could be revised to "`Suggests:`".

> (result of running the training program with
> specific input data, if I understand correctly?) 

Yes, correct.

> The source package would need to Build-Depend on the training
> program and its inputs, but in general there would not need to be a
> normal Depends.

I see. The idea is that, just as an ELF binary doesn't have to Depend on
its compiler and source code, an ML model doesn't have to Depend on its
training program and input data. This makes sense to me, and the
"Suggests:" relationship may be the better fit.
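For concreteness, a minimal sketch of how that split might look in
debian/control, assuming a hypothetical source package foobar whose model
is produced at build time by a hypothetical training program packaged as
foobar-trainer, from input data packaged as foobar-training-data. The
training stack appears only in Build-Depends; the binary package carrying
the model merely Suggests the trainer. This is an abridged fragment
(Maintainer, Standards-Version, etc. omitted), not a proposal for any real
package.

```
# Abridged debian/control fragment; all package names are hypothetical.
Source: foobar
# The training program and its input data are only needed to build the model.
Build-Depends: debhelper-compat (= 12),
               foobar-trainer,
               foobar-training-data

Package: foobar
Architecture: all
Depends: ${misc:Depends}
# Not installed by default; points interested users to how the model was made.
Suggests: foobar-trainer
```

Unlike Depends (or Recommends, which apt installs by default), Suggests
leaves the decision to the user.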

The "Suggests:" relationship implicitly prompts the user to consider
questions such as:
1. what is the binary blob /usr/.../foobar.ml-model installed by the
   package foobar?
2. where did these numbers come from?
3. how exactly did the original author create this model?
4. how do I obtain a similar model with my own dataset?
etc.

I think most users will not actually dig into the details of the model,
or even try to understand what it is. So changing the model -> training
script relationship from "Depends" to "Suggests" would also avoid pulling
in the whole stack of training software when installing the model.

> Perhaps you were just being sloppy about Build-Depends vs Depends, but
> when writing policy it is important to be very specific about that.

Thanks, I'll keep that in mind.

