
Re: Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models



Hi,

On Sun, 2025-05-04 at 15:54 +0200, Matthias Urlichs wrote:
> On 04.05.25 15:44, Ansgar 🙀 wrote:
>   
> > What is not reproducible (in the reproducible build sense Debian uses)
> > about, say, the Tesseract OCR models?
> My point is that reproducing a model requires input data, which requires us to distribute said data, which requires them to be of suitable copyright.

Ah, you mean in the sense of a from-scratch rebuild of statistical
data, including the possibility of doing a different analysis?

Debian doesn't require that all data needed for a from-scratch
reimplementation of a package be available, though. Such a requirement
would also run into many problems, as relevant documents (RFCs, ISO
standards, design documents, publications, ...) or cloned originals
(UNIX or Windows APIs, games, ...) are often non-free.

This has so far also been the case for statistical data in Debian:
simple aggregates such as the number of packages in Debian, which
might be included in Debian without also including the entire Debian
archive as source, data about word or character frequencies in
natural-language texts, and so on. I guess proponents of the original
GR would also find this problematic?

Ansgar

