Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"
Hi Andreas,
On 2019-05-21 09:07, Andreas Tille wrote:
> Not sure whether this is sensible to be added to the issue
> tracker.
I always abuse the issue tracker in my personal repository.
> Quoting from your section "Questions Not Easy to Answer"
>
>
> 1. Must the dataset for training a Free Model be present in our archive?
> The Wikipedia dump is a frequently used free dataset in the computational
> linguistics field; is uploading a Wikipedia dump to our archive sane?
>
> I have no idea about the size of this kind of dump. Recently I've read
> that data sets for other programs tend toward the 1GB range. In
> Debian Med I'm maintaining metaphlan2-data at 204MB, which would be
> even larger if it did not use a "data reduction" method that other
> DDs consider a bug (#839925).
As pointed out by Mattias Wadenstein (thanks for the data point), the
Wikipedia dump is large enough to challenge the .deb format (see the
recent threads).
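For a sense of scale, here is a rough back-of-the-envelope sketch. The ~16 GiB figure for the compressed English Wikipedia pages-articles dump is an assumed, approximate value (dumps grow over time); the 204 MB metaphlan2-data figure is taken from above:

```python
def human(nbytes: float) -> str:
    """Format a byte count using binary (1024-based) units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if nbytes < 1024:
            return f"{nbytes:.1f} {unit}"
        nbytes /= 1024
    return f"{nbytes:.1f} PiB"

# Approximate sizes -- assumptions for illustration, not measured here:
enwiki_bz2 = 16 * 1024**3        # compressed enwiki pages-articles dump, ~16 GiB
metaphlan2_data = 204 * 1024**2  # metaphlan2-data package, ~204 MiB

print(human(enwiki_bz2))                                  # 16.0 GiB
print(f"~{enwiki_bz2 / metaphlan2_data:.0f}x metaphlan2-data")  # ~80x
```

So even a single compressed dump is roughly two orders of magnitude beyond the largest data package mentioned above, before any decompression.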
> 2. Should we re-train the Free Models on buildd? This is crazy; let's
> not do that right now.
>
> If you ask me bothering buildd with this task is insane. However I'm
> positively convinced that we should ship the training data and be able
> to train the models from these.
It's always good if we can do these things purely within our archive.
However, sometimes that's just not easy to enforce: datasets used by
deep learning are generally large (several hundred MB to several TB,
or even more).