
Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"



On Tue, 21 May 2019, Andreas Tille wrote:

Quoting from your section "Questions Not Easy to Answer"


 1. Must the dataset used to train a Free Model be present in our archive?
    A Wikipedia dump is a frequently used free dataset in the computational
    linguistics field; is uploading a Wikipedia dump to our archive sane?

I have no idea about the size of this kind of dump.

The current size of the Wikimedia dumps is 18 TB, but that includes several versions of the data (five dated versions are shipped for most dumps), etc. As a sample, I think this[1] is the English pages' main text (not history or metadata), which is 15 GB compressed.

1) https://ftp.acc.umu.se/mirror/wikimedia.org/dumps/enwiki/20190501/enwiki-20190501-pages-articles.xml.bz2
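For reference, a quick way to check the compressed size without downloading the whole file is to issue a HEAD request and read the Content-Length header. A minimal Python sketch, assuming the mirror returns Content-Length for plain files (the URL is the one from [1]):

    # Minimal sketch: query the compressed size of the enwiki dump via a
    # HEAD request. Assumes the mirror sends a Content-Length header.
    import urllib.request

    URL = ("https://ftp.acc.umu.se/mirror/wikimedia.org/dumps/enwiki/"
           "20190501/enwiki-20190501-pages-articles.xml.bz2")

    req = urllib.request.Request(URL, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        size = int(resp.headers["Content-Length"])

    print(f"{size / 1024**3:.1f} GiB compressed")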

/Mattias Wadenstein, mirror admin who also mirrors the Wikimedia dumps

