Re: Datasets downloaded by scikit-learn as separate packages?
On Mon, 2021-09-20 at 19:52 +0200, Christian Kastner wrote:
> >
> > Or should we not build these jupyter notebooks for the -doc package?
>
> I don't think anyone would stop you from packaging the datasets but to
> be honest, I think that would be overkill. The -doc package has a popcon
> of 93, and I would assume that (like me) most users of scikit-learn use
> upstream's online documentation directly.
Many machine-learning packages depend on external datasets, and
upstream usually provides an API that lets users download them
automatically when they are useful to a large audience.
I vote for "packaging a dataset is not necessary"; we can use a
pytest marker to skip the tests that require external data.
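
A minimal sketch of what such a marker could look like in a conftest.py.
The marker name "external_data" and the environment variable
ALLOW_EXTERNAL_DATA are made up for illustration; they are not taken from
scikit-learn or any existing package, so treat this as one possible
arrangement rather than what upstream does.

```python
# conftest.py (sketch; marker name and environment variable are hypothetical)
import os

import pytest


def external_data_allowed(environ=os.environ):
    """Return True only when downloads are explicitly opted in to."""
    return environ.get("ALLOW_EXTERNAL_DATA", "0") == "1"


def pytest_configure(config):
    # Register the marker so that "pytest --strict-markers" accepts it.
    config.addinivalue_line(
        "markers", "external_data: test downloads a dataset from the network"
    )


def pytest_collection_modifyitems(config, items):
    # Leave the test suite untouched when downloads are allowed.
    if external_data_allowed():
        return
    skip = pytest.mark.skip(reason="external dataset not available at build time")
    for item in items:
        if "external_data" in item.keywords:
            item.add_marker(skip)
```

A test that fetches data would then be decorated with
@pytest.mark.external_data. Without any conftest.py hook, the same tests
can also be deselected at build time with: pytest -m "not external_data"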
I refrained from uploading any datasets except for
$ apt list dataset\*
Listing... Done
dataset-fashion-mnist/unstable,unstable,now
as it can serve as a universal sanity-test dataset for any machine
learning tool. (In academia, people use the dataset named MNIST; the
above Fashion-MNIST is an MIT-licensed alternative.)