
Re: Integrating Machine Learning Software and Datasets within Debian



On Sat, 2010-06-12 at 14:34 +0200, Christian Kastner wrote:
> (apologies if the quoting is broken; I wasn't subscribed yet at that point,
> and the archives are only available via the web interface)
> 
> On Thu, 10 Jun 2010, Yaroslav Halchenko wrote:
> >On Thu, 10 Jun 2010, Soeren Sonnenburg wrote:
> >> we have been setting up freshmeat-like repositories for machine learning
> >> open source software ( http://mloss.org ) and data sets
> >> ( http://mldata.org ), trying to make open source software and open data
> >> more widely known within the machine learning community (also organizing
> >> workshops and establishing it in the ``biggest'' machine learning journal,
> >> http://jmlr.org).
> >>
> >>
> >> I would like to improve integration between the above repositories and
> >> Debian. So my question: who is interested in packaging machine learning
> >> packages?
> >
> > hm...  I guess at least
> > you: shogun, weka
> > me: mvpa, scikit-learn, mlpy, vowpal-wabbit

Yaroslav, don't you also package lush? That is also core ML... And, well,
OpenCV is also partially ML-related.

> I would very much like to contribute here. As I am not a DD, I would
> greatly appreciate sponsorship :-) I am very grateful for the work
> Yaroslav already did as a mentor/sponsor for libfann.

I would be happy to sponsor any ML-related packages...

> The question is: where would I start packaging? Looking at mloss.org, I
> can see it contains hundreds of registered projects. Apart from the
> obvious (packages I have a personal interest in), this would require some
> form of strategy.
> 
> There would be many factors to consider. Relevance of the project is of
> course important. I do not see much merit in (for example) providing
> every implementation or variant of X out there; in fact, I think that
> would even hurt more than help. Diversity is good, but too much of it
> would IMHO only lead to fragmentation and confusion.
> 
> Another factor to consider would be upstream activity. When updating
> libfann, for example, it was obvious that upstream development had ceased
> around 2007. I only updated that package because it apparently is still
> quite popular and I couldn't find a suitable alternative.
> 
> > Since a maintainer group requires people,
> > at the moment it might be worth starting with a wiki page pointing to the
> > Blends task, and trying to formalize the longer-standing goal for a possible
> > maintainer group (otherwise, without clear advantages, co-existence
> > within debian-science seems to be the logical way forward)
> 
> A formalized goal would help new contributors such as me very much by
> pointing them in a general direction.

Well, I could of course name a couple of projects that I consider very
relevant and that are likely to be maintained.

We could also add 'I use this' buttons to mloss.org to help identify which
projects are used most often.

On the other hand, just looking at the download/view statistics might
reveal a number of good projects, and the ones published in JMLR have at
least undergone peer review, so they should be of higher quality.

Of course, this won't help predict future activity.

> > In addition, having heard Andreas Tille's talk about Debian Blends
> >  http://blends.alioth.debian.org/science/tasks/machine-learning
> > I wonder if we could have pointers from mloss.org to the respective
> > Debian packages/WNPP entries etc.
> >
> > In the long term, it would be great if such cross-referencing
> > between software portals, Debian Blends, and Debian could become more
> > prominent at debian.org proper (e.g. in package listings).
> 
> Those ideas sound great!

I guess we first need more ML packages; then this interaction makes
sense.

I would suggest first packaging some simple but widely used things,
such as:

MPIKmeans
liblinear and/or libocas

Soeren
-- 
For the one fact about the future of which we can be certain is that it
will be utterly fantastic. -- Arthur C. Clarke, 1962
