
Aw: Re: Presentation of Debian Med on local Next Generation Sequencing workshop (April)



Hello,

> >    Beside the two big ones I already mentioned, Galaxy and CloudBioLinux, I
> >    don't know any other good example of a popular system for computational
> >    biology not using official packages. I also don't know of an example of a
> >    system that does. BioLinux, from which CloudBioLinux derives, comes close,
> >    but my experience is that the collaboration is not as close as you would
> >    expect.
> >    I have also used several institutional HPC clusters, some that are mainly
> >    used for computational biology and run on Debian or Ubuntu. None relied on
> >    official packages or even APT.
> 
> "run on Debian or Ubuntu" but not using packages or APT... oh well -- at
> least it keeps some useful admins on a payroll ;)  Joking aside though, the
> situation with "institutional" HPC clusters is indeed peculiar, since
> they need to serve many users from different disciplines, and
> even if all the software were provided by the stock distribution, some
> users would still demand custom versions and/or builds.

Agreed, and there is a lot to optimise with better-than-gcc Intel compilers
and architecture-specific build flags. Compared with the effort to
distribute the databases across the network and have applications run in
parallel on the data etc., the effort of a recompilation is rather negligible.

> But if we look at smaller deployments (labs/departments),
> starting off with the stock distribution/packages could save a lot of
> human effort in the short and long run.  Custom installations could
> then still be done, using the same 'environment modules' system that is
> used across many HPCs.

Completely agreed.
 
> >    Finally, let me repeat myself. I think Debian Med is doing a great job. I
> >    hope more people could see the benefits of using the official packages. My
> >    limited involvement allowed me to see how much you can get by following
> >    good practices for packaging free software. My point was it seems there is
> >    a need to highlight the importance of these benefits in a meeting like the
> >    one mentioned in this thread. 
> 
> +1  ;)

Yip.

> >    Also a need for information on how to get
> >    reproducibility but still use the official packages.
> 
> Actually, that could be the easiest way to achieve a kind of
> reproducibility: e.g. if someone uses a stock release of Debian, as I
> have demonstrated, it is very easy to recreate a very similar (if
> not identical) environment in one (or a few) commands (such as debootstrap).

I have only about 15 minutes, though; demonstrations (as mentioned elsewhere)
are for the coffee breaks.

For the runtime environments and the recreation of work environments, I see
docker.io together with snapshot.debian.org now helping Debian a lot,
across all Linux distributions. Olivier is following this up and writing
it up on wiki.d.o.
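
A minimal sketch of how a dated snapshot.debian.org archive could pin such
an environment, whether it is fed to debootstrap or to the sources.list of
a container image. The date, the suite name (wheezy) and the chroot path
are illustrative assumptions, not values taken from this thread; the
script only builds the strings and does not run anything.

```python
from datetime import datetime, timezone

# Base URL of the main Debian archive on snapshot.debian.org.
SNAPSHOT_BASE = "http://snapshot.debian.org/archive/debian"


def snapshot_mirror(when):
    """Return the snapshot mirror URL for a timezone-aware datetime."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{SNAPSHOT_BASE}/{stamp}/"


def sources_list_line(when, suite, components=("main",)):
    """Build an APT sources.list entry pinned to the snapshot."""
    # check-valid-until=no is needed because the Release files of old
    # snapshots have expired by the time they are used again.
    return (f"deb [check-valid-until=no] {snapshot_mirror(when)} "
            f"{suite} {' '.join(components)}")


def debootstrap_command(when, suite, target, arch="amd64"):
    """Build the argument list: debootstrap --arch=ARCH SUITE TARGET MIRROR."""
    return ["debootstrap", f"--arch={arch}", suite, target,
            snapshot_mirror(when)]


if __name__ == "__main__":
    # Hypothetical freeze date and target directory, for illustration only.
    frozen = datetime(2014, 3, 1, tzinfo=timezone.utc)
    print(sources_list_line(frozen, "wheezy"))
    print(" ".join(debootstrap_command(frozen, "wheezy", "/srv/chroot/ngs")))
```

Recording the timestamp alongside an analysis would then be enough to ask
for the same package versions again later, as long as the snapshot archive
still serves that date.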

Concerning the test environments and scientific merits, this is indeed a
problem. The pragmatic side that surrounds me here is mostly after avoiding
false positives in RNA sequencing. Anything that comes up is validated
 * with a different technology (to fight technical error, including the software)
 * with a larger pool of patients/controls (to fight biological "error")
and if anyone saw me adding a test for a tool that is published and used
as it is, I would be asked whether I should not rather work on that grant
application or polish that paper or ... Still, I agree, we want unit tests,
and upstream cannot provide them all for the very same reasons.

Brainstorming here: we'd need a platform to promote community contributions
to scientific Open Source software. Something tells me that this could
be something like "us" (whatever "us" is), but in some less
distribution-centric way. For bioinformatics, the Open Bioinformatics
Foundation comes to mind. And we need some way to reference such
contributions. What just jumped out at me is a concept I learned about at
the last NETTAB in Venice: Nano Publications (http://nanopub.org/). Those
were meant for something very different, but the concept should be
adaptable, and I have not seen them discussed in this
social-scientific-security context. What do you think?

Best,

Steffen

