Re: Nextflow - have just used it on our HPC cluster and liked it
On 08/05/2023 07:45, Charles Plessy wrote:
[...] In bioconda
you have no idea whether sed is from GNU or from busybox unless you try
it or dig for a package recipe in GitHub...
Hi, Charles.
In fact, an Anaconda/Bioconda 'env' is little more than a set of shell
environment variables and a .yaml recipe to install packages from the
Anaconda/Bioconda repos. You can discover which "sed" binary you are
using in an active conda 'env' by:
which sed
If you look at the PATH in an active env you can see why:
printenv PATH
This also reveals more: an 'env' just overlays the existing environment
in your Linux shell by putting its own directories at the front of PATH.
Consequently, unless you choose to install a different version of a
program in your 'env', your PATH still results in the system-managed
(deb) version of e.g. "sed" being run.
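A minimal sketch of that check follows, assuming a POSIX shell and that
conda's activation has exported CONDA_PREFIX (it is unset outside an
env). The function name tool_origin is made up for illustration and is
not part of conda. It reports whether a command resolves inside the
active env or falls through to the system copy:

```shell
# tool_origin NAME: say where NAME resolves on the current PATH.
# Assumption: CONDA_PREFIX points at the active env (set by "conda activate").
tool_origin() {
    tool_path=$(command -v "$1") || { echo "not-found"; return 1; }
    case "$tool_path" in
        # Path lies under the env's prefix, so the env provides the tool.
        "${CONDA_PREFIX:-/nonexistent}"/*) echo "conda-env $tool_path" ;;
        # Anything else comes from the rest of PATH (e.g. the deb package).
        *) echo "system $tool_path" ;;
    esac
}

# Show the PATH entries in lookup order, one per line.
printf '%s\n' "$PATH" | tr ':' '\n'

# Where does sed come from, and how does it identify itself?
tool_origin sed
sed --version 2>&1 | head -n 1
```

GNU sed identifies itself as "GNU sed" in the first line of its
--version output, whereas other builds such as BusyBox answer
differently, so that line is usually enough to tell them apart.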
I run a full install of "med-bio" + Bioconda and create envs for odd
versions of Python, Perl, R and their supporting libraries that are
required by certain bioinformatics pipelines and that would otherwise
conflict with the system-managed (deb) versions if installed manually.
For me, this began with QIIME which failed its validation tests using
the current, up-to-date, system-managed versions of supporting packages
when Tim Booth packaged it for Bio-Linux. I used Bioconda to teach a
course on QIIME, because Tim's Bio-Linux package gave different results
to running QIIME on a Mac using the same data, which was a serious
problem for my colleagues who wanted to compare their results.
As both you and Steffen have said many times, one aim of Debian-Med is
to promote good 'reproducible' research by providing a well-defined
environment in which to run bioinformatics pipelines. I believe the
combination of "med-bio" + Bioconda achieves that and I have ceased my
independent development of Bio-Linux in favour of creating a "bio-linux"
meta-package within the Debian-Med project with help from Andreas.
I am already doing something along those lines on our HPC cluster to turn
our packages into environment modules (lmod).
https://github.com/oist/BioinfoUgrp/blob/master/DebianMedModules.md#creation-of-a-new-singularity-image
The size of the images is a bit less than 8 GiB, and I make a new image
at each point release. Would there be some interest to make such images
in a more official way?
I have to confess my deep ignorance of "singularity", but I am quite
interested. I created AWS and CyVerse Bio-Linux VMs a while ago and, I
guess, I should really bring myself up-to-date now with more modern
approaches for HPC. On that topic, has anyone tried out QLUSTAR since
Roland Ferrenbacher changed the licence to be 100% open source?
https://qlustar.com/
The real snag, for me, is that I can't be paid to install or support it!
However, anyone can install and use QLUSTAR themselves for free and I
can support their use of QLUSTAR for bioinformatics. I believe it was a
promising development when Roland Ferrenbacher agreed to support and
endorse Debian-Med in QLUSTAR, but I've not seen much interest in that
development on our list despite Roland attending at least two Sprints.
Bye,
Tony.
--
Minke Informatics Limited, Registered in Scotland - Company No. SC419028
Registered Office: 3 Donview, Bridge of Alford, AB33 8QJ, Scotland (UK)
tel. +44(0)19755 63548 http://minke-informatics.co.uk
mob. +44(0)7985 078324 mailto:tony.travis@minke-informatics.co.uk