
Re: Nextflow - have just used it on our HPC cluster and liked it



On 08/05/2023 07:45, Charles Plessy wrote:
[...]  In bioconda
you have no idea whether sed is from GNU or from busybox unless you try
it or dig for a package recipe in GitHub...

Hi, Charles.

In fact, an Anaconda/Bioconda 'env' is little more than a set of shell environment variables plus a .yaml recipe for installing packages from the Anaconda/Bioconda repos. You can discover which "sed" you are using in an active conda 'env' by:

  which sed

If you look at the PATH in an active env you can see why:

  printenv PATH

This also reveals more: an 'env' just overlays the existing environment in your Linux shell. Consequently, unless you choose to install a different version of a program in your 'env', your PATH still results in the system-managed (deb) version of e.g. "sed" being run.
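
As a rough sketch of the same check in script form: the helper below (tool_origin is a name made up for illustration) compares the resolved path of a command against CONDA_PREFIX, which "conda activate" sets to the root of the active env. The --version probe at the end assumes GNU sed, which names itself on the first line of that output; other implementations may print something else or nothing at all.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command resolves inside the
# active conda env (under $CONDA_PREFIX) or somewhere else on PATH.
tool_origin() {
    tool_path=$(command -v "$1") || { echo "missing"; return 1; }
    case "$tool_path" in
        # Fall back to a path that cannot match when no env is active.
        "${CONDA_PREFIX:-/nonexistent}"/*) echo "env: $tool_path" ;;
        *) echo "system: $tool_path" ;;
    esac
}

tool_origin sed

# GNU sed identifies itself here; treat anything else as "unknown".
sed --version 2>/dev/null | head -n 1 || true
```

Running it once inside the env and once after "conda deactivate" should show the path switching between the env's bin directory and the system one, but only if the env actually ships its own sed.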

I run a full install of "med-bio" + Bioconda and create envs for odd versions of Python, Perl, R and their supporting libraries that are required by certain bioinformatics pipelines and that would otherwise conflict with the system-managed (deb) versions if installed manually.

For me, this began with QIIME, which failed its validation tests using the current, up-to-date, system-managed versions of supporting packages when Tim Booth packaged it for Bio-Linux. I used Bioconda to teach a course on QIIME, because Tim's Bio-Linux package gave different results from running QIIME on a Mac using the same data, which was a serious problem for my colleagues who wanted to compare their results.

As both you and Steffen have said many times, one aim of Debian-Med is to promote good 'reproducible' research by providing a well-defined environment in which to run bioinformatics pipelines. I believe the combination of "med-bio" + Bioconda achieves that and I have ceased my independent development of Bio-Linux in favour of creating a "bio-linux" meta-package within the Debian-Med project with help from Andreas.


I am already doing something along these lines on our HPC cluster to turn
our packages into environment modules (lmod).

https://github.com/oist/BioinfoUgrp/blob/master/DebianMedModules.md#creation-of-a-new-singularity-image

The size of the images is a bit less than 8 GiB, and I make a new image
at each point release.  Would there be some interest to make such images
in a more official way?

I have to confess my deep ignorance of "singularity", but I am quite interested. I created AWS and CyVerse Bio-Linux VMs a while ago and, I guess, I should really bring myself up-to-date now with more modern approaches for HPC. On that topic, has anyone tried out QLUSTAR since Roland Ferrenbacher changed the licence to be 100% open source?

https://qlustar.com/

The real snag, for me, is that I can't be paid to install or support it!

However, anyone can install and use QLUSTAR themselves for free and I can support their use of QLUSTAR for bioinformatics. I believe it was a promising development when Roland Ferrenbacher agreed to support and endorse Debian-Med in QLUSTAR, but I've not seen much interest in that development on our list despite Roland attending at least two Sprints.

Bye,

  Tony.

--
Minke Informatics Limited, Registered in Scotland - Company No. SC419028
Registered Office: 3 Donview, Bridge of Alford, AB33 8QJ, Scotland (UK)
tel. +44(0)19755 63548                    http://minke-informatics.co.uk
mob. +44(0)7985 078324        mailto:tony.travis@minke-informatics.co.uk

