
Re: packaging which supports multiple versions of an app on the same host



Dear Scott,

Thank you for your stimulating request. It reminds me of my own past in grid computing and of dynamic runtime environments.
From what I observe, though, those technologies were never adopted (beyond experimental stages) for heterogeneous setups like
those in bioinformatics. I am a bit biased because I like those folks, so please discuss internally whether
project-specific reference images of regular Debian/Ubuntu packages may be an answer, and skim through
 * http://www.eucalyptus.com/open (a free Amazon clone with commercial backup)
 * http://snapshot.debian.org (any version of anything ever published on any Debian repository)
The two technologies together give you any constellation of tools you may desire. You can run any heterogeneous zoo of such
instances at the same time and let them communicate through some shared storage. The load balancing for your services then
comes (almost) for free; you still need to monitor and program it.

The above goes beyond the core of Debian. You can mimic it with core Debian technologies. I'll elaborate on the latter a bit
below, but I am not fully convinced of it all myself. I think the cloud technologies above are what you want to employ.

On 02/11/2012 05:50 PM, Scott Smith wrote:
> We are attempting version-specific packaging of bioinformatics software, such that multiple versions of the same app can exist on the same host.

This is already one possible answer to your problem of not knowing for sure which version of a particular tool a particular
project may need. It works. If you need to use different versions of the same tool on the same data, or if in a larger non-separable
workflow you need to use tool A from one version of a package together with tool B from another version of the same package, then
this may be the way with the least trouble.

I have mentioned snapshot.debian.org as an apt-get-compatible repository for retrieving any version of any software at any time.
The catch is that only one version of a package can be installed at a time. But if you do not need more than one, then please
spare yourself the effort of modifying the packages. The default way in Debian to have multiple versions installed in parallel is
with chroot environments. This in turn is difficult for web services because the running tools and TCP ports are not hidden from
the host machine, but it is an excellent way to have an arbitrary number of undisturbed, reliable, instantly available project
environments on a single machine. And compared with the size of the data handled in today's projects, some 300MB for a chroot
(i.e. a complete Debian installation performed in a subdirectory of another Debian (or SuSE or RedHat or Ubuntu) installation)
is not much.
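To make the snapshot idea concrete, here is a minimal sketch that builds an apt sources line pinned to one point in time. It assumes the timestamped URL layout that snapshot.debian.org documents (archive name, then a UTC timestamp); the function name, the chosen date and the suite "squeeze" are only illustrative.

```python
from datetime import datetime, timezone

SNAPSHOT_BASE = "http://snapshot.debian.org/archive"


def snapshot_sources_line(when, suite, archive="debian", components=("main",)):
    """Return a sources.list line for the archive state at `when`.

    `when` should be a timezone-aware datetime; it is converted to UTC
    and formatted as YYYYMMDDTHHMMSSZ, the form used in snapshot URLs.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"deb {SNAPSHOT_BASE}/{archive}/{stamp}/ {suite} {' '.join(components)}"


if __name__ == "__main__":
    # Illustrative values only: pin a project environment to one date.
    frozen = datetime(2012, 2, 11, 17, 50, tzinfo=timezone.utc)
    print(snapshot_sources_line(frozen, "squeeze"))
```

The resulting line would go into the sources.list of the project's chroot or image, not of the host. Since the Release files of old snapshots are expired, apt may need to be told to skip that check (Acquire::Check-Valid-Until=false).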

> Has Debian-Med encountered the need for this sort of thing from others from the scientific community?

From me. About five years ago the answer was Dynamic Runtime Environments (see the Janitor component of the ARC grid middleware of
http://www.NorduGrid.org). Today the answer may be "Cloud".

On another technical level, this request is typical for libraries and is addressed e.g. by sonames for C/C++ and "versioned .jars"
for Java. For binaries this is typically avoided, on the assumption that the latest and greatest is sufficient.

> Do you have recommendations for how best to approach the problem in a way which will help us, and let us contribute back to you, most effectively?
We should find a way that reduces the changes you need to make to the regular Debian/Ubuntu packages to a minimum, to none if
possible. My concern is not only the packaging work per se, but also the increased confidence in the functionality when one sees
that everyone reporting to popcon.debian.org works with the exact same version.

> BACKGROUND:
> 
> The Genome Institute is attempting to shift to using formal packaging for internal software distribution across our compute cluster.  All of our blades run Ubuntu Lucid now,
This is the version of Ubuntu that BioLinux uses.

> so we have an interest in getting Debian packages for all of the popular bioinformatics tools we use, or helping to make them.
That is a big compliment. Thank you. Please send a list of the gaps you wish to be closed. Most of us have too many packages to
monitor and maintain already, so every helping hand is welcome.

> Our biggest difficulty is that our pipelines expect to produce reproducible results over time, and as such only call "versioned executables".   We currently custom compile everything, and install each version next-to each other.  The actual executable in the PATH is something like "cufflinks1.2.0", which prepends to the PATH where that code exists, and executes the "cufflinks" binary in that directory.
I think I would indeed go for chroots. You can tar them up and archive them. And you can bind-mount any directory into them when
needed to share data between them. For workflows across many such chroots, I suggest wrapper scripts that prepend the right chroot
command to "$@".
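Such a wrapper could look roughly like the following sketch. It assumes one chroot per project below a common directory; the path /srv/chroots, the project names and the function name are made up for illustration, and chroot(8) normally needs root privileges (or a helper that grants them).

```python
import os
import shlex
import sys

# Hypothetical layout: one chroot per project below this directory.
CHROOT_BASE = "/srv/chroots"


def chroot_command(project, argv, base=CHROOT_BASE):
    """Return argv prefixed with the chroot call for the given project."""
    if not argv:
        raise ValueError("no command given")
    return ["chroot", os.path.join(base, project), *argv]


# Usage: wrapper.py PROJECT COMMAND [ARGS...]
if __name__ == "__main__" and len(sys.argv) > 2:
    full = chroot_command(sys.argv[1], sys.argv[2:])
    # Show what is about to run, then replace this process with it.
    print("+", shlex.join(full), file=sys.stderr)
    os.execvp(full[0], full)
```

Passing the arguments on as a list keeps file names with spaces intact, which is the reason to avoid re-joining them into one string before execution.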

> In many cases the apps we are packaging are apps which you have already packaged, and we only need to change things for the per-version packaging.  In other cases you have an older version of the code.  In others, you have not packaged it at all.  
For many apps, you are likely to find a way to automate the versioning. For HMMer there is an effort to keep the latest release of
version 2 in the distribution. So, for the larger biological jumps we are with you. But for the smaller ones I do not see Debian
following you with versioning in the distribution just to reproduce the results of a couple of years ago, which should be inferior
to what you would get today. But, sure, I can see the research and service aspect of it all.

> Our current trajectory is to hand-repackage whatever we depend on making the version number part of the package name, and formalizing the above shell-wrapper strategy.  For our first few attempts we use etc-alternatives to manage a symlink with the generic app name.  Upgrades will not happen as a user might expect, since each version is its own package, but we tentatively planned to make a meta-package with a name like "cufflinks-versioned" which depends on a regularly changing "cufflinksN.N.N" package, and would create the common user experience for anyone who did not request a specific version.
Are those packages then in conflict with each other? Or do you have everything separated into different directories? Or do you
have multiple binary versions sharing the data files of a later version?

> Thanks in advance for your advice,
I hope this mail has helped. I just saw Yaroslav's reply when I was about to press the "send" button. This is now a bit
redundant, but, hey, you find a motif here.

Great to have you around,

Steffen



