Re: Queuing System for Parallel Clusters
On Wed, Oct 18, 2000 at 03:56:08PM +0200, Thimo Neubauer wrote:
> we are running an 80-PC cluster and we plan to use some sort of queuing
> system to run the parallel programs. What are the known free systems
> and what are your experiences?
> My second question: has anyone on this list already used the
> SCore system? Does it work with Debian? Is anyone packaging it?
I've tried various free systems. GNU queue looks nice in theory, but is
lacking in functionality. It also had a serious bug on Linux systems
which caused it to be removed from Debian 2.2.
NQS derivatives are all hopeless, IMHO. They perform host selection at
job submission time rather than job execution time, which is a really
broken idea. They're fine for controlling queues on a single large
parallel machine, but they're no good for managing clusters.
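To make the submission-time vs execution-time point concrete, here is a toy sketch in Python. It is not how NQS or any real scheduler is implemented; the Host class, the single "load" number and the least-loaded rule are all hypothetical simplifications. Both queues pick the least-loaded host, but one decides when the job is submitted and the other waits until it is actually dispatched:

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    load: float  # hypothetical single load figure, e.g. run-queue length


def least_loaded(hosts):
    """Pick the host with the lowest current load."""
    return min(hosts, key=lambda h: h.load)


class EarlyBindingQueue:
    """Toy NQS-style queue: the host is fixed when the job is submitted."""

    def __init__(self, hosts):
        self.hosts = hosts
        self.pending = []  # (job, host) pairs, host already chosen

    def submit(self, job):
        self.pending.append((job, least_loaded(self.hosts)))

    def dispatch(self):
        job, host = self.pending.pop(0)
        return job, host.name


class LateBindingQueue:
    """Toy cluster-style queue: the host is chosen when the job starts."""

    def __init__(self, hosts):
        self.hosts = hosts
        self.pending = []  # jobs only; no host chosen yet

    def submit(self, job):
        self.pending.append(job)

    def dispatch(self):
        job = self.pending.pop(0)
        return job, least_loaded(self.hosts).name


if __name__ == "__main__":
    hosts = [Host("a", 0.0), Host("b", 1.0)]
    early, late = EarlyBindingQueue(hosts), LateBindingQueue(hosts)
    early.submit("job1")
    late.submit("job1")
    hosts[0].load = 5.0  # someone starts using host "a" while job1 waits
    print(early.dispatch())  # ('job1', 'a'): stuck with the stale choice
    print(late.dispatch())   # ('job1', 'b'): uses the load at run time
```

On a dedicated parallel machine the load barely changes between submission and execution, so early binding does little harm. On a cluster of shared PCs it can change a lot while a job sits in the queue.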
I bit the bullet and bought Platform Computing's LSF. It's got one or
two bugs, but you get what you pay for - the technical support is first
rate, and the feature list is great. It's particularly good if you want
to cycle-steal from workstations on people's desks; you can configure
machines to be part of the cluster only at certain times of day, only
when no users are logged in, or according to several other load indices.
You can even add your own.
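As a rough illustration of that kind of cycle-stealing policy, here is a hypothetical eligibility check in Python. This is NOT LSF configuration syntax or LSF's own logic. The names (WorkstationState, eligible, the 19:00 to 07:00 window, the 15-minute idle threshold, the "scratch_free_gb" index) are made up for the example, and combining the rules with "or" is just one possible choice:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class WorkstationState:
    hour: int             # local hour of day, 0-23
    users_logged_in: int  # interactive login sessions on the machine
    idle_minutes: float   # minutes since last keyboard/mouse activity
    extra: Dict[str, float] = field(default_factory=dict)  # site-defined indices


def in_window(hour: int, start: int, end: int) -> bool:
    """True if hour is in [start, end); windows may wrap past midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def eligible(
    state: WorkstationState,
    window=(19, 7),
    min_idle: float = 15.0,
    custom_checks: Iterable[Callable[[WorkstationState], bool]] = (),
) -> bool:
    """Hypothetical policy: lend the desktop to the cluster at night, or at
    any time when nobody is logged in and it has been idle long enough.
    Any site-defined checks must also pass."""
    night = in_window(state.hour, *window)
    unattended = state.users_logged_in == 0 and state.idle_minutes >= min_idle
    if not (night or unattended):
        return False
    return all(check(state) for check in custom_checks)


if __name__ == "__main__":
    def enough_scratch(s: WorkstationState) -> bool:
        # Example of a home-grown index: free local scratch space in GB.
        return s.extra.get("scratch_free_gb", 0.0) >= 1.0

    busy_day = WorkstationState(hour=10, users_logged_in=1, idle_minutes=0)
    late_night = WorkstationState(hour=3, users_logged_in=1, idle_minutes=0,
                                  extra={"scratch_free_gb": 20.0})
    print(eligible(busy_day))                                   # False
    print(eligible(late_night, custom_checks=[enough_scratch])) # True
```

Keeping the custom checks as plain callables mirrors the "add your own" idea: a site can bolt on extra conditions without touching the core time-window and login rules.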
It handles heterogeneous networks beautifully; I run a mixed
Solaris/Linux cluster here. The base product does not support launching
MPI jobs sensibly, but there is an add-on module (LSF Parallel) to do
so.
I realise some may think it's anathema to talk about commercial software
in a Debian mailing list, but I've bought this software for three
different groups of people, with widely varying requirements, over the
last three years or so and have never regretted it.