Re: distributed batch processing
> ----- Forwarded message from Paul Brossier <piem-lists@altern.org> -----
>
> Date: Tue, 10 May 2005 01:05:28 +0100
> From: Paul Brossier <piem-lists@altern.org>
> To: debian-devel@lists.debian.org
> Subject: distributed batch processing
> Reply-To: Paul Brossier <piem-lists@altern.org>
> Resent-Date: Mon, 9 May 2005 19:05:57 -0500 (CDT)
>
> Hi all,
>
> I am looking at ways to distribute batch jobs on various hosts.
> Essentially, i have N different command lines, and M different
> hosts to run them on:
...
>
> I had a try with 'queue' [1], but it seems rather obsolete now.
> I am now seeking recent alternatives. I went across a few
> solutions, such as DQS [2] (non-free, unmaintained), OpenPBS [3]
> (non-free), and distribulator [4] (looks interesting).
On Tue, May 10, 2005 at 04:33:29PM -0700, Dale Southard wrote:
>
> Last I checked, OpenPBS was still free as in beer and source
> was available. Whether it is free as in Stallman is another matter.
> It is still part of the OSCAR cluster solution
> (http://www.csm.ornl.gov/oscar/home.html)
>
> There are some others available as well:
>
> Sun's Grid Engine (http://gridengine.sunsource.net)
> which is a reincarnation of Codine, which was a reincarnation
> of DQS.
>
DQS/SGE/PBS/*NQS are all implementations of the standard POSIX batch
queueing commands (qsub, qstat, etc.). Debian really ought to have at least
one packaged implementation of these (DQS was packaged, but I couldn't keep
it working).
SGE isn't just a reincarnation, it's clearly a fork of the same
codebase. The build system is as horrible as ever, even though it was
replaced with PVM's. :) The memory leaks in DQS 3.2.7 seem to have been
plugged, and the MPICH parallel environment is working well. I haven't
touched SGE 6.0 yet (what with all its Java/Globus/SSL tie-ins). SGE 5.3
is very much like DQS 3.2.7 (the last DQS packaged in Debian). SGE still has
the original DQS fair-share scheduler, but it's no longer the default; a
FIFO scheduler is. I strongly suggest you switch back if you have more than
one user (qconf -msconf, then set user_sort=true, IIRC).
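To check the current setting before editing anything, something along these lines might help. It is only a sketch: it assumes that qconf -ssconf prints the scheduler configuration as whitespace-separated "name value" lines and that the parameter really is called user_sort (both from memory). The QCONF variable and the function name are made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether the scheduler config has user_sort set to true.
# QCONF is a hypothetical override; it defaults to plain "qconf".
QCONF=${QCONF:-qconf}

user_sort_enabled() {
    # Assumes "qconf -ssconf" prints lines such as "user_sort   true".
    # Succeeds (exit status 0) only when such a line is present.
    "$QCONF" -ssconf | grep -Eiq '^user_sort[[:space:]]+true'
}
```

Typical use would be "user_sort_enabled || echo 'FIFO scheduling is active'", followed by qconf -msconf to change it.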
An example wrapper to avoid having to set SGE_ROOT globally (be more
Debian-like, probably how I'd package it this time rather than hacking up
the source code):
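#!/bin/sh
# Wrapper so that SGE_ROOT need not be set globally.
export SGE_ROOT=/dist/OS-SGEDIST/SGE53
# util/arch is a script that prints the architecture name, so run it.
ARCH=`${SGE_ROOT}/util/arch`
# "$@" keeps arguments containing spaces intact.
exec "${SGE_ROOT}/bin/${ARCH}/qsub" "$@"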
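For the original problem (N command lines spread over M hosts), a small loop around qsub is probably enough, since the scheduler picks the hosts. The following is a rough sketch, not something I have run against SGE 5.3: it assumes qsub reads the job script from standard input when no script file is given and accepts -cwd. The QSUB variable, the function name, and the one-command-per-line input file are all invented for the example.

```shell
#!/bin/sh
# Sketch: submit every command line in a file as its own batch job.
# QSUB is a hypothetical override; point it at a wrapper script if
# SGE_ROOT is not set globally. It defaults to plain "qsub".
QSUB=${QSUB:-qsub}

submit_all() {
    # $1: file containing one command line per line
    while IFS= read -r cmd; do
        # Skip blank lines and lines starting with '#'.
        case $cmd in
            ''|'#'*) continue ;;
        esac
        # Assumption: qsub takes the job script on stdin when no
        # script file is named; -cwd runs the job in the current directory.
        printf '%s\n' "$cmd" | "$QSUB" -cwd
    done < "$1"
}
```

With the commands listed in a file such as jobs.txt (hypothetical name), "submit_all jobs.txt" queues one job per line and leaves host selection to the scheduler.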
Lots of little quirks getting parallel environments and interactive jobs
to run, perhaps something for a wiki or howto.
-Drake