
Re: distributed batch processing

> ----- Forwarded message from Paul Brossier <piem-lists@altern.org> -----
> Date: Tue, 10 May 2005 01:05:28 +0100
> From: Paul Brossier <piem-lists@altern.org>
> To: debian-devel@lists.debian.org
> Subject: distributed batch processing
> Reply-To: Paul Brossier <piem-lists@altern.org>
> Resent-Date: Mon,  9 May 2005 19:05:57 -0500 (CDT)
> Hi all,
> I am looking at ways to distribute batch jobs on various hosts.
> Essentially, I have N different command lines, and M different
> hosts to run them on:
> I had a try with 'queue' [1], but it seems rather obsolete now.
> I am now seeking recent alternatives. I went across a few
> solutions, such as DQS [2] (non-free, unmaintained), OpenPBS [3]
> (non-free), and distribulator [4] (looks interesting).

On Tue, May 10, 2005 at 04:33:29PM -0700, Dale Southard wrote:
> Last I checked, OpenPBS was still free as in beer and source
> was available.  Whether it is free as in Stallman is another matter.
> It is still part of the OSCAR cluster solution
> (http://www.csm.ornl.gov/oscar/home.html)
> There are some others available as well:
>   Sun's Grid Engine (http://gridengine.sunsource.net)
>   which is a reincarnation of Codine, which was a reincarnation
>   of DQS.

   DQS/SGE/PBS/*NQS are all implementations of the standard POSIX queueing
commands (qsub, qstat, etc.).  Debian really ought to have at least one
packaged implementation of these (DQS was packaged, but I couldn't keep it
working).
     SGE isn't just a reincarnation, it's clearly a fork of the same
codebase.  The build system is as horrible as ever, even though it was
replaced with PVM's.  :) The memory leaks in DQS 3.2.7 seem to have been
plugged, and the MPICH parallel environment is working well.  I haven't
touched SGE 6.0 yet (what with all its Java/Globus/SSL tie-ins).  SGE 5.3
is very much like DQS 3.2.7 (the last DQS packaged in Debian).  SGE still
has the original DQS fair-share scheduler, but it's no longer the default;
a FIFO scheduler is.  I strongly suggest you switch back if you have more
than one user (qconf -msconf, user_sort=true IIRC).
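
A rough sketch for checking the current setting before editing it.  It
assumes that "qconf -ssconf" prints the scheduler configuration as
"name value" lines and that the parameter really is called user_sort (both
from memory, as above).  The QCONF variable and the function name are made
up for this example; QCONF only exists so the function can be tried without
a cluster.

```shell
#!/bin/sh
# QCONF is a hypothetical override for the qconf binary (not an SGE variable).
QCONF=${QCONF:-qconf}

# Print the value of user_sort from the scheduler configuration.
# Assumes "qconf -ssconf" emits one "name value" pair per line.
user_sort_value() {
    "$QCONF" -ssconf | awk '$1 == "user_sort" { print $2 }'
}
```

Calling user_sort_value on the master host should then print the current
value, if the output format is what I remember.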

An example wrapper to avoid having to set SGE_ROOT globally (more
Debian-like, and probably how I'd package it this time, rather than hacking
up the source code):

#!/bin/sh
# ARCH must already be set to the SGE architecture string for this host.
export SGE_ROOT=/dist/OS-SGEDIST/SGE53
exec "${SGE_ROOT}/bin/${ARCH}/qsub" "$@"
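
For the original question (N command lines, M hosts), a minimal sketch of
feeding the command lines to the queueing system one job at a time.  It
relies on qsub reading the job script from standard input when no script
file is given, and leaves the choice of host to the scheduler.  The QSUB
variable, the submit_all name and the one-command-per-line file format are
assumptions made for this example, not part of SGE.

```shell
#!/bin/sh
# QSUB is a hypothetical override; point it at a qsub wrapper if you use one.
QSUB=${QSUB:-qsub}

# submit_all FILE: submit every non-empty line of FILE as its own batch job.
submit_all() {
    while IFS= read -r cmd; do
        # Skip blank lines.
        [ -n "$cmd" ] || continue
        # With no script operand, qsub takes the job script from stdin.
        printf '%s\n' "$cmd" | "$QSUB" || return 1
    done < "$1"
}
```

Something like "submit_all jobs.txt" would then queue one job per line, and
the scheduler decides which of the M hosts runs each one.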

   There are lots of little quirks in getting parallel environments and
interactive jobs to run; perhaps that is something for a wiki or howto.

