Re: Submitting parallel jobs in DQS
On Wed, Sep 01, 1999 at 09:13:28PM -0300, Enzo A. Dari wrote:
> Now I'm trying to submit parallel jobs using MPI 1.1.2 and PVM 3.4.0.
> Both of them are installed from source code (in /usr/local, shared
> by all nodes).
What you do is specify the number (hard or soft requirement) of nodes you
want to run on, and then read those nodes out of a supplied file. The
script runs only on the master node.
qsub -l linux,qty.eq.3
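A job script along these lines would then read the allocated hosts on the
master node (a sketch, untested — the $HOSTS_FILE variable is the one DQS
hands the script; the node names and fallback path here are made up so
the sketch runs standalone):

```shell
#!/bin/sh
# Sketch of a DQS job script. DQS runs this on the master node only and
# lists the allocated hosts, one per line, in the file named by $HOSTS_FILE.
HOSTS_FILE=${HOSTS_FILE:-/tmp/dqs_hosts}        # fallback path (assumption)
printf 'node1\nnode2\nnode3\n' > "$HOSTS_FILE"  # stand-in for what DQS writes
NODES=`wc -l < "$HOSTS_FILE" | tr -d ' '`       # count the hosts we were given
echo "allocated $NODES nodes:"
cat "$HOSTS_FILE"
```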
Some libraries (PVM) require some initialization be done first. There
is support in DQS to do this (-par PVM), but it doesn't really work.
The PVM model requires a master pvm daemon be started on one node in the
virtual machine, which then starts slave pvmds as hosts are added. The
problem is a particular node may only be in one virtual machine at a time
(per user). Multiple jobs spread across SMP machines will have trouble
whether they take down the virtual machine or leave it up: two virtual
machines can't be merged, and any job on a virtual machine that is taken
down dies. It's also difficult to communicate to a large virtual machine
that the queueing system has allocated you only these 3 nodes, not the
whole cluster.
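If you want to wire the DQS host list into PVM by hand anyway, a pvmd
hostfile is just one hostname per line, so $HOSTS_FILE can be copied
almost directly. A sketch (untested; the commands are echoed here rather
than executed, and the paths are made up):

```shell
#!/bin/sh
# Hand-rolled PVM startup sketch: build a hostfile from DQS's host list,
# start the master pvmd (which adds the slaves), run, then halt the
# virtual machine so the nodes are free for the next job.
HOSTS_FILE=${HOSTS_FILE:-/tmp/dqs_hosts}        # fallback path (assumption)
printf 'node1\nnode2\nnode3\n' > "$HOSTS_FILE"  # stand-in for what DQS writes
PVM_HOSTFILE=/tmp/pvm_hostfile.$$
cp "$HOSTS_FILE" "$PVM_HOSTFILE"                # pvmd hostfile: one host per line
echo "would run: pvmd3 $PVM_HOSTFILE &"         # master pvmd starts slave pvmds
echo "would run: ./my_pvm_program"
echo "would run: echo halt | pvm"               # take the virtual machine down
rm -f "$PVM_HOSTFILE"
```

Halting at the end is what runs into the one-virtual-machine-per-user
problem above when a second job shares a node.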
MPI probably doesn't have these difficulties; you'd just run something like
"mpirun -machinefile $HOSTS_FILE -np 3 program" (untested, I don't use
MPI at the moment).
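As a sketch (untested — MPICH's mpirun takes -machinefile, but the flag
name varies between MPI implementations; the command is echoed rather
than executed, and the node names are made up):

```shell
#!/bin/sh
# Launch an MPI job on exactly the hosts DQS allocated: derive -np from
# the host count so the job never spills past its allocation.
HOSTS_FILE=${HOSTS_FILE:-/tmp/dqs_hosts}        # fallback path (assumption)
printf 'node1\nnode2\nnode3\n' > "$HOSTS_FILE"  # stand-in for what DQS writes
NP=`wc -l < "$HOSTS_FILE" | tr -d ' '`
echo "would run: mpirun -machinefile $HOSTS_FILE -np $NP ./program"
```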