I got the parallel code running on my two-computer-cluster

To: debian-beowulf@lists.debian.org
Cc: Xiaoming Hu <xhu@ncsu.edu>, Mario Lang <mlang@debian.org>
Subject: I got the parallel code running on my two-computer-cluster
From: Xiaoming Hu <xhu@ncsu.edu>
Date: Thu, 30 Nov 2006 01:50:48 -0500
Message-id: <456E7F48.6030905@ncsu.edu>
In-reply-to: <456E6A95.9090909@ncsu.edu>
References: <456DB640.8050600@ncsu.edu> <87irgyp2yx.fsf@x2.delysid.org> <456DDF97.3020207@ncsu.edu> <87ac29q2qp.fsf@x2.delysid.org> <456E6A95.9090909@ncsu.edu>

Hey,

Sorry, I figured out what the problem was. It was just bugs in my code.


Thanks for the help!

But I have another problem. The two computers in my cluster are behind my router, so their IP addresses are dynamic. Every time an address changes, I need to update every place that depends on it.


How do I make the IP addresses static?

Thanks again


Xiaoming

Xiaoming Hu wrote:

Hey

Thanks very much

After modifying machines.LINUX, my parallel job runs fine when submitted with mpirun. But there is a problem with the output from the client node. I have made sure that "root" on the client node can write to the shared NFS directory. But it seems the client node didn't generate the file I asked for in my code (see the code below; only debug_server.txt is generated after running my job with mpirun -np 2 xx).


C       Each rank opens its own debug file on unit 888.
        if (my_rank .eq. 0) then
          open(888, file='debug_server.txt',
     $         IOSTAT=ierr)
        else if (my_rank .eq. 1) then
          open(888, file='debug_client.txt',
     $         IOSTAT=ierr)
        endif

Should I keep working on NFS, or look at something else?

Thanks very much

Xiaoming

Mario Lang wrote:

Xiaoming Hu <xhu@ncsu.edu> writes:

I guess I need to do some research on how to submit the job through mpirun.

Initially I thought mpirun would know the machines in the cluster, since
I listed them in /etc/hosts.


No, mpirun does not guess the machines that should be involved in your
parallel application.  You can configure the default list of machines
used by mpirun in the file /etc/mpich/machines.LINUX (this only has to
be done on the head node, if you do not need to call mpirun on any other
node in your cluster).

However, in a typical cluster environment, a parallel job does not
always span the whole cluster, and one would probably like to be
able to run several parallel jobs at once using the available resources.
That is why one typically uses some kind of job queueing system (like
torque).  In such a system, you submit a job with certain criteria
(attributes), like the number of nodes (and CPUs per node) you would
like to use for your job.  When the job (typically a shell script) is
executed, the job queueing system tells the script which hosts it is
allowed to use (based on the attributes given to the job initially).

And here is where the -machinefile argument is typically used.
You generate a temporary machines file for your job in the job script,
and run mpirun with the -machinefile argument to tell it an explicit
list of hosts.
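
As a rough sketch of such a job script: this assumes a torque/PBS-style
queue that exports the list of allocated hosts in the PBS_NODEFILE
environment variable. The function name make_machinefile and the program
name ./my_parallel_program are placeholders for illustration, not part
of MPICH or torque.

```shell
#!/bin/sh
# Sketch of a job script for a torque/PBS-style queue (assumption).

# make_machinefile NODEFILE
# Writes the unique host names from NODEFILE to a temporary file
# and prints the path of that file.
make_machinefile() {
    machinefile=$(mktemp) || return 1
    sort -u "$1" > "$machinefile"
    echo "$machinefile"
}

# Only run when the queueing system has provided a node list.
if [ -n "${PBS_NODEFILE:-}" ]; then
    machinefile=$(make_machinefile "$PBS_NODEFILE") || exit 1
    # One process per host in this sketch.
    nprocs=$(($(wc -l < "$machinefile")))
    mpirun -machinefile "$machinefile" -np "$nprocs" ./my_parallel_program
    rm -f "$machinefile"
fi
```

Outside a queueing system, the same mpirun line works with a
hand-written machine file in place of the generated one.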

Another question: do I need a copy of /etc/hosts on each of the machines
in the cluster (basically 2 laptops in my case)?


You need a properly configured /etc/hosts on all of your cluster nodes
in order to have passwordless rsh (ssh) logins work properly.  However,
you only need your generated machine file (or your default
/etc/mpich/machines.LINUX) on the node where you run mpirun.
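
One way to check this on each node is sketched below. The helper name
missing_hosts is made up for illustration, and the sketch assumes the
machine file lists one host per line, optionally followed by a ":n"
suffix. It prints every host from the machine file that has no entry in
the given hosts file.

```shell
#!/bin/sh
# missing_hosts MACHINEFILE HOSTSFILE
# Prints every host named in MACHINEFILE that has no entry in HOSTSFILE.
missing_hosts() {
    # Strip comments and blank lines; drop an optional ":n" suffix
    # (assumed machine file format).
    sed -e 's/#.*//' -e 's/:.*//' "$1" | awk 'NF { print $1 }' |
    while read -r host; do
        # In a hosts file, field 1 is the address; the rest are names.
        awk -v h="$host" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
            END { exit !found }
        ' "$2" || echo "$host"
    done
}
```

For example, running missing_hosts /etc/mpich/machines.LINUX /etc/hosts
on each node should print nothing if every listed host is known there.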

Thanks very much

Xiaoming

Mario Lang wrote:

Xiaoming Hu <xhu@ncsu.edu> writes:

I have two laptops running Ubuntu. I got NFS working, and also ssh
without a password prompt between the two laptops. I also installed
MPICH.
My simple parallel code compiles successfully. But when I use
mpirun -np n xxx to submit my job, it doesn't work.

What error message do you get? Did you configure your default MPICH
hosts file, or create one for your job?

mpirun -machinefile some-file-name -np xxx ...
