
Re: How to build a cluster based on 2 PCs?


Thanks very much
After modifying machines.LINUX, my parallel job runs fine when submitted with mpirun. But there is a problem with the output from the client node. I have made sure that "root" on the client node can write to the shared NFS directory. But it seems the client node didn't generate the file I asked for in my code (see the code below; only debug_server.txt is generated after running my job with mpirun -np 2 xx).

        if( my_rank .eq. 0) then
     $                IOSTAT=ierr)
        else if (  my_rank .eq. 1) then
     $                IOSTAT=ierr)

Should I keep working on NFS, or look at something else?

Thanks very much


Mario Lang wrote:
Xiaoming Hu <xhu@ncsu.edu> writes:

I guess I need to do some research on how to submit the job through mpirun.

Initially I thought mpirun would know the machines in the cluster since
I listed them in /etc/hosts.

No, mpirun does not guess the machines that should be involved in your
parallel application.  You can configure the default list of machines
used by mpirun in the file /etc/mpich/machines.LINUX (this only has to be done
on the head node, if you do not need to call mpirun on any other node in your
cluster).

However, in a typical cluster environment, a parallel job does not
always span the whole cluster, and one would probably like to be
able to run several parallel jobs at once using the available resources.
That is why one typically uses some kind of job queueing system (like torque).
In such a system, you submit a job with certain criteria (attributes)
like the number of nodes (and CPUs per node) you would like to use for your
job.  When the job is executed (typically a shell script)
the job queueing system tells the script somehow which hosts are allowed
to be used in this job (based on the attributes given to the job initially).
And here is where the -machinefile argument is typically used.
You generate a temporary machines file for your job in the job script,
and run mpirun with the -machinefile argument to tell it an explicit
list of hosts.
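
As a rough illustration of the job-script step described above, here is a minimal sketch. It assumes the queueing system is torque and that it exports the list of allocated hosts in the PBS_NODEFILE environment variable; the helper name make_machinefile and the program name ./xx are placeholders, not part of any tool.

```shell
#!/bin/sh
# Sketch of a job script that builds a per-job machine file.
# make_machinefile is a hypothetical helper, not an MPICH or torque command.

# $1: file with one allocated host per line (e.g. torque's $PBS_NODEFILE)
# Prints the path of a temporary machine file containing those hosts.
make_machinefile() {
    nodefile=$1
    machinefile=$(mktemp) || return 1
    # Drop blank lines; repeated hosts are kept (one line per allocated CPU).
    grep -v '^[[:space:]]*$' "$nodefile" > "$machinefile"
    echo "$machinefile"
}

# Only runs inside a job where PBS_NODEFILE is set and mpirun is installed.
if [ -n "${PBS_NODEFILE:-}" ] && command -v mpirun >/dev/null 2>&1; then
    mf=$(make_machinefile "$PBS_NODEFILE") || exit 1
    np=$(($(wc -l < "$mf")))          # one process per listed slot
    mpirun -machinefile "$mf" -np "$np" ./xx
    rm -f "$mf"
fi
```

Outside a queueing system the guarded part is skipped, so for a two-laptop setup you would write the machine file by hand and pass it to mpirun -machinefile directly.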

Another question: do I need a copy of the hosts file on each of the machines in
the cluster (basically 2 laptops in my case)?

You need a properly configured /etc/hosts on all of your cluster nodes
in order to have rsh (ssh) passwordless logins work properly.  However,
you only need your generated machine file (or your default
/etc/mpich/machines.LINUX) on the node where you run mpirun.
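
One way to check this setup from the head node is sketched below. It assumes ssh (not rsh) is used and that the machine file lists one host per line, optionally with a ":ncpus" suffix and "#" comments; list_hosts and check_hosts are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: verify that every host in a machine file accepts a passwordless login.
# list_hosts and check_hosts are hypothetical helpers, not MPICH commands.

# Print the unique host names in machine file $1, dropping comments,
# blank lines, and an optional ":ncpus" suffix.
list_hosts() {
    sed -e 's/#.*//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' \
        -e 's/:.*//' "$1" | sort -u
}

# Try a non-interactive ssh login to each host; BatchMode=yes makes ssh
# fail instead of prompting for a password.
check_hosts() {
    for h in $(list_hosts "$1"); do
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
            echo "$h: ok"
        else
            echo "$h: passwordless login FAILED"
        fi
    done
}

# Example (run on the node where you call mpirun):
#   check_hosts /etc/mpich/machines.LINUX
```

A host that resolves incorrectly in /etc/hosts, or that still asks for a password, should show up as FAILED here before mpirun ever tries to use it.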

Thanks very much


Mario Lang wrote:
Xiaoming Hu <xhu@ncsu.edu> writes:

I have two laptops running Ubuntu. I got NFS working, as well as ssh
without a password prompt between the two laptops. I also installed
And my simple parallel code compiled successfully. But when I use
mpirun -np n xxx to submit my job, it doesn't work.

What error message do you get?  Did you configure your default MPICH hosts
file, or create one for your job?

mpirun -machinefile some-file-name -np xxx ...
