
Re: How to build a cluster based on 2 PCs?



Xiaoming Hu <xhu@ncsu.edu> writes:

> I guess I need to do a research on how to submit the job through mpirun
>
> Initially I thought mpirun will know the machines in the cluster since
> I listed them in /etc/hosts.

No, mpirun does not guess which machines should take part in your
parallel application.  Listing them in /etc/hosts only makes their names
resolvable.  You can configure the default list of machines used by
mpirun in the file /etc/mpich/machines.LINUX (this only has to be done
on the head node, unless you need to call mpirun on other nodes in your
cluster).
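
As a rough illustration, a machines file is just a list of hostnames,
one per line.  The sketch below writes such a list to a scratch file and
prints the mpirun line that would use it.  The hostnames laptop1 and
laptop2 and the program name ./my_program are made-up placeholders, not
names from this thread.

```shell
#!/bin/sh
# Hypothetical hostnames; they must resolve on the head node (e.g. via /etc/hosts).
machinefile=$(mktemp)
cat > "$machinefile" <<'EOF'
laptop1
laptop2
EOF

# Printed rather than executed, so the sketch is harmless without MPICH installed.
echo "mpirun -machinefile $machinefile -np 2 ./my_program"
```

The same two hostname lines placed in /etc/mpich/machines.LINUX would
serve as the default list, so -machinefile could then be left out.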

However, in a typical cluster environment, a parallel job does not
always span the whole cluster, and one would probably like to be
able to run several parallel jobs at once using the available resources.
That is why one typically uses some kind of job queueing system (like
Torque).  In such a system, you submit a job with certain criteria
(attributes), such as the number of nodes (and CPUs per node) you would
like to use for your job.
When the job (typically a shell script) is executed, the job queueing
system tells the script which hosts it is allowed to use, based on the
attributes given to the job initially (Torque, for example, passes them
in a file named by $PBS_NODEFILE).
This is where the -machinefile argument is typically used:
you generate a temporary machines file for your job in the job script,
and run mpirun with the -machinefile argument to give it an explicit
list of hosts.
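
A minimal sketch of such a job script, assuming Torque, which lists the
allocated hosts (one line per CPU slot) in the file named by
$PBS_NODEFILE.  The helper name make_machinefile, the resource request
and ./my_program are placeholders of mine, and the "host:count" line
format is an assumption about what your MPICH accepts.  If it does not,
passing $PBS_NODEFILE to -machinefile directly should do.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=1

# Turn a node list with one line per allocated CPU slot into "host:count" lines.
make_machinefile() {
    sort "$1" | uniq -c | awk '{ print $2 ":" $1 }' > "$2"
}

# Torque sets PBS_NODEFILE inside a running job; outside a job this part is skipped.
if [ -n "${PBS_NODEFILE:-}" ]; then
    machinefile=$(mktemp)
    make_machinefile "$PBS_NODEFILE" "$machinefile"
    np=$(grep -c . "$PBS_NODEFILE")    # one process per allocated slot
    mpirun -machinefile "$machinefile" -np "$np" ./my_program
    rm -f "$machinefile"
fi
```

You would submit this with qsub.  The scheduler then decides which
hosts end up in the node file, so the script itself never names one.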

> Another question: do I need a copy of hosts on each of the machines in
> the cluster(basically 2 laptops in my case)?

You need a properly configured /etc/hosts on all of your cluster nodes
in order for passwordless rsh (ssh) logins to work properly.  However,
you only need your generated machines file (or your default
/etc/mpich/machines.LINUX) on the node where you run mpirun.
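
If you want to check the login side before blaming mpirun, something
like the following may help.  It is only a sketch: check_nodes is a
name I made up, and it assumes you use ssh rather than rsh.

```shell
#!/bin/sh
# Print every host in a machines file that cannot be reached without a password.
check_nodes() {
    while IFS= read -r entry; do
        host=${entry%%:*}            # drop an optional ":count" suffix
        [ -n "$host" ] || continue   # skip empty lines
        # BatchMode=yes makes ssh fail instead of prompting; -n keeps it off our stdin.
        ssh -n -o BatchMode=yes "$host" true || echo "login failed: $host"
    done < "$1"
}

# Usage on the head node: check_nodes /etc/mpich/machines.LINUX
```

No output means every listed host accepted a non-interactive login from
the node where you ran the check.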

> Thanks very much
>
> Xiaoming
>
> Mario Lang wrote:
>> Xiaoming Hu <xhu@ncsu.edu> writes:
>>
>>> I have two laptops with ubuntu system. I got NFS working also ssh
>>> without password prompt between the two laptops. I also installed
>>> MPICH.
>>> And my simple parallel code is compiled successfully. But after I use
>>> mpirun -np n xxx to submit my job. it doesn't work.
>>
>> What error message do you get?  Did you configure your default MPICH hosts
>> file, or create one for your job?
>>
>> mpirun -machinefile some-file-name -np xxx ...

-- 
CYa,
  Mario | Debian Developer <URL:http://debian.org/>
  .''`. | Get my public key via finger mlang@db.debian.org
 : :' : | 1024D/7FC1A0854909BCCDBE6C102DDFFC022A6B113E44
 `. `' 
   `-
