
Re: Beowulf Cluster is very slow. Suggestions needed to increase the speed.



Hello,

>>>Is the code running on the 'single system' the same as the one running in parallel?

If I install the software only on the head node (/usr/local/), mpirun reports that the software is not available on the other nodes, and the job terminates. So I installed GROMACS (the simulation software) only on the head node (/home/mpiuser/), and it automatically appeared on all the other nodes in /home/mpiuser/ (because of passwordless SSH, I guess).

>>>Is the software using the network efficiently?

I have searched Google but could not find an answer. How can I tell whether the software uses the network efficiently? Maybe I can find that in the Open MPI manual?

>>>What about network traffic? Is it high?

I will use ntop to find out. I am currently running a job that will finish in a week; after that I will start troubleshooting the cluster and update the thread.
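As a complement to ntop, the kernel's per-interface byte counters in /proc/net/dev show how much traffic a run actually generates. The sketch below is a minimal example, assuming a Linux node and the standard /proc/net/dev layout (received bytes are the first field after the colon, transmitted bytes the ninth); the function names are hypothetical helpers, not part of ntop, Open MPI, or GROMACS. Running it on each node during a simulation and comparing the rates with the link speed (about 100 Mbit/s for Fast Ethernet, 1000 Mbit/s for gigabit) gives a rough idea of whether the network is saturated.

```python
import time


def read_counters(text):
    """Parse /proc/net/dev text into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        # Receive columns come first (bytes is field 0); transmit bytes is field 8.
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput_mbits(before, after, seconds):
    """Turn two counter snapshots into (rx, tx) megabits per second per interface."""
    rates = {}
    for iface, (rx0, tx0) in before.items():
        if iface in after:
            rx1, tx1 = after[iface]
            rates[iface] = ((rx1 - rx0) * 8 / seconds / 1e6,
                            (tx1 - tx0) * 8 / seconds / 1e6)
    return rates


def measure(seconds=5, path="/proc/net/dev"):
    """Sample the counters twice, `seconds` apart, and return the rates."""
    with open(path) as f:
        first = read_counters(f.read())
    time.sleep(seconds)
    with open(path) as f:
        second = read_counters(f.read())
    return throughput_mbits(first, second, seconds)
```

For example, `for iface, (rx, tx) in measure().items(): print(iface, round(rx, 1), round(tx, 1))` prints the receive and transmit rates of every interface over a five-second window.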

>>>Why do you need a layer 3 switch? Are the cluster's nodes on different networks?

I learned from a few documents that the master node must have two network cards for this type of cluster, but our systems don't have them. From the discussion, it seems that even a USB network card would be slow. I don't know how to add additional network cards to the systems. All the cluster nodes are on the same network (only four systems, using a single router).

It seems to me from the discussion that there are three problems:

1) Our systems don't have two network cards.

2) The router I was using (ipTIME, 100 Mbps). Although the jobs got faster when I switched to a 1000 Mbps switch, the cluster is still not efficient compared with a single system.

3) I used this tutorial http://flakrat.blogspot.kr/2013/04/how-to-compile-gromacs-461-with-openmpi.html (for GROMACS) to install Open MPI and GROMACS. The only difference is that I used the GCC compiler instead of the Intel compiler (some even say GCC makes faster code than the Intel compilers). I have read the configuration logs (for Open MPI, GROMACS, the FFTW libraries, etc.) looking for errors, and the installations looked fine. Even if I made a misconfiguration, how would I find it other than by reading the configure logs? Is there any other way?
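Besides reading the configure logs, a practical check is to benchmark: run the same short input on one node and on several nodes, then compare the throughput that mdrun reports at the end of md.log. The helper below is a sketch that assumes the GROMACS 4.6-style summary, where a line starting with "Performance:" lists ns/day followed by hour/ns; other versions may format this differently, and the function names are hypothetical.

```python
def ns_per_day(log_text):
    """Return the ns/day figure from a GROMACS md.log, or None if absent."""
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Performance:"):
            # Assumed layout: "Performance:   <ns/day>   <hour/ns>"
            return float(stripped.split()[1])
    return None


def rank_runs(logs):
    """Given {label: md.log text}, return (label, ns/day) pairs, fastest first."""
    results = [(label, ns_per_day(text)) for label, text in logs.items()]
    known = [r for r in results if r[1] is not None]  # skip logs without a summary
    return sorted(known, key=lambda r: r[1], reverse=True)
```

If the multi-node run ranks below the single-node run, the problem is in how the job scales rather than in how the software was compiled.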

I noticed one more thing when installing GROMACS: three systems have CPU_ACCELERATION AVX_256, and one system has SSE2 (see http://www.gromacs.org/Documentation/Acceleration_and_parallelization#CPU_acceleration.3a_SSE.2c_AVX.2c_etc.). However, even when I run the MPI job on only the three systems configured with AVX_256, the speed still does not improve much.
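With a mix of AVX and SSE2-only machines, it helps to check which instruction sets every node shares, since a binary built for AVX would not be expected to run on a CPU without AVX. The sketch below reads the "flags" line of /proc/cpuinfo; the mapping from flags to GROMACS 4.6 CPU_ACCELERATION levels is a simplified assumption (it covers only the Intel-style levels AVX_256, SSE4.1 and SSE2, ignoring AVX_128_FMA and others), and the function names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def best_acceleration(flags):
    """Simplified (assumed) mapping from CPU flags to a GROMACS 4.6 level."""
    if "avx" in flags:
        return "AVX_256"
    if "sse4_1" in flags:
        return "SSE4.1"
    if "sse2" in flags:
        return "SSE2"
    return "None"


def common_acceleration(flag_sets):
    """Highest level every node supports, from the intersection of their flags."""
    return best_acceleration(set.intersection(*flag_sets))


def read_local_flags(path="/proc/cpuinfo"):
    """Flags of the machine this runs on."""
    with open(path) as f:
        return cpu_flags(f.read())
```

Collecting `read_local_flags()` from each node and passing the list to `common_acceleration` shows the level a single shared build could safely target.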


Thank you for your time,

Regards



On Tue, Oct 21, 2014 at 1:44 AM, Rogerio Bastos <writeme@rogeriobastos.eti.br> wrote:
Is the code running on the 'single system' the same as the one running in parallel?
Is the software using the network efficiently?
What about network traffic? Is it high?
Why do you need a layer 3 switch? Are the cluster's nodes on different networks?
I'm not sure your problem is a network bottleneck.

On 2014-10-14 11:42, suresh kannan wrote:
Hi,

First of all, why do you think the network is the bottleneck?

Thank you for your concern.

If I run a specific job on a single system (without parallelization), the estimated time is approximately three months on a quad-core processor. When I ran it in parallel on four systems, the estimated completion date was October 2015. I was using a 100 Mbps (ipTIME) router. I have confirmed that all the systems use their processors (with the "top" command). After that, TooMeeK pointed out that I have to use a "Gigabit routing switch supporting layer 3". I don't have that switch in our lab. However, we had a Netgear GS608 1000 Mbps switch, and it reduced the time (estimated completion February 2015). I also read http://cs.boisestate.edu/~amit/research/beowulf/beowulf-setup.pdf [6]

suggested by TooMeeK; it also says that the network switch is important. I also watched a few videos on YouTube about layer 3 switch capabilities. Therefore, I am convinced that a "Gigabit routing switch supporting layer 3" will solve this issue.
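The estimates in this message can be turned into speedup and parallel efficiency. The sketch below uses rough durations read from the message, counted from mid-October 2014: about 3 months on one system, about 12 months (to October 2015) with the 100 Mbps router, and about 4 months (to February 2015) with the gigabit switch; these month counts are approximations, not measurements. A speedup below 1.0 means the four-system run is slower than a single machine.

```python
def speedup(serial_time, parallel_time):
    """How many times faster the parallel run is than the serial one."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, nodes):
    """Speedup divided by the number of nodes; 1.0 would be perfect scaling."""
    return speedup(serial_time, parallel_time) / nodes


# Approximate durations in months, taken from the estimates quoted above.
SERIAL = 3
RUNS = {"100 Mbps router": 12, "1000 Mbps switch": 4}

for label, months in RUNS.items():
    print(f"{label}: speedup {speedup(SERIAL, months):.2f}, "
          f"efficiency {efficiency(SERIAL, months, 4):.2f}")
```

With both switches the speedup stays below 1.0, which is consistent with the observation that the cluster is still slower than a single system even after the gigabit upgrade.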

Regards

Suresh

On Tue, Oct 14, 2014 at 4:40 AM, Rogerio Bastos <writeme@rogeriobastos.eti.br> wrote:

First of all, why do you think the network is the bottleneck?
Be sure of this before spending money on network equipment!

On 2014-10-07 00:50, suresh kannan wrote:

I am an Indian student in Suwon, Korea. I built a Beowulf cluster (system information below) with four systems in our lab for our simulation work, with the help of good tutorials. Those tutorials mention that all the systems should have static IP addresses. Unfortunately, all our labs have been provided with dynamic IP addresses [5 IPs for 15 members in three separate labs]. I requested four more IPs from our university system admin. Because of language problems, I conveyed the request through my Korean lab mate, and I don't know why we were denied static IPs. So I found another way to skip this step:


http://www.reddit.com/r/linuxquestions/comments/2gubad/why_static_ip_address_is_necessary_for_linux/ [1]

Someone suggested using a router (with one static IP) and assigning static IPs to the four computers behind it. I did that, and it worked. However, the cluster is very slow. For instance, if I submit my simulation job on a single computer [4-core processor], it takes 2 months to complete. But if I connect 4 systems, it shows that the same job will take 6 months. It is actually using 10 cores [3, 3, 2, 2, each at 100%]; I used the top command to see how much processor the head node and the other nodes are using. I used Open MPI to run the systems in parallel. I am using GROMACS (MPI-based parallelization is part of this software). I set up a parallel configuration for GROMACS with the help of this tutorial:

http://flakrat.blogspot.kr/2013/04/how-to-compile-gromacs-461-with-openmpi.html [2]

After reading a few posts, such as

http://www.reddit.com/r/linuxquestions/comments/2gbgbg/what_would_be_the_best_linux_distro_for_folding/ [3]

I suspected the network router might be the issue.
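With the router handing out fixed private addresses, each node needs a consistent name-to-address mapping, and Open MPI needs to know how many processes to start on each host. The sketch below generates /etc/hosts lines and an Open MPI hostfile (one "hostname slots=N" line per node). The hostnames and 192.168.0.x addresses are hypothetical placeholders; the slot counts are taken from the "No. of processor" values in the system information.

```python
# (hostname, private address behind the router, slots): names and addresses are hypothetical
NODES = [
    ("head", "192.168.0.10", 4),
    ("node1", "192.168.0.11", 4),
    ("node2", "192.168.0.12", 4),
    ("node3", "192.168.0.13", 2),
]


def hosts_lines(nodes):
    """Lines to append to /etc/hosts on every node."""
    return "".join(f"{addr}\t{name}\n" for name, addr, _ in nodes)


def hostfile(nodes):
    """Open MPI hostfile: how many processes (slots) each host may run."""
    return "".join(f"{name} slots={slots}\n" for name, _, slots in nodes)


print(hosts_lines(NODES))
print(hostfile(NODES))
```

The hostfile text can be saved to a file and passed to Open MPI with `mpirun --hostfile <file>`, so that every node is addressed by a fixed name behind the router.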

Can you suggest how I can troubleshoot this problem? Someone suggested using 2 network ports, making a Linux machine act as a router, and connecting a gigabit switch to get the speed. However, our systems don't have 2 network ports. If this is necessary, I can buy extra network adapters (USB ones).

Where do I start?

Can I make my head node a router, use a USB network adapter as its second network port, and connect it to a gigabit switch (any model suggestions?) that connects the other nodes? I don't know much about networking. It would be helpful if any experts could suggest how to troubleshoot this issue.

Thank you for your time.

Regards

Suresh

System information

Head node
Processor : Intel Core i3
RAM : 1 GB
No. of processors : 4
Network card : 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
System company : Samsung
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Node1
Processor : Intel quad-core
RAM : 3 GB
No. of processors : 4
Network card : 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
System company : TG DREAMSYS
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Node2
Processor : Intel Core i3
RAM : 1 GB
No. of processors : 4
Network card : 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
System company : Samsung
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Node3
Processor : Intel Core i3
RAM : 1 GB
No. of processors : 2
Network card : 02:00.0 Ethernet controller: Qualcomm Atheros Attansic L2 Fast Ethernet (rev a0)
System company : JOOYONTECH
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Router
Company : ipTIME N604R
Maximum speed : 160 Mbps (LAN to WAN)
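The system information shows that Node3 has a Qualcomm Atheros Attansic L2 Fast Ethernet controller, which tops out at 100 Mbit/s, so that node stays at 100 Mbit/s even on a gigabit switch, and a job spanning all four nodes can be held back by it. One way to confirm the speed each NIC actually negotiated is the sysfs file /sys/class/net/<interface>/speed, which Linux reports in Mbit/s for most wired drivers. The sketch below assumes that file; interfaces without a reported speed (loopback, link down) come back as None, and the function names are hypothetical.

```python
import os


def link_speeds(sys_net="/sys/class/net"):
    """Return {interface: negotiated speed in Mbit/s, or None if not reported}."""
    speeds = {}
    for iface in sorted(os.listdir(sys_net)):
        try:
            with open(os.path.join(sys_net, iface, "speed")) as f:
                value = int(f.read().strip())
        except (OSError, ValueError):
            value = -1  # no file, link down, or driver does not report a speed
        speeds[iface] = value if value > 0 else None
    return speeds


def slowest_link(speeds):
    """Smallest known speed among the interfaces, or None if none is known."""
    known = [s for s in speeds.values() if s is not None]
    return min(known) if known else None
```

Running `print(link_speeds())` on each node should show 1000 on the Realtek gigabit nodes and 100 on Node3 if the Fast Ethernet card is the limiting link.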

Links:
------
[1] http://www.reddit.com/r/linuxquestions/comments/2gubad/why_static_ip_address_is_necessary_for_linux/
[2] http://flakrat.blogspot.kr/2013/04/how-to-compile-gromacs-461-with-openmpi.html
[3] http://www.reddit.com/r/linuxquestions/comments/2gbgbg/what_would_be_the_best_linux_distro_for_folding/

--

My email was sent by May First/People Link
https://mayfirst.org [4]

--
To UNSUBSCRIBE, email to debian-beowulf-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact
listmaster@lists.debian.org
Archive: https://lists.debian.org/40cfbd085315754eb7f15af3a3dfc943@mail.mayfirst.org [5]



Links:
------
[1] http://www.reddit.com/r/linuxquestions/comments/2gubad/why_static_ip_address_is_necessary_for_linux/
[2] http://flakrat.blogspot.kr/2013/04/how-to-compile-gromacs-461-with-openmpi.html
[3] http://www.reddit.com/r/linuxquestions/comments/2gbgbg/what_would_be_the_best_linux_distro_for_folding/
[4] https://mayfirst.org
[5] https://lists.debian.org/40cfbd085315754eb7f15af3a3dfc943@mail.mayfirst.org
[6] http://cs.boisestate.edu/%7Eamit/research/beowulf/beowulf-setup.pdf



