
Re: Beowulf Cluster is very slow. Suggestions needed to increase the speed.



Hi,

>>>First of all, why do you think network is the bottleneck?

Thank you for your concern.

When I run a specific job on a single system with a quad-core processor (not in parallel across machines), the estimated run time is approximately three months. When I ran it in parallel on four systems, the estimated completion date was October 2015. At that point I was using a 100 Mbps router (ipTIME). I have confirmed with the "top" command that all the systems are using their processors. After that, TooMeeK pointed out that I need a "Gigabit routing switch supporting layer 3". We don't have that kind of switch in our lab. However, we did have a Netgear GS608 1000 Mbps switch, and using it brought the estimated completion date forward to February 2015. Moreover, I read the document http://cs.boisestate.edu/~amit/research/beowulf/beowulf-setup.pdf suggested by TooMeeK, which also says that the network switch is important. I also watched a few videos on YouTube about layer 3 switch capabilities. Therefore, I am convinced that a "Gigabit routing switch supporting layer 3" will solve this issue.
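
One rough way to check whether the link speed really is the limit is to measure raw TCP throughput and compare it with the nominal 100 Mbps or 1000 Mbps figure. The following is only a minimal Python sketch and not part of the setup described above: the names receive_all and measure_throughput are made up for illustration, and it runs both ends over the loopback interface so that it is self-contained. To test the real network, the receiving half would have to run on a different node than the sending half.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes handed to each send/recv call


def receive_all(server_sock, result):
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def measure_throughput(host="127.0.0.1", total_bytes=16 * 1024 * 1024):
    """Send total_bytes over one TCP connection and return the rate in Mbit/s."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=receive_all, args=(server, result))
    receiver.start()

    payload = b"\0" * CHUNK
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            client.sendall(payload)
            sent += len(payload)
    receiver.join()  # wait until every byte has been counted
    elapsed = time.perf_counter() - start
    server.close()

    return result["bytes"] * 8 / elapsed / 1e6


if __name__ == "__main__":
    print(f"{measure_throughput():.0f} Mbit/s")
```

If the rate measured between two nodes is close to the nominal speed of the switch while the job is still slow, the limit is probably somewhere else (for example latency or memory) rather than bandwidth.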

regards
Suresh





On Tue, Oct 14, 2014 at 4:40 AM, Rogerio Bastos <writeme@rogeriobastos.eti.br> wrote:
First of all, why do you think network is the bottleneck?
Be sure of this before spending money on network equipment!

On 2014-10-07 00:50, suresh kannan wrote:
I am an Indian student in Suwon, Korea. I built a Beowulf cluster
(system information below) with four systems in our lab for our
simulation work, with the help of good tutorials. Those tutorials
say that all the systems should have static IP addresses.
Unfortunately, all our labs are provided with dynamic IP addresses
[5 IPs for 15 members in three separate labs]. I requested four more
IPs from our university system admin. Because of the language
barrier, I conveyed the request through my Korean labmate, and I
don't know why the admin denied us the static IPs. So I found
another way to skip this procedure
http://www.reddit.com/r/linuxquestions/comments/2gubad/why_static_ip_address_is_necessary_for_linux/
[1].

Someone suggested using a router (with one static IP) and assigning
static IPs to the four computers through the router. I did that and it
worked. However, the cluster is very slow. For instance, if I submit my
simulation job on a single computer [4-core processor], it takes 2
months to complete a specific job. However, if I connect 4 systems, it
shows that the same job will take 6 months. It is actually using
10 cores [3, 3, 2, 2, at 100% each]. I used the top command to see how
much processor the head node and the other nodes are using. I used
OpenMPI to parallelize across the systems. I am using GROMACS
(MPI-based parallelization is part of this software). I configured
GROMACS for parallel runs with the help of this tutorial
http://flakrat.blogspot.kr/2013/04/how-to-compile-gromacs-461-with-openmpi.html
[2]. After reading a few posts
http://www.reddit.com/r/linuxquestions/comments/2gbgbg/what_would_be_the_best_linux_distro_for_folding/
[3], I suspected that the network router might be the issue.
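
The figures above (2 months on one 4-core machine, 6 months on 10 cores spread over 4 machines) can be turned into a rough speedup and efficiency estimate. This is only an illustrative Python sketch: the function names are made up here, and it assumes the run time would ideally scale inversely with the number of cores in use.

```python
def speedup(t_baseline, t_parallel):
    """How many times faster the parallel run is than the baseline run."""
    return t_baseline / t_parallel


def parallel_efficiency(t_baseline, cores_baseline, t_parallel, cores_parallel):
    """Measured speedup divided by the ideal speedup from the extra cores."""
    ideal = cores_parallel / cores_baseline
    return speedup(t_baseline, t_parallel) / ideal


if __name__ == "__main__":
    # Estimated run times in months, taken from the message above.
    print(f"speedup:    {speedup(2, 6):.2f}")                        # 0.33
    print(f"efficiency: {parallel_efficiency(2, 4, 6, 10):.2f}")     # 0.13
```

A speedup below 1 means the cluster run is slower than the single machine, so the extra cores are spending most of their time on something other than useful computation.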


Can you suggest how I can troubleshoot this problem? Someone
suggested using 2 network ports, making a Linux machine the router,
and using a gigabit switch to get the speed. However, we don't have a
system with 2 network ports. If this is compulsory, I can buy a
network port (a USB one).

Where do I start now?

Can I make my head node a router, use a USB network port as the
second network port, and connect it to a gigabit switch (any model
suggestion?) to connect the other nodes? I don't know much about
networking. It would be helpful if any experts could suggest how to
troubleshoot this issue.

Thank you for your time.

regards

Suresh

System Information
Head node Processor : Intel core i3
RAM : 1 GB
No. of processor : 4
Network cards : 03:00.0 Ethernet controller: Realtek Semiconductor
Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
(rev 06)
System company : Samsung
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Node1 Processor : Intel Quad core
RAM : 3 GB
No. of processor : 4
Network cards : 03:00.0 Ethernet controller: Realtek Semiconductor
Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
(rev 02)
System company : TG DREAMSYS
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Node2 Processor : Intel core i3
RAM : 1 GB
No. of processor : 4
Network cards : 03:00.0 Ethernet controller: Realtek Semiconductor
Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
(rev 06)
System company : Samsung
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Node3 Processor : Intel core i3
RAM : 1 GB
No. of processor : 2
Network cards : 02:00.0 Ethernet controller: Qualcomm Atheros
Attansic L2 Fast Ethernet (rev a0)
System company : JOOYONTECH
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Router Company : ipTIME N604R
Maximum speed : 160Mbps (LAN to WAN)

Links:
------
[1]
http://www.reddit.com/r/linuxquestions/comments/2gubad/why_static_ip_address_is_necessary_for_linux/
[2]
http://flakrat.blogspot.kr/2013/04/how-to-compile-gromacs-461-with-openmpi.html
[3]
http://www.reddit.com/r/linuxquestions/comments/2gbgbg/what_would_be_the_best_linux_distro_for_folding/

--

My email was sent by May First/People Link
https://mayfirst.org

