
Re: Beowulf Cluster is very slow. Suggestions needed to increase the speed.



Hi,

The link you provided was very helpful. I had not prepared properly before starting on clusters. A few years ago I built a three-system cluster using PVM for a different application, protein-ligand docking. With three systems it took about 2 hours to finish a particular library screening. Since that was only a few hours, I was happy and never checked the speed of the cluster; I thought I had learned clustering. So this time I started blindly, without preparing, on a different application that takes months to complete a single job. I have now learned that the requirements differ depending on the purpose of the cluster.

I learned that network speed is the biggest bottleneck for clustering, especially for our needs. In our lab I found a 1000 Mb/s switch, which saves some time compared to 100 Mb/s, but it is still not efficient and the job still takes a lot of time. I assume we need a 10 Gigabit switch. I was not aware of the prices of these "Gigabit routing switch supporting layer 3" devices. It seems that even an 8-port 10 Gigabit switch costs approx. $800. I am still hesitant to ask my mentor for a 10 Gigabit switch: since I have no experience, I worry that I may be wrong somewhere, even though a gigabit switch ought to work. I am now looking for a 10 Gigabit switch in nearby labs so that I can connect it and check whether it is efficient enough for our job, and then ask my mentor with some confidence.
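
Before asking for new hardware, it may be worth confirming what each node's network card has actually negotiated, since a faster switch only helps if every link runs at that speed. The following is a minimal sketch, not a tested procedure: the helper name parse_link_speed and the interface name eth0 are assumptions made for illustration, and it only reads the "Speed:" line that ethtool prints.

```shell
#!/bin/sh
# Sketch: report the negotiated link speed of an interface.
# parse_link_speed is a hypothetical helper; eth0 is a placeholder name.

# Read `ethtool <iface>` output on stdin and print the speed in Mb/s.
# Prints nothing if there is no "Speed: <n>Mb/s" line (e.g. link down).
parse_link_speed() {
    sed -n 's/^[[:space:]]*Speed:[[:space:]]*\([0-9][0-9]*\)Mb\/s.*$/\1/p'
}

# Typical use on each node:
#   ethtool eth0 | parse_link_speed
#
# End-to-end throughput between two nodes can then be measured with iperf3,
# if it is installed on both:
#   iperf3 -s                 # on the first node
#   iperf3 -c <first node>    # on the second node
```

If a node reports 100 here, or iperf3 shows far less than the link speed, that link is worth investigating before spending money on a 10 Gigabit switch.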

I also learned from someone that we can use a cluster OS designed specifically for this purpose, for instance PelicanHPC (http://pareto.uab.es/mcreel/PelicanHPC/). I will install that cluster OS and check the performance too. I am reading material on how to increase cluster speed and performance in terms of both hardware and software, and I will tune our application according to our needs.

Also, in our lab room we have dual-boot systems for 8 people (Windows with either Linux Mint, Ubuntu, or Fedora). At any given time those systems will be running either Windows or Linux, depending on our work. They are idle for 6 hours every night and on Sundays. I am not able to use that free computer time to our advantage because I lack the skill and the time. I guess we could use it if we had a gigabit switch and networking skills. If someone has done something similar, please document it in your blog or on the mailing lists so that users like me can benefit.

Although it took time, I learned a lot from this clustering work. Thank you for your time.


regards
Suresh




On Wed, Oct 8, 2014 at 7:38 AM, Tomcio <toomeek_85@o2.pl> wrote:
Hello,

I've been here for years, but there are very few questions, so I'm happy to see a new thread here ;)

I'm not a Beowulf cluster admin, but..

I believe your network is slow because this routing device is internally limited to low throughput. You need a Gigabit-capable routing device (read this as: Gigabit routing switch supporting layer 3).
I suspect all your classrooms are in different IP subnets?
If yes, then the routing device is the bottleneck..

Even a modern Core i5 based Linux router will add routing delay on a cluster network, at least under the high load and high throughput that clustering requires. So simple, raw networking is the best option.

Example switch models that can do routing are:
Cisco SG500X
Cisco Catalyst series
Dell PowerConnect 7000 series (or lower)
Of course, they could be too expensive, so just look around for a switch that can do IP routing. There are even 8-port Gigabit managed switches available on the market.

If your sysadmin isn't blocking other network subnets, you could set static IPs on all your cluster nodes in a different subnet (let's say 192.168.99.1-192.168.99.10 with netmask 255.255.255.0) and check whether they see each other (probably this isn't possible), then use one of them (the master node?) with 2 Gigabit cards to provide Internet access.
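
As a rough sketch of that idea, the helpers below print the commands that would put a node on the 192.168.99.0/24 example subnet and then ping the range to see which nodes answer. The function names and the interface name eth0 are assumptions for illustration only; addresses set with `ip addr add` do not survive a reboot, so a permanent setup would belong in the distribution's network configuration instead.

```shell
#!/bin/sh
# Sketch only: static_ip_commands and check_nodes are hypothetical helpers.

# Print (do not run) the commands that would give node N the address
# 192.168.99.N/24 on the given interface. Review them, then run as root.
static_ip_commands() {
    node="$1"     # host part of the address, 1..10 in the example range
    iface="$2"    # interface name, e.g. eth0 (placeholder)
    echo "ip addr add 192.168.99.${node}/24 dev ${iface}"
    echo "ip link set ${iface} up"
}

# Ping 192.168.99.1 .. 192.168.99.LAST once each and report reachability.
check_nodes() {
    last="$1"
    i=1
    while [ "$i" -le "$last" ]; do
        if ping -c 1 -W 1 "192.168.99.$i" >/dev/null 2>&1; then
            echo "192.168.99.$i up"
        else
            echo "192.168.99.$i down"
        fi
        i=$((i + 1))
    done
}
```

For example, `static_ip_commands 3 eth0` prints the two commands for the third node, and `check_nodes 4` tests the first four addresses once every node has been configured.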

USB Gigabit cards mostly top out at ~480 Mbit/s in throughput since they're USB 2.0. That's half of the needed performance, and be aware of missing Linux drivers.. ;)
I see all your hardware is Gigabit, so I believe you need a Layer 3 Gigabit switch. Also, please check whether your NICs support offload functions; they can help with high network load.
I personally use these in /etc/rc.local:
echo "Setting offload functions on Intel PRO/1000 NICs..."
ethtool -K eth1 rx on tx on sg on gso on gro on tso on
ethtool -K eth2 rx on tx on sg on gso on gro on tso on
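
To check whether the settings took effect, the current state can be read back with `ethtool -k` (lower-case k). The helper below is a small sketch that lists whatever is still reported as off; the name offloads_off is made up for this example, and features marked "[fixed]" cannot be changed by the driver.

```shell
#!/bin/sh
# Sketch: offloads_off is a hypothetical helper name.

# Read `ethtool -k <iface>` output on stdin and print the name of every
# feature whose value starts with "off", one per line.
offloads_off() {
    awk -F': ' '$2 ~ /^off/ { gsub(/^[ \t]+/, "", $1); print $1 }'
}

# Typical use (eth1 as in the rc.local lines above):
#   ethtool -k eth1 | offloads_off
```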

And there are tips here:
http://cs.boisestate.edu/~amit/research/beowulf/beowulf-setup.pdf
See the section: 1.2 Networking Hardware

Hope this helps.

Cheers,
TooMeeK




On 2014-10-07 05:50, suresh kannan wrote:

I am an Indian student in Suwon, Korea. I built a Beowulf cluster
(system information below) with four systems in our lab for our
simulation work, with the help of good tutorials. Those tutorials say
that all the systems should have static IP addresses.
Unfortunately, all our labs are provided with dynamic IP
addresses [5 IPs for 15 members in three separate labs]. I requested
four more IPs from our university system admin. Due to language
problems, I conveyed the requirement through my Korean lab mate, and I
don't know why he denied us the static IPs. So I found another
way to skip this procedure:
http://www.reddit.com/r/linuxquestions/comments/2gubad/why_static_ip_address_is_necessary_for_linux/


Someone suggested using a router (one static IP) and setting static IPs
for the four computers through the router. I did that and it worked.
However, the cluster is very slow. For instance, if I submit my
simulation job on a single computer [4-core processor], it takes
2 months to complete a specific job. But if I connect the 4 systems,
it reports that the same job will take 6 months. It is actually using
10 cores [3,3,2,2 - 100% each]. I used the top command to see how much
CPU the head and other nodes are using. I used Open MPI to run the
systems in parallel. I am using GROMACS (MPI-based parallelization is
part of this software). I followed a parallel configuration for GROMACS
with the help of this tutorial:
http://flakrat.blogspot.kr/2013/04/how-to-compile-gromacs-461-with-openmpi.html
After reading a few posts such as
http://www.reddit.com/r/linuxquestions/comments/2gbgbg/what_would_be_the_best_linux_distro_for_folding/
I suspected the network router might be the issue.
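
The two estimates quoted above (2 months on one computer, 6 months on four) can be turned into a speedup figure, which makes the slowdown easier to compare after each network change. This is only an arithmetic sketch: the function name is made up, and efficiency is computed per machine rather than per core.

```shell
#!/bin/sh
# Sketch: parallel_speedup is a hypothetical helper.
# speedup    = time on one machine / time on the cluster
# efficiency = speedup / number of machines
parallel_speedup() {
    awk -v t1="$1" -v tp="$2" -v n="$3" 'BEGIN {
        s = t1 / tp
        printf "speedup=%.2f efficiency=%.2f\n", s, s / n
    }'
}

# With the figures from this post (2 months alone, 6 months on 4 machines):
#   parallel_speedup 2 6 4    # prints: speedup=0.33 efficiency=0.08
```

A speedup below 1 means the cluster is slower than a single machine, which is consistent with the nodes spending most of their time waiting on communication rather than computing.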

Can you suggest how I can troubleshoot this problem? Someone
suggested using 2 network ports, making a Linux machine the router, and
using a gigabit switch to get the speed. However, we don't have a
system with 2 network ports. If this is required, I can buy a network
adapter (a USB one).

Where do I start now?

Can I make my head node the router, use a USB network adapter (for the
second network port), and connect it to a gigabit switch (any model
suggestions?) to connect the other nodes? I don't know much about
networking. It would be helpful if any experts could suggest how to
troubleshoot this issue.

Thank you for your time.

regards

Suresh


System Information

Head node Processor : Intel core i3
RAM : 1 GB
No. of processor : 4
Network cards : 03:00.0 Ethernet controller: Realtek Semiconductor Co.,
Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
System company : Samsung
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Node1 Processor : Intel Quad core
RAM : 3 GB
No. of processor : 4
Network cards : 03:00.0 Ethernet controller: Realtek Semiconductor Co.,
Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
System company : TG DREAMSYS
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Node2 Processor : Intel core i3
RAM : 1 GB
No. of processor : 4
Network cards : 03:00.0 Ethernet controller: Realtek Semiconductor Co.,
Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
System company : Samsung
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Node3 Processor : Intel core i3
RAM : 1 GB
No. of processor : 2
Network cards : 02:00.0 Ethernet controller: Qualcomm Atheros Attansic
L2 Fast Ethernet (rev a0)
System company : JOOYONTECH
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Router Company : ipTIME N604R
Maximum speed : 160Mbps (LAN to WAN)


--
To UNSUBSCRIBE, email to debian-beowulf-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: https://lists.debian.org/54346B82.6000500@o2.pl


