
Re: Beowulf Cluster is very slow. Suggestions needed to increase the speed.



Hello,

I've been on this list for years, but there are very few questions, so I'm happy to see a new thread here ;)

I'm not a Beowulf cluster admin, but..

I believe your network is slow because the routing device is internally limited to low throughput. You need a Gigabit-capable routing device (read this as: a Gigabit routing switch supporting Layer 3).
I suspect all your classrooms are in different IP subnets?
If so, the routing device is the bottleneck..
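
To get a feel for how much the link rate matters, here is a rough sketch that computes the ideal time to move 1000 MB at three rates: 160 Mbit/s (the router's stated LAN-to-WAN figure from the original post; its LAN-to-LAN rate may differ), 480 Mbit/s (the USB 2.0 signalling rate) and 1000 Mbit/s (Gigabit). The function name transfer_seconds is made up for illustration, and protocol overhead and latency are ignored, so real numbers will be worse.

```shell
#!/bin/sh
# Ideal seconds to move $1 megabytes over a link of $2 Mbit/s.
# Assumption: no protocol overhead, no latency, link fully available.
transfer_seconds() {
    awk -v mb="$1" -v mbps="$2" 'BEGIN { printf "%.1f\n", mb * 8 / mbps }'
}

# 160 = router figure from the original post, 480 = USB 2.0, 1000 = Gigabit
for rate in 160 480 1000; do
    echo "$rate Mbit/s: $(transfer_seconds 1000 "$rate") s per 1000 MB"
done
```

MPI jobs exchange data between nodes at every step, so a difference like this is paid over and over during a long run.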

Even a modern Core i5 based Linux router will add routing delay on a cluster network, at least under high load at high throughput (which is what clustering requires). So simple, raw networking is the best option.

Example switch models that can do routing:
Cisco SG500X
Cisco Catalyst series
Dell PowerConnect 7000 series (or lower)
Of course, they may be too expensive, so just look around for a switch that can do IP routing. There are even 8-port Gigabit managed switches on the market.

If your sysadmin isn't blocking other subnets, you could set static IPs on all your cluster nodes in a separate subnet (say 192.168.99.1-192.168.99.10 with netmask 255.255.255.0) and check whether they can see each other (this probably won't be possible). Then use one of them (the master node?), fitted with 2 Gigabit cards, to provide Internet access.
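
A minimal sketch of that reachability check, assuming the nodes already have addresses in the example range 192.168.99.1-192.168.99.10 and that the Linux iputils ping is installed. The function names node_addresses and check_nodes are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# Print the example node addresses 192.168.99.1 .. 192.168.99.10.
node_addresses() {
    i=1
    while [ "$i" -le 10 ]; do
        echo "192.168.99.$i"
        i=$((i + 1))
    done
}

# Send one ping (1 second timeout) to each address and report the result.
check_nodes() {
    for ip in $(node_addresses); do
        if ping -c 1 -W 1 "$ip" >/dev/null 2>&1; then
            echo "$ip reachable"
        else
            echo "$ip NOT reachable"
        fi
    done
}

# To run the check on a node, call:
#   check_nodes
```

Running check_nodes from every node is worthwhile, since a filter on the network may block traffic in one direction only.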

USB Gigabit cards mostly top out at ~480 Mbit/s because they're USB 2.0. That's half of the needed performance, and be aware of missing Linux drivers.. ;) I see most of your hardware is Gigabit (Node3's Atheros card is listed as Fast Ethernet), so I believe you need a Layer 3 Gigabit switch. Also, please check whether your NICs support offload functions; they can help under high network load.
I personally use these in /etc/rc.local:
echo "Setting offload functions on Intel PRO/1000 NICs..."
ethtool -K eth1 rx on tx on sg on gso on gro on tso on
ethtool -K eth2 rx on tx on sg on gso on gro on tso on
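
Before switching features on, it can help to see which ones the NIC currently reports as off. The sketch below filters the output of `ethtool -k` (lowercase k, which only queries); the helper name disabled_offloads is made up for this example, and eth1 is simply the interface name used above.

```shell
#!/bin/sh
# Read `ethtool -k` style output on stdin and print the names of the
# features whose state starts with "off" (including "off [fixed]").
disabled_offloads() {
    awk -F': ' '$2 ~ /^off/ { gsub(/^[ \t]+/, "", $1); print $1 }'
}

# Usage on a real node (needs ethtool installed):
#   ethtool -k eth1 | disabled_offloads
```

Features marked "[fixed]" in the ethtool output cannot be changed by the driver, so `ethtool -K` will not be able to turn those on.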

And there are some tips here:
http://cs.boisestate.edu/~amit/research/beowulf/beowulf-setup.pdf
See the section: 1.2 Networking Hardware

Hope this helps.

Cheers,
TooMeeK




On 2014-10-07 05:50, suresh kannan wrote:
I am an Indian student in Suwon, Korea. I built a Beowulf cluster
(system information below) with four systems in our lab for our
simulation work, with the help of good tutorials. Those tutorials
say that all the systems should have static IP addresses.
Unfortunately, all our labs have been given dynamic IP
addresses [5 IPs for 15 members in three separate labs]. I requested
four more IPs from our university system admin. Because of language
problems, I conveyed the request through my Korean lab mate, and I
don't know why the admin denied us the static IPs. So I found another
way to skip this procedure:
http://www.reddit.com/r/linuxquestions/comments/2gubad/why_static_ip_address_is_necessary_for_linux/.


Someone suggested using a router (with one static IP) and setting
static IPs for the four computers through the router. I did that and it
worked. However, the cluster is very slow. For instance, if I submit my
simulation job on a single computer [4-core processor], it takes 2
months to complete a specific job. But if I connect the 4 systems, it
reports that the same job will take 6 months. It is actually using 10
cores [3,3,2,2 at 100% each]. I used the top command to see how much
CPU the head and other nodes are using. I used OpenMPI to parallelize
across the systems. I am using GROMACS (MPI-based parallelization is
part of this software). I set up a parallel configuration of GROMACS
with the help of this tutorial:
http://flakrat.blogspot.kr/2013/04/how-to-compile-gromacs-461-with-openmpi.html.
After reading a few posts at
http://www.reddit.com/r/linuxquestions/comments/2gbgbg/what_would_be_the_best_linux_distro_for_folding/
I suspected the network router might be the issue.

Can you suggest how I can troubleshoot this problem? Someone
suggested using 2 network ports, making a Linux machine the router, and
using a Gigabit switch to get the speed. However, we don't have a system
with 2 network ports. If this is required, I can buy network adapters
(USB ones).

Where do I start now?

Can I make my head node a router, use a USB network adapter (for the
second network port), and connect it to a Gigabit switch (any model
suggestions?) to connect the other nodes? I don't know much about
networking. It would be helpful if any experts could suggest how to
troubleshoot this issue.

Thank you for your time.

regards

Suresh


System Information

Head node Processor : Intel core i3
RAM : 1 GB
No. of processor : 4
Network cards : 03:00.0 Ethernet controller: Realtek Semiconductor Co.,
Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
System company : Samsung
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Node1 Processor : Intel Quad core
RAM : 3 GB
No. of processor : 4
Network cards : 03:00.0 Ethernet controller: Realtek Semiconductor Co.,
Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
System company : TG DREAMSYS
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Node2 Processor : Intel core i3
RAM : 1 GB
No. of processor : 4
Network cards : 03:00.0 Ethernet controller: Realtek Semiconductor Co.,
Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
System company : Samsung
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Node3 Processor : Intel core i3
RAM : 1 GB
No. of processor : 2
Network cards : 02:00.0 Ethernet controller: Qualcomm Atheros
Attansic L2 Fast Ethernet (rev a0)
System company : JOOYONTECH
Architecture : x86_64
OS flavour : Linux Mint 17 Qiana

Router Company : ipTIME N604R
Maximum speed : 160Mbps (LAN to WAN)

