Hi,
The link you provided was very helpful. I had not prepared myself
before starting with clusters. A few years back I built a cluster of
three systems using PVM for a different application, protein-ligand
docking. With three systems it took about 2 hours to finish a particular
library screening. Since it was only a few hours, I was happy and never
checked the speed of the cluster. I thought I had learned a clustering
technique. So this time I started blindly, without preparing, on a
different application that takes months to complete a specific job. I
have now learned that the requirements differ depending on the purpose
of the cluster.
I learned that network speed is the biggest bottleneck for clustering,
especially for our needs. In our lab I found a 1000 Mb/s switch, which
reduces the run time somewhat compared to 100 Mb/s, but it is still not
efficient and still takes a lot of time. I assume we need a 10 gigabit
switch. I was not aware of the price of these "Gigabit routing switch
supporting layer 3" devices. It seems that even an 8-port 10 gigabit
switch costs approx. $800. I am still hesitant to ask my mentor for a
10 gigabit switch. Since I have no experience, I worry that I may be
wrong somewhere, although a gigabit switch ought to work. I am now
looking for a 10 gigabit switch in nearby labs so that I can connect it
and check whether it is sufficiently efficient for our job, and then
ask my mentor with some confidence.
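Before asking for new hardware, it may be worth confirming what speed
each node's link actually negotiated, since a single slow port or cable
can hold back the whole cluster. A minimal sketch, assuming ethtool is
installed and the interface is called eth0 (both are assumptions, and
link_speed_mbps is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool <iface>` output on stdin and print the
# negotiated link speed in Mb/s (prints nothing if no numeric "Speed:" line
# is found, e.g. when the link is down).
link_speed_mbps() {
    sed -n 's/^[[:space:]]*Speed:[[:space:]]*\([0-9][0-9]*\)Mb\/s.*$/\1/p'
}

# Usage on each node (eth0 is an assumed interface name):
#   ethtool eth0 | link_speed_mbps
```

A node that reports 100 while plugged into a gigabit switch would point
to a Fast Ethernet card or a bad cable rather than to the switch.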
I also learned from someone that we can use a cluster OS specially
designed for this purpose, for instance PelicanHPC
http://pareto.uab.es/mcreel/PelicanHPC/. I will install that cluster OS
and check the performance too. I am reading material on increasing the
speed and performance of the cluster in terms of both hardware and
software. I will tune our application according to our needs.
Also, in our lab room we have dual-boot systems for 8 people (Windows
with either Linux Mint, Ubuntu or Fedora). At any given time those
systems are running either Windows or Linux, depending on our work. For
about 6 hours every night, and on Sundays, the systems are idle. I am
not able to use that free computer time to our advantage because of my
limited skills and time. I guess we could use it if we had a gigabit
switch and the networking skills. If someone has done something
similar, please document it in your blog or on the mailing lists so
that users like me can benefit.
Although it took time, I learned a lot from this clustering work. Thank
you for your time.
Regards,
Suresh
On Wed, Oct 8, 2014 at 7:38 AM, Tomcio <toomeek_85@o2.pl> wrote:
Hello,
I have been here for years, but there are very few questions, so I'm
happy to see a new thread here ;)
I'm not a Beowulf cluster admin, but..
I believe your network is slow because this routing device is
internally limited to low throughput. You need a Gigabit-capable
routing device (read this as: Gigabit routing switch supporting
layer 3).
I suspect all your classrooms are in different IP subnets?
If yes, then the routing device is the bottleneck.
Even a modern Core i5 based Linux router adds routing delay on a
cluster network, at least under high load at high throughput (which
is what clustering requires). So simple, raw networking is the best
option.
Example switch models that are capable of routing:
Cisco SG500X
Cisco Catalyst series
Dell PowerConnect 7000 series (or lower)
Of course, they could be too expensive, so just look around for a
switch that can do IP routing. There are even 8-port Gigabit managed
switches available on the market.
If your sys admin isn't blocking other network subnets, you could set
static IPs on all your cluster nodes in a different subnet (let's say
192.168.99.1-192.168.99.10 with netmask 255.255.255.0), check whether
they see each other (probably this isn't possible), and use one of
them (the master node?) with 2 Gigabit cards to provide Internet
access.
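The subnet test suggested above could be sketched roughly as follows.
This is only an illustration: the interface name eth0, the helper names
and the node numbering are assumptions, and the address command is
printed rather than executed because it needs root.

```shell
#!/bin/sh
# Assumed test subnet from the suggestion above: 192.168.99.0/24
# (netmask 255.255.255.0).
SUBNET_PREFIX="192.168.99"

# Print the address of node number N (1..10) in the test subnet.
node_addr() {
    echo "${SUBNET_PREFIX}.$1"
}

# Print (not run) the command that would give node N a temporary static
# address; eth0 is an assumed interface name. Run the printed line as root.
show_addr_cmd() {
    echo "ip addr add $(node_addr "$1")/24 dev eth0"
}

# Ping nodes 1..COUNT once each and report which ones answer.
check_nodes() {
    i=1
    while [ "$i" -le "$1" ]; do
        if ping -c 1 -W 1 "$(node_addr "$i")" >/dev/null 2>&1; then
            echo "$(node_addr "$i") reachable"
        else
            echo "$(node_addr "$i") NOT reachable"
        fi
        i=$((i + 1))
    done
}

# Usage: show_addr_cmd 2    (on node 2, then run the printed command as root)
#        check_nodes 4      (from any node, once all four have addresses)
```

If the nodes cannot reach each other on the new subnet, the existing
network is probably isolating or filtering that traffic.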
USB Gigabit cards are mostly limited to at most ~480 Mbit/s in
throughput since they're USB 2.0. That's half of the needed
performance, and be aware of missing Linux drivers.. ;)
I see almost all your hardware is Gigabit (Node3's Atheros L2 card is
only Fast Ethernet), so I believe you need a Layer 3 Gigabit switch.
Also, please check whether your NICs support offload functions; they
can help with high network load.
I personally use these in /etc/rc.local:
echo "Setting offload functions on Intel PRO/1000 NICs..."
ethtool -K eth1 rx on tx on sg on gso on gro on tso on
ethtool -K eth2 rx on tx on sg on gso on gro on tso on
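To see whether those settings actually took effect (some drivers refuse
certain offloads), the output of `ethtool -k` can be filtered for the
same six features. A small sketch; offloads_off is a hypothetical helper
name, and eth1 is simply the interface used in the lines above:

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and print
# the names of the six offload features set above that are still "off".
# Feature names correspond to the short flags rx, tx, sg, tso, gso and gro.
offloads_off() {
    grep -E '^(rx-checksumming|tx-checksumming|scatter-gather|tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload): off' \
        | cut -d: -f1
}

# Usage:
#   ethtool -k eth1 | offloads_off
```

An empty result means all six features are reported as on.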
And there are tips here:
http://cs.boisestate.edu/~amit/research/beowulf/beowulf-setup.pdf
See the section: 1.2 Networking Hardware
Hope this helps.
Cheers,
TooMeeK
On 2014-10-07 05:50, suresh kannan wrote:
I am an Indian student in Suwon, Korea. I built a Beowulf cluster
(system information below) with four systems in our lab for our
simulation work, with the help of good tutorials. Those tutorials say
that all the systems should have static IP addresses. Unfortunately,
all our labs are provided with dynamic IP addresses [5 IPs for 15
members in three separate labs]. I requested four more IPs from our
university system admin. Due to language problems, I conveyed the
requirement through my Korean lab mate, and I don't know why he denied
us the static IPs. So I found another way to skip this step:
http://www.reddit.com/r/linuxquestions/comments/2gubad/why_static_ip_address_is_necessary_for_linux/
Someone suggested using a router (with one static IP) and setting
static IPs for the four computers through the router. I did that and it
worked. However, the cluster is very slow. For instance, if I submit my
simulation job on a single computer [4-core processor], it takes 2
months to complete a specific job, but if I connect the 4 systems, it
reports that the same job will take 6 months. It is actually using 10
processor cores [3,3,2,2 at 100% each]. I used the top command to see
how much processor the head node and the other nodes are using. I used
Open MPI to parallelize across the systems. I am using GROMACS
(MPI-based parallelization is part of this software). I set up a
parallel configuration for GROMACS with the help of this tutorial:
http://flakrat.blogspot.kr/2013/04/how-to-compile-gromacs-461-with-openmpi.html
After reading a few posts
http://www.reddit.com/r/linuxquestions/comments/2gbgbg/what_would_be_the_best_linux_distro_for_folding/
I suspected the network router might be the issue.
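The timings reported above can be turned into a speedup and a rough
parallel efficiency, which makes the size of the problem explicit. This
is only a sketch; the function names are hypothetical, and the
efficiency figure assumes all cores are equally fast, which is not
quite true for this mix of machines.

```shell
#!/bin/sh
# speedup T_SINGLE T_CLUSTER: how many times faster the cluster run is
# (values below 1 mean the cluster is slower than one machine).
speedup() {
    awk -v t1="$1" -v tn="$2" 'BEGIN { printf "%.2f\n", t1 / tn }'
}

# efficiency T_SINGLE CORES_SINGLE T_CLUSTER CORES_CLUSTER: measured speedup
# divided by the ideal speedup from the extra cores (1.00 = perfect scaling).
efficiency() {
    awk -v t1="$1" -v c1="$2" -v tn="$3" -v cn="$4" \
        'BEGIN { printf "%.2f\n", (t1 / tn) / (cn / c1) }'
}

# Figures from the message: 2 months on one 4-core machine,
# 6 months reported for the 4-node run using 10 cores.
speedup 2 6            # prints 0.33
efficiency 2 4 6 10    # prints 0.13
```

A speedup below 1 means the nodes spend more time waiting on
communication than they gain from the extra cores, which is consistent
with the suspicion that the network is the bottleneck.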
Can you suggest how I can troubleshoot this problem? Someone suggested
using 2 network ports, making a Linux machine the router, and using a
gigabit switch to get the speed. However, we don't have a system with 2
network ports. If this is compulsory, I can buy a network adapter (a
USB one). Where do I start now?
Can I make my head node the router, use a USB network adapter as the
second network port, and connect it to a gigabit switch (any model
suggestions?) to connect the other nodes? I don't know much about
networking. It would be helpful if any experts could suggest how to
troubleshoot this issue.
Thank you for your time.
Regards,
Suresh
System Information

Head node
Processor         : Intel Core i3
RAM               : 1 GB
No. of processors : 4
Network card      : 03:00.0 Ethernet controller: Realtek Semiconductor
                    Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit
                    Ethernet Controller (rev 06)
System company    : Samsung
Architecture      : x86_64
OS flavour        : Linux Mint 17 Qiana

Node1
Processor         : Intel Quad core
RAM               : 3 GB
No. of processors : 4
Network card      : 03:00.0 Ethernet controller: Realtek Semiconductor
                    Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit
                    Ethernet Controller (rev 02)
System company    : TG DREAMSYS
Architecture      : x86_64
OS flavour        : Linux Mint 17 Qiana

Node2
Processor         : Intel Core i3
RAM               : 1 GB
No. of processors : 4
Network card      : 03:00.0 Ethernet controller: Realtek Semiconductor
                    Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit
                    Ethernet Controller (rev 06)
System company    : Samsung
Architecture      : x86_64
OS flavour        : Linux Mint 17 Qiana

Node3
Processor         : Intel Core i3
RAM               : 1 GB
No. of processors : 2
Network card      : 02:00.0 Ethernet controller: Qualcomm Atheros
                    Attansic L2 Fast Ethernet (rev a0)
System company    : JOOYONTECH
Architecture      : x86_64
OS flavour        : Linux Mint 17 Qiana

Router
Company           : ipTIME N604R
Maximum speed     : 160 Mbps (LAN to WAN)