
Re: non-bonded multiple NICs w/ an unmanaged switch

Our experience, especially on 8-noders, favors a switch-separated (probably FNN) approach. Two 8-port gigabit switches will be cheaper than one 16-port switch, especially one that supports Layer 3 and lets you commingle cloned MAC addresses. If your NICs are e1000s you can do some fancy things with them (on really expensive switches) using the iANS drivers (http://support.intel.com/support/network/adapter/1000/), but I assume your NICs are tg3s, so you will get better performance for less money with multiple switches.

Still, you want to maximize throughput on your current switch, right? Since your switch does not support Layer 3 switching, you should probably assign each NIC a unique IP address, as suggested in your last paragraph. This would allow you to set one NIC half-duplex in either direction. I have never tried this, but it should work. Generate two different hosts tables and you are in business. The D-Link DGS-1016TG (if that's what you have) has a 32 Gbps backplane, enough for all 16 ports at full duplex, so you will never saturate the switch.
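As a rough illustration of the two hosts tables, here is a minimal Python sketch. The subnets (192.168.1.0/24 and 192.168.2.0/24), the node1..node8 names, and the hosts_table helper are all assumptions made up for the example; substitute whatever addressing you actually use, and decide for yourself which table goes on which node.

```python
import ipaddress

def hosts_table(network, n_nodes, prefix="node"):
    """Return /etc/hosts-style lines for n_nodes hosts on one subnet.

    Hypothetical helper: addresses are handed out in order, starting
    with the first usable address of the subnet.
    """
    usable = ipaddress.ip_network(network).hosts()  # iterator of usable addresses
    lines = []
    for i, addr in zip(range(1, n_nodes + 1), usable):
        lines.append(f"{addr}\t{prefix}{i}")
    return "\n".join(lines) + "\n"

# Assumed addressing: first NIC of every node on one subnet,
# second NIC of every node on the other.
table_a = hosts_table("192.168.1.0/24", 8)
table_b = hosts_table("192.168.2.0/24", 8)

print(table_a)
print(table_b)
```

Each table resolves the same node names to a different NIC, so whichever table a node has installed determines which of its peers' NICs it talks to.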

What we have tried (on two switches) is to use one of the NICs as our NFS-connected NIC and the other as our internode communication NIC. We don't really need a dual-gigabit pipe to our NFS mount, but we do need a relatively low-latency MPI pipe. It sounds like you are in the same boat.
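With one NIC per subnet, the split between NFS and MPI traffic falls out of ordinary routing: a packet leaves through the NIC whose directly connected subnet contains the destination address. The sketch below only models that rule; the interface names (eth0, eth1), the addresses, and the outgoing_interface helper are assumptions for illustration, not a description of our setup.

```python
import ipaddress

# Assumed layout for one node: eth0 sits on the NFS subnet,
# eth1 on the MPI subnet.
INTERFACES = {
    "eth0": ipaddress.ip_interface("192.168.1.3/24"),
    "eth1": ipaddress.ip_interface("192.168.2.3/24"),
}

def outgoing_interface(dest):
    """Return the interface whose connected subnet contains dest, else None."""
    addr = ipaddress.ip_address(dest)
    for name, iface in INTERFACES.items():
        if addr in iface.network:
            return name
    return None  # not directly connected; would go via a default route, if any

print(outgoing_interface("192.168.1.1"))  # an address on the NFS subnet
print(outgoing_interface("192.168.2.7"))  # an address on the MPI subnet
```

So pointing the NFS mount at the server's address on one subnet, and the MPI machine file at node addresses on the other, is enough to keep the two kinds of traffic on separate NICs.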

At 09:53 PM 5/8/2004 -0300, Peter Cordes wrote:

 I have a cluster of 8 dual Opterons, and one unmanaged D-link 16port gigE
switch.  The Opterons have dual gigE on their mobos, and right now, channel
bonding is enabled.  This is a bit bogus, because both receiving NICs will
get a copy of every packet, I think. (Both NICs get the same MAC address when
bonded, and that's what switches keep track of.)  I do get ~10 or 20% higher
TCP throughput than without bonding, so it is helping a bit.  Somewhat
surprisingly, UDP packets don't seem to be getting duplicated.  I tested
with nc -u, talking to nc -l -u, and stuff I typed was only received once.
Maybe I'm wrong about the switch duplicating the packets, but I certainly
don't get twice the bandwidth.

 Anyway, I've been thinking about what can be done with an unmanaged switch.
I've considered arp table hacks like the U. Kentucky flat-network idea
(google for KLAT2), but not in enough detail to come up with anything good.
Maybe half the nodes could talk to one of the master node's NICs, and
half to the other, if the master node has separate MAC addresses and doesn't
use bonding?  This could be useful if there is significant openMosix or NFS
traffic, and not just all<->all MPI traffic.

 I've also thought of having two subnets, with each node having one NIC in
each net.  One subnet could be used for all normal traffic (NFS, ssh, etc.),
while the other could be exclusively for MPI.  It
might be more convenient to hack config files to get NFS or openMosix using
different IP addresses from everything else, though.

 Anyway, has anyone done anything like this, or want to expand on this idea?
I haven't thought of a useful search string to google on for this kind of
thing yet, so if anyone knows any good web pages about this, I'd love to see

#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter@cor , des.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC
