
Re: zero-copy tcp in 2.4 and MPI?



Camm Maguire wrote:
> 
> Greetings!  I've read the stuff on Tux/X15 in kernel 2.4 with
> interest.  Apparently there is some enhanced 2.4 kernel tcp interface,
> using either TCP_CORK or sendfile(), which I'm not exactly sure, to
> effectively get zero-copy tcp/ip networking, and a great performance
> boost.  I think that if someone had a copy of the source to X15, that
> might be the quickest way to find out what's going on.  Can't find it
> now -- seems to be a commercial product at this point.  My question --
> shouldn't the MPI implementations be able to benefit from this?

Yes, the trick is making the incoming buffers (transparently) visible
to user processes without going in and out of the kernel and copying
message buffers. [*] Even the new versions of Windows are going to have
this, so it's a pity that we don't have it in a standard way yet :)
Actually, some of the message passing environments on Linux do perform
similar tricks, but any TCP server would benefit from the implementation
that you describe, including MPI implementations on Linux that employ
user space daemons.
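
For reference, here is a minimal sketch of the sendfile()/TCP_CORK pattern
the question mentions, as I understand it from the man pages. It is not
taken from X15 or Tux. The helper names (write_all, send_header_and_file)
are made up for illustration, and it assumes a blocking, connected TCP
socket on Linux. Whether the kernel really avoids the copy depends on the
kernel version and on what the NIC driver supports.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf, retrying on short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: send a small header followed by the contents of
 * the file at `path` over a connected, blocking TCP socket.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
int send_header_and_file(int sock, const char *hdr, const char *path)
{
    struct stat st;
    off_t offset = 0;
    int on = 1, off = 0, rc = -1;

    assert(hdr != NULL && path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0)
        goto out;

    /* Cork the socket: the kernel holds back partial frames so the
     * header and the start of the file can share a segment. */
    if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof on) < 0)
        goto out;

    /* The header still goes through an ordinary copying write(). */
    if (write_all(sock, hdr, strlen(hdr)) < 0)
        goto out;

    /* The file body is handed to the kernel by descriptor; no user
     * space buffer is involved on this path. */
    while (offset < st.st_size) {
        ssize_t n = sendfile(sock, fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto out;
        }
        if (n == 0)
            break; /* file shrank underneath us */
    }

    /* Uncork: flush whatever is still queued. */
    rc = setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof off);
out:
    close(fd);
    return rc;
}
```

Note that this only covers the file-to-socket case. An MPI library sending
from an arbitrary user space buffer would need something more than this.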

Any accurate information on fast message passing implementations on Linux
would be greatly appreciated.

Thanks,

[*] That was a very bad explanation, I guess ;) The truth is that you normally
first allocate a buffer in user space. The best case is when the kernel can
send that very message without using the CPU to copy any array of bytes.
Typically this involves some interaction with the device driver, so the
network device driver has to be smart enough for this fast send routine to
work. And while doing all that, you shouldn't cause any extra context
switches, because those add latency, which is not good for HPC. I'm not very
familiar with the Linux network protocol stack, but from the driver code I've
seen, I can tell that this involves a lot of hacking on truly intertwined
code. Making network buffers visible through all those layers might require a
significant amount of code (and a lot of backspaces).

-- 
Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr>, 
Comp. Sci. Dept., Bilkent University, Ankara
www: http://www.cs.bilkent.edu.tr/~erayo
GPG public key fingerprint: 360C 852F 88B0 A745 F31B  EA0F 7C07 AE16 874D 539C


