
Re: dual-core amd question



On Thu, Jan 18, 2007 at 11:12:51AM -0800, Francesco Pietra wrote:
> Question about MPQC job, with command for
> multiprocessor with POSIX threads:
> 
>  -thread "<PthreadThreadGrp>:(num_threads = 4)"
> 
> Hardware: Tyan S2895 K8WE with all eight memory
> modules; two dual-core opteron cpus. OS: debian amd64
> etch.

A dual core opteron has two complete opteron processors in one chip.
With two chips you have a total of 4 cores so you have 4 CPUs.
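
As a rough check of that arithmetic, here is a minimal Python sketch.
The helper name total_cores is made up for illustration; os.cpu_count()
is the standard-library call that reports how many logical CPUs the OS
sees on whatever machine runs the script.

```python
import os

def total_cores(chips: int, cores_per_chip: int) -> int:
    """Hypothetical helper: total CPUs = chips x cores per chip."""
    return chips * cores_per_chip

# Two dual-core Opterons, as described in this thread.
expected = total_cores(chips=2, cores_per_chip=2)

# What the operating system reports on the machine running this script
# (may be None if it cannot be determined).
seen = os.cpu_count()

print(f"expected from hardware description: {expected}")
print(f"logical CPUs seen by the OS here:   {seen}")
```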

> The output of an MPQC job says (then the job completes
> successfully):
> 
> Using ShmMessageGrp for message passing (number of
> nodes = 4).
>   Using PthreadThreadGrp for threading (number of
> threads = 1).
>   Using ShmMemoryGrp for distributed shared memory.
>   Total number of processors = 4
> 
> Why 4 processors? Moreover, with such a system is that
> command optimized or should MPI be used instead? In other
> words, is my arrangement "shared memory" or should it be
> considered a cluster? 
> 
> I ask because someone has recently warned me that
> "dual-opteron is not shared memory [I knew that of
> course].  Each cpu has its own memory, and they can
> access each other's. If your program uses MPI then
> that's the best way of having it."

On most Opteron systems, all the CPUs are connected to each other, and
the RAM is attached to the various CPUs.  The extra time to access memory
attached to another CPU is only one bus cycle per extra CPU, so in your
case at most one extra cycle.  No message passing algorithm will ever get
anywhere close to the performance of just accessing the memory directly
on a system like yours, so shared memory between threads/processes is as
fast as it can get in your case.
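
To illustrate what "just accessing the memory" means for threads, here
is a minimal Python sketch.  It is not how MPQC does it; the function
name parallel_sum is hypothetical, and CPython's global interpreter lock
means these threads will not actually run in parallel.  The point is only
the memory model: every thread reads the same list in place, and nothing
is copied or sent as a message.

```python
import threading

def parallel_sum(data, num_threads=4):
    """Sum `data` with `num_threads` threads that all read the same list."""
    partial = [0] * num_threads          # shared result slots, one per thread
    chunk = (len(data) + num_threads - 1) // num_threads

    def worker(i):
        # Each thread reads its slice of the shared list in place;
        # nothing is copied or sent anywhere.
        start, stop = i * chunk, min((i + 1) * chunk, len(data))
        total = 0
        for j in range(start, stop):
            total += data[j]
        partial[i] = total               # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)

print(parallel_sum(list(range(1_000_000))))   # 499999500000
```

A message passing version would have to serialise each slice, hand it to
another process, and collect the answers, which is extra work that a
shared address space avoids.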

> I had no doubt, until now, that the above arrangement
> of memory modules makes my system shared memory.
> Cannot try MPI because the program is not compiled for
> clusters.

You do not have a cluster.  Your CPUs are as tightly linked as they can
realistically be.  As far as I recall the interconnect between the two
CPUs runs at 4GB/s each way, which should be plenty for passing data
between threads.  I don't think the memory directly connected to each
cpu is that much faster than that.
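
To put the 4GB/s figure in perspective, here is a back-of-the-envelope
calculation in Python.  It takes the recalled figure at face value,
assumes 1 GB = 10^9 bytes, and ignores latency and protocol overhead;
the 100 MB block size and the name transfer_seconds are made up for
illustration.

```python
# 4 GB/s each way, as recalled above (assumption: 1 GB = 10^9 bytes).
LINK_BYTES_PER_SEC = 4e9

def transfer_seconds(num_bytes: float, rate: float = LINK_BYTES_PER_SEC) -> float:
    """Idealised time to move num_bytes over the link, ignoring latency."""
    return num_bytes / rate

# Example: a hypothetical 100 MB block read from the other CPU's memory.
print(f"{transfer_seconds(100e6) * 1e3:.1f} ms")   # 25.0 ms
```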

--
Len Sorensen


