
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

owens@netptc.net put forth on 10/22/2010 8:15 PM:

> Actually Amdahl's Law IS a law of diminishing returns but is intended
> to be applied to hardware, not software.  The usual application is to
> compute the degree to which adding another processor increases the
> processing power of the system
> Larry

You are absolutely incorrect.  Amdahl's law is specific to algorithm
scalability.  It has little to do specifically with classic
multiprocessing.  Case in point:

If one has a fairly heavy floating point application but it requires a
specific scalar operation be performed in the loop along with every FP
OP, say a counter increase of an integer register or similar, one could
take this application from his/her 2 GHz single core x86 processor
platform and run it on one processor of an NEC SX8 vector supercomputer
system, which has a wide 8 pipe vector unit--16 Gflop/s peak vs 4
Gflop/s peak for the x86 chip.

Zero scalability would be achieved, even though the floating point
hardware is 4 times more powerful at peak.  Note no additional processors
were added.  We simply moved the algorithm to a machine with a massively
parallel vector FP unit.  In this case it's even more interesting
because the scalar unit in the SX8 runs at 1 GHz, even though the 8 pipe
vector unit runs at 2 GHz.

So, this floating point algorithm would actually run _slower_ on the SX8,
because the scalar component of the app limits execution time and the
scalar unit runs at only 1 GHz.  (This is typical of vector supercomputer
processors--Cray did the same thing for years, running the vector units
faster than the scalar units, because the vast bulk of the code run on
these systems was truly, massively, floating point specific, with little
scalar code.)
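
To put rough numbers on the example above, here is a minimal sketch in
Python.  The peak FP rates and clock speeds are the ones quoted in this
post.  Everything else is an assumption made for illustration: one scalar
op per FP op, one scalar op retired per clock, and no overlap between
scalar and FP work.  The function name run_time is hypothetical.

```python
def run_time(n_iters, fp_rate, scalar_rate):
    """Seconds for n_iters loop iterations, each doing one FP op and one
    scalar op, assuming the two cannot overlap (illustrative assumption)."""
    return n_iters / fp_rate + n_iters / scalar_rate


N = 1_000_000_000  # loop iterations, chosen arbitrarily

# x86: 4 Gflop/s peak FP; scalar rate assumed to be 1 op per clock at 2 GHz
x86 = run_time(N, fp_rate=4e9, scalar_rate=2e9)

# SX8: 16 Gflop/s peak FP; scalar rate assumed to be 1 op per clock at 1 GHz
sx8 = run_time(N, fp_rate=16e9, scalar_rate=1e9)

print(f"x86: {x86:.4f} s, SX8: {sx8:.4f} s, SX8/x86: {sx8 / x86:.2f}")
```

Under these assumptions the scalar term dominates both run times, so
quadrupling the FP rate cannot make up for halving the scalar rate.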

This is the type of thing Gene Amdahl had in mind when postulating his
theory, not necessarily multiprocessing specifically, but all forms of
processing in which a portion of the algorithm could be broken up to run
in parallel, regardless of what the parallel hardware might be.  One of
the few applications that can truly be nearly infinitely parallelized is
graphics rendering.  Note I said rendering, not geometry.
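
For reference, the usual textbook statement of Amdahl's law can be
sketched as below.  The formula is not spelled out in this post; the
names amdahl_speedup, p, and s are mine.  Here p is the fraction of run
time that benefits from the faster or wider hardware, and s is the
speedup of that fraction.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the run time is sped up by a
    factor s and the remaining (1 - p) is left unchanged."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# The untouched fraction caps the speedup at 1 / (1 - p), however large s is.
for p in (0.5, 0.9, 0.99):
    for s in (2, 8, 1_000_000):
        print(f"p={p:<4} s={s:<8} speedup={amdahl_speedup(p, s):.2f}")
```

Nothing in the formula says whether s comes from more CPUs, a wider
vector unit, or more GPUs, which is the point being argued here.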

When attempting to parallelize the geometry calculations in the 3D
pipeline we run squarely into Amdahl's brick wall.  This is why
nVidia/AMD have severe problems getting multi GPU (SLI/Xfire)
performance to scale anywhere close to linearly.  It's impossible to
take the 3D scene and split the geometry calculations evenly between
GPUs, because vertices overlap across the portions of the frame buffer
for which each GPU is responsible.  Thus, every overlapping vertex must
be sent to both GPUs adjacent to the boundary.  For this reason,
adding multiple GPUs to a system yields a vastly diminishing return on
investment.  Each additional GPU creates one more frame buffer boundary.
When you go from two screen regions to 3, you double the amount of
overlapping geometry the "middle" GPU has to process, because it now has
two neighbor GPUs.
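
The boundary effect can be sketched with a toy model.  This is purely
illustrative and does not describe how any real SLI/Xfire driver divides
work.  It assumes the frame is cut into equal strips, one per GPU, and
that each boundary a GPU touches costs it a fixed number of extra
vertices.  The function name, its parameters, and the vertex counts are
all hypothetical.

```python
def geometry_speedup(num_gpus, total_vertices, extra_per_boundary):
    """Geometry speedup over one GPU in a toy strip-split model.

    Each GPU gets an equal share of the vertices plus extra_per_boundary
    duplicated vertices for every strip boundary it touches.  The slowest
    (most loaded) GPU sets the frame time.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        return 1.0
    loads = []
    for i in range(num_gpus):
        # Edge strips touch one boundary, middle strips touch two.
        boundaries = 1 if i in (0, num_gpus - 1) else 2
        loads.append(total_vertices / num_gpus + boundaries * extra_per_boundary)
    return total_vertices / max(loads)


for k in (1, 2, 3, 4):
    print(k, round(geometry_speedup(k, 1_000_000, 50_000), 2))
```

With these made-up numbers the speedup stays below the GPU count, and the
gap widens once a "middle" GPU with two boundaries appears.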

The only scenario where 3 or 4 GPUs makes any kind of sense for ROI is
with multiple monitors, at insanely high screen resolutions and color
depths, with maximum AA/AF and multisampling.  These operations are
almost entirely raster ops, and as mentioned before, raster pixel
operations can be nearly linearly scaled on parallel hardware.

Again, Amdahl's law applies to algorithm scalability, not classic CPU

