
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?



>
>
>
>---- Original Message ----
>From: stan@hardwarefreak.com
>To: debian-user@lists.debian.org
>Subject: Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
>Date: Sat, 23 Oct 2010 12:13:06 -0500
>
>>owens@netptc.net put forth on 10/22/2010 8:15 PM:
>>
>>> Actually Amdahl's Law IS a law of diminishing returns but is intended
>>> to be applied to hardware, not software.  The usual application is to
>>> compute the degree to which adding another processor increases the
>>> processing power of the system
>>> Larry
>>
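
(For reference, the textbook form of Amdahl's law, as a minimal sketch
in Python -- here p is the fraction of runtime that benefits from the
added hardware and s is the speedup of that portion; the numbers below
are only illustrative:)

    # Amdahl's law: overall speedup when a fraction p of the runtime is
    # accelerated by a factor s and the remaining (1 - p) is not.
    def amdahl_speedup(p, s):
        return 1.0 / ((1.0 - p) + p / s)

    # Diminishing returns when s is the number of processors:
    print(amdahl_speedup(0.9, 8))    # ~4.7x with 8 CPUs
    print(amdahl_speedup(0.9, 16))   # ~6.4x with 16 CPUs
    print(amdahl_speedup(0.9, 1e9))  # limit is 10x, no matter how many
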
>>You are absolutely incorrect.  Amdahl's law is specific to algorithm
>>scalability.  It has little to do specifically with classic
>>multiprocessing.  Case in point:
>>
>>If one has a fairly heavy floating point application but it requires a
>>specific scalar operation be performed in the loop along with every FP
>>OP, say incrementing an integer counter register or similar, one could
>>take this application from his/her 2 GHz single core x86 processor
>>platform and run it on one processor of an NEC SX8 vector supercomputer
>>system, which has a wide 8 pipe vector unit--16 Gflop/s peak vs 4
>>Gflop/s peak for the x86 chip.
>>
>>Zero scalability would be achieved, even though the floating point
>>hardware is 4 times more powerful.  Note that no additional processors
>>were added.  We simply moved the algorithm to a machine with a massively
>>parallel vector FP unit.  In this case it's even more interesting
>>because the scalar unit in the SX8 runs at 1 GHz, even though the 8 pipe
>>vector unit runs at 2 GHz.
>>
>>So, this floating point algorithm would actually run _slower_ on the SX8
>>because the scalar component of the app limits execution time on the
>>1 GHz scalar unit.  (This is typical of vector supercomputer
>>processors--Cray did the same thing for years, running the vector units
>>faster than the scalar units, because the vast bulk of the code run on
>>these systems was truly, massively, floating point specific, with little
>>scalar code.)
>>
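
(A back-of-the-envelope sketch of the effect just described; the 50/50
split between scalar and vectorizable work is a made-up figure, chosen
only to show how the 1 GHz scalar unit dominates:)

    # Hypothetical loop: half the work is vectorizable FP, half is the
    # scalar bookkeeping (the integer counter) that cannot be vectorized.
    fp_part, scalar_part = 0.5, 0.5

    # Normalized execution time = work / throughput for each portion.
    t_x86 = scalar_part / 2.0 + fp_part / 4.0    # 2 GHz scalar, 4 Gflop/s
    t_sx8 = scalar_part / 1.0 + fp_part / 16.0   # 1 GHz scalar, 16 Gflop/s

    print(t_sx8 / t_x86)   # ~1.4: slower on the SX8 despite 4x the FP peak
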
>>This is the type of thing Gene Amdahl had in mind when postulating his
>>theory, not necessarily multiprocessing specifically, but all forms of
>>processing in which a portion of the algorithm could be broken up to run
>>in parallel, regardless of what the parallel hardware might be.  One of
>>the few applications that can truly be nearly infinitely parallelized is
>>graphics rendering.  Note I said rendering, not geometry.
>>
>>When attempting to parallelize the geometry calculations in the 3D
>>pipeline we run squarely into Amdahl's brick wall.  This is why
>>nVidia/AMD have severe problems getting multi GPU (SLI/Xfire)
>>performance to scale anywhere close to linearly.  It's impossible to
>>take the 3D scene and split the geometry calculations evenly between
>>GPUs, because vertices overlap across the portions of the frame buffer
>>for which each GPU is responsible.  Thus, every overlapping vertex
>>must be sent to both GPUs adjacent to the boundary.  For this reason,
>>adding multiple GPUs to a system yields a vastly diminishing return on
>>investment.  Each additional GPU creates one more frame buffer boundary.
>> When you go from two screen regions to three, you double the amount of
>>geometry processing the "middle" GPU has to perform, because it now has
>>two neighbor GPUs.
>>
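
(A toy model of that boundary effect, with a made-up per-boundary
duplication factor just to show the shape of the curve:)

    # Splitting the scene into N regions creates N - 1 boundaries, and
    # vertices straddling a boundary must be processed by both adjacent
    # GPUs.  A "middle" GPU touches two boundaries, so its geometry work
    # shrinks far more slowly than the ideal 1/N.
    def middle_gpu_work(vertices, n_gpus, boundary_share=0.15):
        boundaries_touched = min(2, n_gpus - 1)
        return vertices / n_gpus + vertices * boundary_share * boundaries_touched

    for n in (1, 2, 3, 4):
        print(n, middle_gpu_work(1_000_000, n))
    # 2 GPUs: 650k; 3 GPUs: ~633k; 4 GPUs: 550k -- sharply diminishing returns
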
>>The only scenario where 3 or 4 GPUs make any kind of sense for ROI is
>>with multiple monitors, at insanely high screen resolutions and color
>>depths, with maximum AA/AF and multisampling.  These operations are
>>almost entirely raster ops, and as mentioned before, raster pixel
>>operations can be nearly linearly scaled on parallel hardware.
>>
>>Again, Amdahl's law applies to algorithm scalability, not classic CPU
>>multiprocessing.
>>
>>-- 
>>Stan
>>
>>
Someone once said "a text taken out of context is pretext".  The
original thread concentrated on the potential advantages of adding
CPUs to improve performance and the apparent law of diminishing
returns.  I was merely supporting that with the classic law, which
most certainly may be applied to coupled multiprocessing.
Disagreements should be addressed to John Hennessy, author of
"Computer Architecture: A Quantitative Approach" (out of which I
teach), in care of the office of the President, Stanford University.
Larry
>>-- 
>>To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org 
>>with a subject of "unsubscribe". Trouble? Contact
>>listmaster@lists.debian.org
>>Archive: http://lists.debian.org/4CC317A2.8010603@hardwarefreak.com
>>
>>


