Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Ron Johnson put forth on 10/22/2010 2:00 AM:
> On 10/22/2010 12:53 AM, Arthur Machlas wrote:
>> On Thu, Oct 21, 2010 at 8:15 PM, Andrew Reid <firstname.lastname@example.org> wrote:
>>> But I'm curious if anyone on the list knows the rationale for
>>> distributing kernels with this set to 32. Is that just a
>>> reasonable number that's never been updated? Or is there some
>>> complication that arises after 32 cores, and should I be more
>>> careful about tuning other parameters?
>> I've always set the number of cores to exactly how many I have x2 when
>> I roll my own, which on my puny systems is either 4 or 8. I seem to
>> recall reading that there is a slight performance hit for every core
>> you support.
> Correct. The amount of effort needed for cross-CPU communication, cache
> coherency and OS process coordination increases much more than linearly
> as you add CPUs.
All of these things but the scheduler, what you call "process
coordination", are invisible to the kernel for the most part and are
irrelevant to the discussion of CONFIG_NR_CPUS.
> Crossbar communication (introduced first, I think, by DEC/Compaq in
> 2001) eliminated a lot of the latency in multi-CPU communications which
> plagues bus-based systems.
Crossbar bus controllers have been around for over 30 years, first
implemented by IBM in its mainframes in the late 70s IIRC. Many
RISC/UNIX systems in the 90s implemented crossbar controllers, including
Data General, HP, SGI, SUN, Unisys, etc.
You refer to the Alpha 21364 processor introduced in the
ES47/ES80/GS1280, which did not implement a crossbar for inter-socket
communication. The 21364 implemented a NUMA interconnect based on a
proprietary directory protocol for multiprocessor cache coherence.
These circuits in NUMA machines are typically called "routers", and,
functionally, replace the crossbar of yore.
> AMD used a similar mesh in its dual-core CPUs (not surprising, since
> many DEC engineers went to AMD). Harder to design, but much faster.
You make it sound as if AMD _chose_ this design _over_ a shared bus.
There never was such a choice to be made. Once you implement multiple
cores on a single die you no longer have the option of using a shared
bus such as GTL as the drive voltage is 3.3v, over double the voltages
used within the die. By definition buses are _external_ to ICs, and
connect ICs to one another. Buses aren't used within a die. Discrete
data paths are.
> Intel's first (and 2nd?) gen multi-core machines were bus-based; easier
> to design, quicker to get to market, but a lot slower.
This is because they weren't multi-core chips, but Multi Chip Modules,
or MCMs: http://en.wikipedia.org/wiki/Multi-Chip_Module
Communication between ICs within an MCM is external communication, thus
a bus can be used, as well as NUMA, which IBM uses in its pSeries
(Power5/6/7) MCMs and Cray used on the X1 and X1E.
> (OP's machine is certainly NUMA, where communication between cores on a
> chip is much faster than communication with cores on a different chip.)
At least you got this part correct, Ron. ;)
Back to the question of the thread: the answer, as someone else already
stated, is that the only downside to setting CONFIG_NR_CPUS to a value
way above the number of physical cores in the machine is a larger kernel
footprint, and the increase is small given the memories of today's
machines. Adding netfilter support will bloat the kernel footprint far
more than setting CONFIG_NR_CPUS=256 when you only have 48 cores in the box.
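
For anyone who wants to check a kernel config against a core count
before booting it, here is a minimal Python sketch. The helper names
(parse_nr_cpus, covers) are made up for illustration, and the
/boot/config-<release> path is an assumption based on where Debian
normally installs the config; adjust it for other setups. The 48 is just
the core count mentioned in this thread.

```python
import platform
import re
from typing import Optional


def parse_nr_cpus(config_text: str) -> Optional[int]:
    """Return the CONFIG_NR_CPUS value from kernel config text, or None."""
    match = re.search(r"^CONFIG_NR_CPUS=(\d+)\s*$", config_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def covers(nr_cpus: Optional[int], cores_in_box: int) -> bool:
    """True if the configured limit is known and is at least cores_in_box."""
    return nr_cpus is not None and nr_cpus >= cores_in_box


if __name__ == "__main__":
    # Core count is supplied by hand: a running kernel only reports the
    # CPUs it brought up, which is already capped by its own NR_CPUS.
    cores_in_box = 48
    path = f"/boot/config-{platform.release()}"  # assumed Debian location
    try:
        with open(path) as fh:
            limit = parse_nr_cpus(fh.read())
    except OSError:
        limit = None  # config file missing or unreadable
    if limit is None:
        print(f"could not determine CONFIG_NR_CPUS from {path}")
    elif covers(limit, cores_in_box):
        print(f"CONFIG_NR_CPUS={limit} covers {cores_in_box} cores")
    else:
        print(f"CONFIG_NR_CPUS={limit} is below {cores_in_box} cores")
```

This only compares the limit with a number you type in; it says nothing
about how much the footprint grows for a given value.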