Re: need help making shell script use two CPUs/cores
Bob Proulx put forth on 1/24/2011 12:21 PM:
> Stan Hoeppner wrote:
>> Why bother going up to 12 processes with a quad core chip? Anything
>> over 4 processes/threads won't gain you anything, as your results
>> above demonstrate.
> I went to 12 because it would demonstrate the behavior three times
> past the number of cores. If I had only a dual core I would have only
> chosen to go to 6. But I would have gone to 6 for one core too since
> three doesn't generate a smooth enough scatter plot for me. But I
> didn't want to spend too much time analyzing the problem to set up a
> statistically designed experiment. I just wanted to quickly perform
> the test. So I plugged in 12 there and moved on. Surely that would be
> enough. I didn't think I would need to rigorously defend that quick
> choice against a panel.
But you'll run out of memory bandwidth before you hit 4 processes, especially if
your 4-way chip has no L3 cache, such as the Athlon II x4 chips. Going all the
way out to 12 processes seems a bit silly. Even with something like one of
Intel's Core i7s with a monster L3 cache, you'll exhaust your memory and cache
b/w well before you have (#cores*1.5) processes.
> At some point by doing more parallelism things will actually be slowed
> down by it. I didn't reach that point.
This will probably only occur if you run out of memory and have to swap. The
overhead of the Linux task scheduler is tiny--we're talking microseconds per
task switch. And as I mentioned, you're already thrashing your caches at 4
processes, so beyond that point everything is purely memory b/w constrained.
This bandwidth is finite, static. So no matter how many processes you run
(unless you run more processes than you have images) you probably won't see any
slowdown past 4 processes.
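
For anyone who wants to repeat the "how many processes before it stops
helping" test on their own box, here is a rough sketch, not the script used
earlier in this thread. The function names run_batch and sweep are made up
for illustration. It assumes an xargs that supports -0 and -P (GNU and BSD
both do) and a date that understands +%s, so the timings are only good to
one second.

```shell
#!/bin/sh
# Hypothetical helpers, not from the original thread.

# Run "$@" once per input line (one file name per line on stdin),
# keeping up to $1 commands running at the same time.
run_batch() {
    nprocs=$1; shift
    # Convert newlines to NULs so file names with spaces survive xargs.
    tr '\n' '\0' | xargs -0 -n 1 -P "$nprocs" "$@"
}

# Repeat the whole batch at 1..MAXPROCS parallel jobs and print
# "<jobs> <elapsed wall-clock seconds>" for each level.
sweep() {
    maxprocs=$1; list=$2; shift 2
    n=1
    while [ "$n" -le "$maxprocs" ]; do
        start=$(date +%s)
        run_batch "$n" "$@" < "$list"
        end=$(date +%s)
        echo "$n $((end - start))"
        n=$((n + 1))
    done
}
```

Something like "ls *.JPG > images.txt; sweep 12 images.txt mogrify -path
/tmp/out -resize 50%" would then give one line per process count, assuming
/tmp/out already exists. If the memory bandwidth argument holds, the elapsed
times should flatten out at or before the number of cores rather than climb
again.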
>> Imagemagick will use threads on larger images. To keep it from threading, in
>> order for your testing to make more sense, use smaller images.
> I couldn't find anything in the ImageMagick documentation that
> described its threading behavior. Where did I miss that useful
> For images I used your set of "benchmark" photos that we have been
> discussing in this thread.
Hmmm. If you were seeing threading with a single process with those images,
this would lead me to believe the Lenny Imagemagick version doesn't support
threads. You're running the Squeeze package, correct? I'm running:
$ identify -version
Version: ImageMagick 6.3.7 11/17/10 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2008 ImageMagick Studio LLC
According to the docs I should see something like:
$ identify -version
Features: OpenMP OpenCL
but I don't.
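
A quick way to check this from a script, as a sketch only: the helper name
has_openmp is made up, and it simply looks for "OpenMP" on the Features:
line of the "identify -version" output. A build that prints no Features:
line at all, like mine above, is reported as not having it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if "identify -version" style text on
# stdin lists OpenMP on its Features: line.
has_openmp() {
    grep -i '^Features:' | grep -qi 'openmp'
}

# Only query ImageMagick if it is actually installed.
if command -v identify >/dev/null 2>&1; then
    if identify -version | has_openmp; then
        echo "OpenMP listed: ImageMagick may thread on its own"
    else
        echo "no OpenMP feature listed"
    fi
fi

# On an OpenMP build, each process can be held to a single thread for
# the benchmark with the MAGICK_THREAD_LIMIT environment variable, e.g.:
#   MAGICK_THREAD_LIMIT=1 convert in.jpg -resize 50% out.jpg
```

Holding ImageMagick to one thread per process this way should make the
process-count comparison cleaner than relying on small images alone.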
> I expected that on this machine the memory backplane wouldn't
> have enough memory bandwidth to support all four processors. I expect
None of them do. Recall when the first socket 939 AMD chips hit the market,
with all mobos having dual channel memory as the controller was on the CPU? One
core with dual memory channels, and many applications saw huge performance
gains. Now we have 4 CPUs on two memory channels. If not for caches, you'd see
no speedup past 2 Imagemagick processes. Which is pretty much the behavior
identified by another OP with an Athlon II x4 system--almost zero speedup from 2
to 4 processes.
> it to brown out before getting to four. Having a quad-core sounds
> great but just having four cores doesn't mean all of them can be used
> at the same time to advantage. I expect that the "extra" cores will
> get starved. And so the curve will drop off sooner than four.
This is always the case. No multicore CPU has enough memory channels to keep
all cores fed on a byte/OP basis. This is no secret. It's been well discussed
for many years now.
>>> I also tried running this same test on some slower hardware. I have
>>> gotten spoiled by the faster machine. The benchmark is still running
>>> on my slower machines. :-) I am not going to wait for it to finish.
>> What are the CPU specs of this older machine?
> I tested this on an Intel Celeron 2.4GHz machine with 2.5G ram.
My test server is a dual Celeron 550 with only 384MB and it doesn't take
anywhere near 30 minutes for that set of test images. IIRC it only took a few