Re: need help making shell script use two CPUs/cores



Stan Hoeppner wrote:
> Bob Proulx put forth:
> > Here is some raw data from another test using GraphicsMagic from Debian
> > Sid on an Intel Core2 Quad CPU Q9400 @ 2.66GHz.
> > 
> >     #CPUs  real   user  sys
> >     1 ... 32.17 100.15 2.29
> >     2 ... 28.02 102.09 2.25
> >     3 ... 26.96 101.41 2.02
> >     4 ... 26.18  99.85 2.10
> >     5 ... 26.03  98.58 2.27
> >     6 ... 27.07  97.32 2.17
> >     7 ... 27.74 100.09 2.03
> >     8 ... 26.76  97.83 1.99
> >     9 ... 27.24  97.31 2.88
> >    10 ... 26.27  99.05 2.76
> >    11 ... 26.35  99.30 1.84
> >    12 ... 25.91  97.63 2.08
> 
> So, I'm not understanding how we have a quad core CPU with 12 CPUs.
> Is "#CPUs" here your xargs "-P" argument in the script you posted in
> response to my question that started this thread?

Sorry.  Yes, I made a mistake in posting those headings.  The column I
labeled #CPUs was actually the xargs -P parallelization argument, that
is, the number of conversion processes running in parallel.
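
For readers without the earlier message at hand, the kind of run being
timed can be sketched roughly as follows.  This is a minimal sketch, not
the script posted earlier in the thread: run_parallel is a hypothetical
helper name, the *.jpg pattern is an assumption, and it relies on the
-0, -r and -P options of GNU xargs.

```shell
#!/bin/sh
# Hypothetical helper, not the script from earlier in the thread.
# Usage: run_parallel NJOBS SRCDIR CMD [ARGS...]
# Runs "CMD ARGS... FILE" once for each *.jpg under SRCDIR, keeping at
# most NJOBS of those processes running at a time (xargs -P).
run_parallel() {
    njobs=$1
    srcdir=$2
    shift 2
    # -print0 / -0 keep file names with spaces intact; -n 1 hands one
    # file to each process; -r skips the run if no files were found.
    find "$srcdir" -type f -name '*.jpg' -print0 |
        xargs -0 -r -n 1 -P "$njobs" "$@"
}
```

Something like "run_parallel 4 photos mogrify -path out -resize 1024x768"
would then keep four ImageMagick processes going at once (the directory
names and the size are placeholders, and "out" must already exist).
Wrapping that call in bash's time keyword for each -P value gives
real/user/sys columns like the ones quoted above.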

> Why bother going up to 12 processes with a quad core chip?  Anything
> over 4 processes/threads won't gain you anything, as your results
> above demonstrate.

I went to 12 because it would demonstrate the behavior up to three
times the number of cores.  If I had only a dual core I would have
chosen to go to 6.  But I would have gone to 6 for one core too, since
three points don't generate a smooth enough scatter plot for me.  But I
didn't want to spend too much time analyzing the problem to set up a
statistically designed experiment.  I just wanted to quickly perform
the test.  So I plugged in 12 there and moved on.  Surely that would be
enough.  I didn't think I would need to rigorously defend that quick
choice against a panel.

At some point adding more parallelism will actually slow things down.
I didn't reach that point.
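
To put numbers on how flat the curve is, the GraphicsMagick timings
quoted above can be turned into speedup figures.  A minimal sketch,
assuming awk is available; speedup_table is a hypothetical helper name,
and the rows are copied from the table above.

```shell
#!/bin/sh
# Hypothetical helper: reads "P real user sys" rows on stdin and prints
# P, real, speedup relative to the first row, and user/real (roughly
# the average number of cores kept busy in user mode).
speedup_table() {
    awk 'NR == 1 { base = $2 }
         { printf "%2d %6.2f %5.2f %5.2f\n", $1, $2, base / $2, $3 / $2 }'
}

# GraphicsMagick rows from the table quoted above.
speedup_table <<'EOF'
1 32.17 100.15 2.29
2 28.02 102.09 2.25
3 26.96 101.41 2.02
4 26.18 99.85 2.10
5 26.03 98.58 2.27
6 27.07 97.32 2.17
7 27.74 100.09 2.03
8 26.76 97.83 1.99
9 27.24 97.31 2.88
10 26.27 99.05 2.76
11 26.35 99.30 1.84
12 25.91 97.63 2.08
EOF
```

For these rows the last column is already above 3 at -P 1, which
suggests that the single process was keeping several cores busy by
itself, and the best speedup over the -P 1 run stays under 1.25.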

> > And the same thing using ImageMagick on the same system.
> > 
> >     #CPUs  real  user  sys
> >     1 ... 24.69 62.60 2.87
> >     2 ... 19.28 63.17 2.50
> >     3 ... 17.82 60.34 2.65
> >     4 ... 17.48 58.86 2.55
> >     5 ... 16.60 58.11 2.34
> >     6 ... 15.85 58.03 2.38
> >     7 ... 15.61 58.09 2.44
> >     8 ... 15.36 57.68 2.48
> >     9 ... 15.48 57.76 2.38
> >    10 ... 15.38 57.76 2.28
> >    11 ... 15.36 57.97 2.27
> >    12 ... 15.73 58.76 2.17
> > 
> > Watching the individual cpu load I observe that while the 1 cpu case
> > did consume one cpu fully that the other three were also showing quite
> > a bit of activity too.  
> 
> Imagemagick will use threads on larger images.  To keep it from threading, in
> order for your testing to make more sense, use smaller images.

I couldn't find anything in the ImageMagick documentation that
described its threading behavior.  Where did I miss that useful
information?

For images I used your set of "benchmark" photos that we have been
discussing in this thread.

> > three running all four cpus were looking pretty much 100% consumed.  I
> > was timing all of the shell's for loop, the xargs and the convert
> > processes all together.
> 
> If you are converting images large enough that the threading kicks
> in, there's little reason to use multiple processes at that point.
> We'd already discussed this.  Were you simply trying to confirm that
> with these tests?

I expected that on this machine the memory backplane wouldn't have
enough memory bandwidth to support all four processors.  I expected
it to brown out before getting to four.  Having a quad-core sounds
great but just having four cores doesn't mean all of them can be used
at the same time to advantage.  I expected that the "extra" cores would
get starved, and so the curve would drop off sooner than four.

> > I also tried running this same test on some slower hardware.  I have
> > gotten spoiled by the faster machine.  The benchmark is still running
> > on my slower machines. :-)  I am not going to wait for it to finish.
> 
> What are the CPU specs of this older machine?

I tested this on an Intel Celeron 2.4GHz machine with 2.5GB of RAM.
Unfortunately I see now that I have lost the saved data from that
test.  (Drat!  I know what I did but I would need to run the test
again to regenerate it.)  As I recall, an entire run up to six parallel
conversions there took over thirty minutes in total, which worked out
to about twenty times slower.  Don't hold me to those numbers, as I
would need to capture the actual data again to be sure and I don't
want to spend the time to do that.  But it was slower, much slower.
(This is actually my main web server and it normally does image
conversions when I upload photos.  This information is probably going
to motivate me to set up a task queue to speed up my image conversions
there.)

Bob
