
Re: need help making shell script use two CPUs/cores



Camaleón put forth on 1/10/2011 8:08 AM:
> On Sun, 09 Jan 2011 14:39:56 -0600, Stan Hoeppner wrote:
> 
>> Camaleón put forth on 1/9/2011 12:12 PM:
>>
>>> Better if you check it, but I dunno how to get the compile options for
>>> the lenny package... where is this defined, in source or diff packages?
>>
>> You're taking this thread down the wrong path.  I asked for assistance
>> writing a simple script to do what I want it to do.  Accomplishing that
>> will fix all of my "problems" WRT Imagemagick.  I didn't ask for help in
>> optimizing or fixing the Lenny i386 Imagemagick package. ;)
> 
> I read it as "how to speed up the execution of a batch script that has to 
> deal with resizing big images" and usually you get some gains if the 
> program to run was compiled to work with threads in mind.

I said lots of small images, IIRC.  Regardless, threading isn't simply turned on
with a compile-time option.  A program must be written specifically to create
master and worker threads.  The implementation is somewhat similar to exec and
fork, at least compared to serial programming, though the IPC semantics are
different.
It's a safe bet that the programs in the Lenny i386 Imagemagick package do have
the threading support.  The following likely explains why _I_ wasn't seeing the
threading.  From:

http://www.imagemagick.org/Usage/api/#speed

"For small images using the IM multi-thread capabilities will not give you any
advantage, though on a large busy server it could be detrimental. But for large
images the OpenMP multi-thread capabilities can produce a definate speed
advantage as it uses more CPU's to complete the individual image processing
operations."

It would be nice to know their definition of "small" images.
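
Incidentally, if you want to confirm that a given ImageMagick build has the
OpenMP threading compiled in, something like this should tell you (newer 6.x
builds print a Features line; the exact output varies by version):

# An OpenMP-enabled build lists OpenMP on the Features line
convert -version | grep -i openmp

# The thread resource limit shows how many threads IM is willing to use
convert -list resource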


> Good. It would be nice to see the results when you finally got it working
> the way you like ;-)

Bob's xargs suggestion got it working instantly many hours ago.  I'm not sure
which results you're referring to.  Are you looking for something like "watch
top" output for Cpu0 and Cpu1?  See for yourself.

1.  wget all 35 of the .JPG files from this URL:
http://www.hardwarefreak.com/server-pics/
then copy them into a working temp dir.
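
Something along these lines should grab them in one shot (a sketch; adjust
the accept pattern if needed):

# Recursively fetch the linked .JPGs, no parent dirs, flat output dir
wget -r -np -nd -A '*.JPG' http://www.hardwarefreak.com/server-pics/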

2.  On your dual processor, or dual core system, execute:

for k in *.JPG; do echo "$k"; done | xargs -I{} -P2 convert {} -resize 3072 {} &

For a quad-core system, change -P2 to -P4.  You may want to wrap the whole
pipeline with the time command.
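
Something like this should do it (a sketch; drop the trailing & so the shell
can report the elapsed time when the run completes):

time ( for k in *.JPG; do echo "$k"; done | xargs -I{} -P2 convert {} -resize 3072 {} )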

3.  Immediately execute top and watch Cpu0/1/2/3 in the summary area (press 1
in top if it doesn't already show one line per CPU).  You'll see pretty linear
parallel scaling of the convert processes.  Also note that memory consumption
doubles with each doubling of the process count.
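
If you'd rather see numbers than eyeball top, something like this (a sketch,
assuming procps ps and watch) prints the PID and resident memory of every
running convert once a second:

# Show PID, resident set size (KiB), and name of each convert process
watch -n1 'ps -C convert -o pid,rss,comm'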

Now, to compare the "xargs -P" parallel process performance to standard serial
performance, clear the temp dir, copy the original files over again, and
execute:

for k in *.JPG; do convert "$k" -resize 3072 "$k"; done &

and launch top.  You'll see only a single convert process running.  Again, you
can wrap this with the time command if you'd like to compare total run times.
What you'll find is nearly linear scaling as the number of convert processes is
doubled, up to the point where the process count equals the core count.
Running more processes than cores merely wastes memory and increases total
processing time.
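
If you don't want to hand-edit -P per machine, you can derive the process
count from the core count.  A sketch, assuming a coreutils new enough to ship
nproc (it falls back to counting /proc/cpuinfo entries otherwise):

# One convert process per core
CORES=$(nproc 2>/dev/null || grep -c ^processor /proc/cpuinfo)
for k in *.JPG; do echo "$k"; done | xargs -I{} -P"$CORES" convert {} -resize 3072 {} &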

Linux is pretty efficient at scheduling multiple processes across the cores of
multiprocessor and/or multi-core systems and achieving near linear performance
scaling.  This is one reason "fork and forget" is such a popular method for
parallel programming.  All you have to do is fork many children and the kernel
takes care of scheduling them to run simultaneously.
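
In shell terms, fork and forget is nothing more than backgrounding jobs and
letting the kernel sort out the rest.  A minimal sketch:

# Fork one convert child per file; the kernel schedules them across cores.
# wait blocks until every child has exited.
for k in *.JPG; do
    convert "$k" -resize 3072 "$k" &
done
wait

Note this spawns one child per file with no throttle, which is exactly why
xargs -P, which caps the number of live processes, is the better tool here.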

-- 
Stan

