
Re: need help making shell script use two CPUs/cores



Camaleón put forth on 1/10/2011 2:11 PM:

> Didn't you run any tests? Okay... (now downloading the sample images)

Yes, of course.  I just didn't capture the results to a file.  And it's usually
better if people see their own results instead of someone else's copy/paste.

>> 2.  On your dual processor, or dual core system, execute:
>>
>> for k in *.JPG; do echo $k; done | xargs -I{} -P2 convert {} -resize 3072 {} &
> 
> I used a VM to get the closest environment as you seem to have (a low 
> resource machine) and the above command (timed) gives:

I'm not sure what you mean by resources in this context.  My box has plenty of
resources for the task we're discussing.  Each convert process, IIRC, was using
80MB on my system.  Only two can run simultaneously.  So why queue up 4 or more
processes?  That just eats memory uselessly for zero decrease in total run time.
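
If you want the -P value to track the core count automatically instead of
hard-coding it, something along these lines should work (untested sketch;
nproc is in newer GNU coreutils, and the grep fallback works on any Linux):

$ CORES=$(nproc 2>/dev/null || grep -c ^processor /proc/cpuinfo)
$ for k in *.JPG; do echo $k; done | xargs -I{} -P$CORES convert {} -resize 3072 {}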

> real	1m44.038s
> user	2m5.420s
> sys	1m17.561s
> 
> It uses 2 "convert" processes, so the files are being run in pairs.
> 
> And you can even get the job done faster if using -P8:
> 
> real	1m25.255s
> user	2m1.792s
> sys	0m43.563s

That's an unexpected result.  I would think running #cores*2^x with an
increasing x value would start yielding slower total run times within a few
multiples of #cores.
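
If you want to see where the curve flattens on your own box, a quick
throwaway harness along these lines would do it (untested; "originals" is a
placeholder for a directory holding pristine copies, since convert
overwrites the files in place):

for P in 1 2 4 8 16; do
    rm -rf /tmp/bench && cp -a originals /tmp/bench && cd /tmp/bench
    echo "P=$P:"
    time { for k in *.JPG; do echo $k; done | xargs -I{} -P$P convert {} -resize 3072 {}; }
    cd - >/dev/null
done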

> No need to have a quad core with HT. Nice :-)

Use some of the other convert options on large files and you'll want those extra
two real cores. ;)

>> Now, to compare the "xargs -P" parallel process performance to standard
>> serial performance, clear the temp dir and copy the original files over
>> again.  Now execute:
>>
>> for k in *.JPG; do convert $k -resize 3072 $k; done &
> 
> This gives:
> 
> real	2m30.007s
> user	2m11.908s
> sys	1m42.634s
> 
> Which is ~0.46s. of plus delay. Not that bad.

You mean 46s, not 0.46s.  104s vs 150s is a 1.44x speedup, i.e. a ~31%
decrease in run time.  With two processes this _should_ be closer to 2x
(90-100% scaling efficiency) in a "perfect world".  In this case there is
insufficient memory bandwidth to feed all the processors.

I just made two runs on the same set of photos, but downsized them to 800x600
to keep the run time down.  (I had you upscale them to 3072x2048 as your CPUs
are much newer.)

$ time for k in *.JPG; do convert $k -resize 800 $k; done

real    1m16.542s
user    1m11.872s
sys     0m4.104s

$ time for k in *.JPG; do echo $k; done | xargs -I{} -P2 convert {} -resize 800 {}

real    0m41.188s
user    1m14.837s
sys     0m4.812s

41s vs 77s is a 1.88x speedup, a ~47% decrease in run time.  In this case there
is insufficient memory bandwidth as well.  The Intel BX chipset supports a
single channel of PC100 memory for a raw bandwidth of 800MB/s.  Image
manipulation programs will eat all available memory bandwidth.  On my system,
running two such processes leaves ~400MB/s for each processor socket, starving
the convert processes of memory access.
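
Side note for anyone copying these commands: the echo loop falls over on
filenames containing whitespace.  A null-delimited variant does the same job
safely:

$ printf '%s\0' *.JPG | xargs -0 -I{} -P2 convert {} -resize 800 {}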

To get close to _linear_ scaling in this scenario, one would need something like
an 8-core AMD Magny-Cours system with quad memory channels, or whatever the
comparable quad-channel Intel platform is.  One would run with xargs -P2,
allowing each process ~12GB/s of memory bandwidth.  That should yield close to
the ideal 2x speedup (90-100% scaling efficiency).
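
On a multi-socket box you can also help the bandwidth situation by pinning
each convert to its own socket, so each one talks to its local memory
channels.  A rough sketch with taskset (the core numbers and file names are
assumptions; check your actual layout in /proc/cpuinfo first):

taskset -c 0 convert img1.JPG -resize 3072 img1.JPG &
taskset -c 4 convert img2.JPG -resize 3072 img2.JPG &
wait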

> Running more processes than real cores seems fine, did you try it?

Define "fine".  Please post the specs of your SUT, both CPU/mem subsystem and OS
environment details (what hypervisor and guest).  (SUT is IBM speak for System
Under Test).
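
Something like this would cover the CPU/mem side; the hypervisor and guest
details you will have to fill in by hand:

$ uname -a
$ grep 'model name' /proc/cpuinfo | sort -u
$ free -m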

>> Linux is pretty efficient at scheduling multiple processes among cores
>> in multiprocessor and/or multi-core systems and achieving near linear
>> performance scaling.  This is one reason why "fork and forget" is such a
>> popular method used for parallel programming.  All you have to do is
>> fork many children and the kernel takes care of scheduling the processes
>> to run simultaneously.
> 
> Yep. It handles the processes quite nicely.

Are you "new" to the concept of parallel processing and what CPU process
scheduling is?
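
For anyone following the thread who is new to it, fork and forget in the shell
is nothing more than background jobs plus wait.  A minimal sketch, without the
throttling that xargs -P adds:

for k in *.JPG; do
    convert $k -resize 800 $k &    # fork one child per file
done
wait                               # block until every child exits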

-- 
Stan

