Re: need help making shell script use two CPUs/cores
On Tue, 11 Jan 2011 07:13:47 -0600, Stan Hoeppner wrote:
> Camaleón put forth on 1/10/2011 2:11 PM:
> 
>> I used a VM to get an environment as close as possible to the one you
>> seem to have (a low-resource machine), and the above command (timed) gives:
> 
> I'm not sure what you mean by resources in this context.  My box has
> plenty of resources for the task we're discussing.  Each convert
> process, IIRC, was using 80MB on my system.  Only two can run
> simultaneously.  So why queue up 4 or more processes?  That just eats
> memory uselessly for zero decrease in total run time.
I assumed you wouldn't care much about getting a script to run faster with 
all the available cores "occupied" if you had a modern (less than 4 years 
old) CPU and plenty of fast RAM, because the routine you wanted to run 
should not take much time... unless you were going to process "thousands" 
of images :-)
(...)
> I just made two runs on the same set of photos but downsized them to
> 800x600 to keep the run time down.  (I had you upscale them to 3072x2048
> as your CPUs are much newer)
> 
> $ time for k in *.JPG; do convert "$k" -resize 800 "$k"; done
> 
> real    1m16.542s
> user    1m11.872s
> sys     0m4.104s
> 
> $ time for k in *.JPG; do echo "$k"; done | xargs -I{} -P2 convert {} -resize 800 {}
> 
> real    0m41.188s
> user    1m14.837s
> sys     0m4.812s
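A minimal sketch of the same `xargs -P` idea, hardened against file names containing spaces by passing NUL-terminated names (`printf '%s\0'` plus `xargs -0`). The `par_each` helper name is hypothetical, and `-P` is a GNU/BSD xargs extension rather than POSIX:

```shell
#!/bin/bash
# par_each JOBS CMD [ARGS...]  (hypothetical helper)
# Reads NUL-terminated items on stdin and runs "CMD ARGS..." once per item,
# with every literal {} in the arguments replaced by that item.
# At most JOBS commands run at the same time (xargs -P, GNU/BSD extension).
par_each() {
    local njobs=$1
    shift
    xargs -0 -P "$njobs" -I{} "$@"
}

# Usage, mirroring the loop above with two parallel convert processes:
#   printf '%s\0' *.JPG | par_each 2 convert {} -resize 800 {}
```

Capping the job count at the number of cores (2 here) keeps only as many 80MB convert processes in memory as can actually run at once.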
> 
> 41s vs 77s = 46% decrease in run time (a speedup of about 1.86x).  In
> this case there is insufficient memory bandwidth as well.  The Intel BX
> chipset supports a single channel of PC100 memory for a raw bandwidth of
> 800MB/s.  Image manipulation programs will eat all available memory
> bandwidth.  On my system, running two such processes allows ~400MB/s to
> each processor socket, starving the convert program of memory access.
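The speedup and bandwidth figures can be checked directly from the `time` output above. A worked sketch follows; the CPU-utilization ratio, (user+sys)/real, is an added illustration and not a figure from the original post:

```latex
\begin{aligned}
S &= \frac{T_{\text{serial}}}{T_{\text{parallel}}}
   = \frac{76.542\,\text{s}}{41.188\,\text{s}} \approx 1.86,
   \qquad 1 - \frac{41.188}{76.542} \approx 0.46 \\
U_{\text{parallel}} &= \frac{t_{\text{user}} + t_{\text{sys}}}{t_{\text{real}}}
   = \frac{74.837 + 4.812}{41.188} \approx 1.93
   \qquad \left(U_{\text{serial}} = \tfrac{71.872 + 4.104}{76.542} \approx 0.99\right) \\
B_{\text{PC100}} &= 100\,\text{MHz} \times 8\,\text{B} = 800\,\text{MB/s},
   \qquad \tfrac{800}{2} = 400\,\text{MB/s per process}
\end{aligned}
```

Both CPUs are almost fully busy during the -P2 run, yet the speedup stays below 2x, and user time grows from about 71.9s to 74.8s. That is consistent with the memory-bandwidth contention described above, though the numbers alone do not prove it.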
> 
> To get close to _linear_ scaling in this scenario, one would need
> something like an 8 core AMD Magny-Cours system with quad memory
> channels, or whatever the Intel platform is with quad channels.  One
> would run with xargs -P2, allowing each process ~12GB/s of memory
> bandwidth.  This should yield a 90-100% speedup (1.9x to 2x), i.e.
> nearly halving the run time.
> 
>> Running more processes than real cores seems fine; did you try it?
> 
> Define "fine".  
Fine = the system not having all of its resources hogged.
> Please post the specs of your SUT, both CPU/mem
> subsystem and OS environment details (what hypervisor and guest).  (SUT
> is IBM speak for System Under Test).
I didn't know the meaning of that "SUT" term... The test was run on a 
laptop (Toshiba Tecra A7) with an Intel Core Duo T2400 (in brief: 2 MB 
cache, 1.83 GHz, 667 MHz FSB; full specs¹) and 4 GiB of DDR2 RAM.
The VM is VirtualBox (4.0) with Windows XP Pro as host and Debian Squeeze 
as guest. The VM was set up to use the 2 cores and 1.5 GiB of system RAM. 
The disk controller is emulated via ICH6.
>>> Linux is pretty efficient at scheduling multiple processes among cores
>>> in multiprocessor and/or multi-core systems and achieving near linear
>>> performance scaling.  This is one reason why "fork and forget" is such
>>> a popular method used for parallel programming.  All you have to do is
>>> fork many children and the kernel takes care of scheduling the
>>> processes to run simultaneously.
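A minimal bash sketch of the "fork and forget" pattern described above: each job is started in the background with `&`, and `wait` blocks until the children have exited. The `fork_and_wait` and `my_resize` names are hypothetical. Unlike `xargs -P2`, this starts one child per file all at once, which is the memory cost objected to earlier in the thread when there are more jobs than cores.

```shell
#!/bin/bash
# fork_and_wait CMD FILE...  (hypothetical helper)
# Forks one background child per FILE running "CMD FILE", then waits.
# The kernel scheduler decides which core each child runs on.
fork_and_wait() {
    local cmd=$1
    shift
    local f
    for f in "$@"; do
        "$cmd" "$f" &   # fork: the loop continues without waiting
    done
    wait                # block until all background children have exited
}

# Usage (hypothetical wrapper; every file gets its own process at once):
#   my_resize() { convert "$1" -resize 800 "$1"; }
#   fork_and_wait my_resize *.JPG
```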
>> 
>> Yep. It handles the processes quite nicely.
> 
> Are you "new" to the concept of parallel processing and what CPU process
> scheduling is?
No... I guess this is quite similar to what most daemons do when they run 
in the background and launch several instances (as "amavisd-new" does), 
but I didn't think there was a direct relation between the number of 
running daemons/processes and the number of cores available in the CPU. 
I mean, I thought the kernel would automatically handle all the available 
resources as best it can, regardless of the number of cores in use.
¹http://ark.intel.com/Product.aspx?id=27235
Greetings,
-- 
Camaleón