Re: need help making shell script use two CPUs/cores
On Tue, 11 Jan 2011 07:13:47 -0600, Stan Hoeppner wrote:
> Camaleón put forth on 1/10/2011 2:11 PM:
>
>> I used a VM to get the closest environment to what you seem to have
>> (a low-resource machine), and the above command (timed) gives:
>
> I'm not sure what you mean by resources in this context. My box has
> plenty of resources for the task we're discussing. Each convert
> process, IIRC, was using 80MB on my system. Only two can run
> simultaneously. So why queue up 4 or more processes? That just eats
> memory uselessly for zero decrease in total run time.
I supposed you wouldn't care much about getting a script to run faster
with all the available cores "occupied" if you had a modern (<4 years
old) CPU and plenty of speedy RAM, because the routine you wanted to
run should not take much time... unless you were going to process
"thousands" of images :-)
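By the way, if you want to double-check that 80 MB per process figure,
something like this should work (assuming the procps "ps", selecting
processes by command name):

$ ps -C convert -o pid,rss,pmem,comm

"rss" is the resident set size in KiB, so values around 80000 there
would match what you saw.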
(...)
> I just made two runs on the same set of photos but downsized them to
> 800x600 to keep the run time down. (I had you upscale them to 3072x2048
> as your CPUs are much newer)
>
> $ time for k in *.JPG; do convert $k -resize 800 $k; done
>
> real 1m16.542s
> user 1m11.872s
> sys 0m4.104s
>
> $ time for k in *.JPG; do echo $k; done | xargs -I{} -P2 convert {}
> -resize 800 {}
>
> real 0m41.188s
> user 1m14.837s
> sys 0m4.812s
>
> 41s vs 77s = a 46% decrease in run time (36 of 77 seconds saved). In
> this case there is
> insufficient memory bandwidth as well. The Intel BX chipset supports a
> single channel of PC100 memory for a raw bandwidth of 800MB/s. Image
> manipulation programs will eat all available memory b/w. On my system,
> running two such processes allows ~400MB/s to each processor socket,
> starving the convert program of memory access.
>
> To get close to _linear_ scaling in this scenario, one would need
> something like an 8 core AMD Magny Cours system with quad memory
> channels, or whatever the Intel platform is with quad channels. One
> would run with xargs -P2, allowing each process ~12GB/s of memory
> bandwidth. This should yield 90-100% scaling efficiency, i.e. close
> to a 50% decrease in run time with two processes.
>
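(Side note: that xargs line will choke on file names containing
spaces. A null-delimited variant should be safer; just a sketch,
assuming GNU find and xargs:

$ find . -maxdepth 1 -name '*.JPG' -print0 | \
    xargs -0 -P2 -I{} convert {} -resize 800 {}

Same -P2 parallelism, but the file names travel as NUL-terminated
strings instead of being split on whitespace.)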
>> Running more processes than real cores seems fine, did you try it?
>
> Define "fine".
Fine = system not hogging all resources.
> Please post the specs of your SUT, both CPU/mem
> subsystem and OS environment details (what hypervisor and guest). (SUT
> is IBM speak for System Under Test).
I didn't know the meaning of that "SUT" term... The test was run on a
laptop (Toshiba Tecra A7) with an Intel Core Duo T2400 (in brief, 2M
cache, 1.83 GHz, 667 MHz FSB, full specs¹) and 4 GiB of RAM (DDR2).
The VM is VirtualBox (4.0) with Windows XP Pro as host and Debian
Squeeze as guest. The VM was set up to use the 2 cores and 1.5 GiB of
system RAM. The disk controller is emulated via ICH6.
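In case it helps, what the guest actually sees can be confirmed from
inside Debian with:

$ grep -c '^processor' /proc/cpuinfo

$ free -m

The first should print "2" and the second a total of roughly 1500 MiB
with the setup above (the kernel reserves a little for itself).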
>>> Linux is pretty efficient at scheduling multiple processes among cores
>>> in multiprocessor and/or multi-core systems and achieving near linear
>>> performance scaling. This is one reason why "fork and forget" is such
>>> a popular method used for parallel programming. All you have to do is
>>> fork many children and the kernel takes care of scheduling the
>>> processes to run simultaneously.
>>
>> Yep. It handles the processes quite nicely.
>
> Are you "new" to the concept of parallel processing and what CPU process
> scheduling is?
No... I guess this is quite similar to what most daemons do when they
run in the background and launch several instances (like "amavisd-new"
does), but I didn't think there was a direct relation between the
number of running daemons/processes and the cores available in the
CPU. I mean, I thought the kernel would automatically handle all the
available resources as best it can, regardless of the number of cores
in use.
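If I got the "fork and forget" idea right, the shell equivalent would
be something like this sketch, which forks one convert per file all at
once and leaves the scheduling to the kernel:

$ for k in *.JPG; do convert "$k" -resize 800 "$k" & done; wait

The kernel does spread those children over the available cores, but
with many images it also means many ~80 MB processes sitting in memory
at once, which is exactly what bounding the job count with "xargs -P"
avoids.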
¹http://ark.intel.com/Product.aspx?id=27235
Greetings,
--
Camaleón