
Re: need help making shell script use two CPUs/cores



Camaleón put forth on 1/11/2011 9:38 AM:

> I supposed you wouldn't care much about getting a script to run faster 
> with all the available cores "occupied" if you had a modern (<4 years) 
> CPU and plenty of speedy RAM, because the routine you wanted to run 
> should not take much time... unless you were going to process 
> "thousands" of images :-)

That's a bit ironic.  You're suggesting the solution is to upgrade to a new
system with a faster processor and memory.  However, all the newer processors
have 2, 4, 6, 8, or 12 cores.  So upgrading simply for single-process
throughput would waste all the other cores, which is the exact situation I
found myself in.

The ironic part is that parallelizing the script to maximize performance on my
system will also do the same for the newer chips, but to an even greater degree
on those with 4, 6, 8, or 12 cores.  Because convert doesn't eat 100% of a
core's time during its run, and because of the idle time between one process
finishing and xargs starting another, one could probably run 16-18 parallel
convert processes on a 12-core Magny Cours with this script before run times
stop decreasing.

The script works.  It cut my run time by over 50%.  I'm happy.  As I said, this
system's processing power is complete overkill 99% of the time.  It has worked
beautifully with pretty much everything I've thrown at it for 8 years now.  If
I _really_ wanted to maximize the speed of this photo resizing task I'd install
Win32 ImageMagick on my 2 GHz Athlon XP workstation (dual-channel memory,
nForce2 mobo), convert the photos on the workstation, and copy them to the
server.

However, the absolute maximum performance of this task was not, and is not, my
goal.  My goal was to make use of the second CPU, which was sitting idle in the
server, to speed up task completion.  That goal was accomplished. :)

>>> Running more processes than real cores seems fine, did you try it?
>>
>> Define "fine".  
> 
> Fine = system not hogging all resources.

I ran 4 processes (on this 2-core machine) and the run time was a few seconds
faster than with 2 processes, 3 seconds IIRC.  Running 8 processes pushed the
system into swap and run time increased dramatically.  Given that 4 processes
were only a few seconds faster than two, yet consumed twice as much memory, the
best overall number of processes to run on this system is two.
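
If anyone wants to repeat that comparison, here's a minimal sketch of how I'd
time different process counts.  The file glob, resize geometry, and output
directory are assumptions for illustration, not the exact command from my
script:

  #!/bin/bash
  # Time the same batch resize with 1, 2, 4, and 8 parallel convert
  # processes.  Adjust the glob and convert options to match your own job.
  # Caveat: later iterations benefit from the page cache, so repeat each
  # count for fair numbers.
  mkdir -p resized
  for procs in 1 2 4 8; do
      echo "=== $procs parallel convert processes ==="
      time find . -maxdepth 1 -name '*.jpg' -print0 |
          xargs -0 -P "$procs" -I {} convert {} -resize 50% resized/{}
  done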

> I didn't know the meaning of that "SUT" term... 

I like using it.  It's good shorthand for "system under test".  I wish more
people used it, or were familiar with it, so I wouldn't have to define it every
time I use it. :)

> The test was run on a 
> laptop (Toshiba Tecra A7) with an Intel Core Duo T2400 (in brief, 2M 
> Cache, 1.83 GHz, 667 MHz FSB, full specs¹) and 4 GiB of RAM (DDR2)

> The VM is VirtualBox (4.0) with Windows XP Pro as host and Debian 
> Squeeze as guest.  The VM was set up to use the 2 cores and 1.5 GiB of 
> system RAM.  The disk controller is emulated via ICH6.

I wonder how much faster convert would run on bare metal on that laptop.

>> Are you "new" to the concept of parallel processing and what CPU process
>> scheduling is?
> 
> No... I guess this is quite similar to what most daemons do when running 
> in the background and launching several instances (like "amavisd-new" 
> does), but I didn't think there was a direct relation between the number 
> of running daemons/processes and the cores available in the CPU.  I 
> mean, I thought the kernel would automatically handle all the available 
> resources the best it can, regardless of the number of cores in use.

This is correct.  But the kernel can't take a single process and make it run
across all cores, maximizing performance.  For that, the process must be
written to create threads, forks, or children.  The kernel will then run each
of these on a different processor core.  This is why ImageMagick convert needs
to be parallelized when batching many photos.  If you don't parallelize it, the
kernel can't schedule it across all cores.  The docs say it will use threads,
but only with "large" files.  Apparently 8.2 megapixel JPGs aren't "large", as
the threading has never kicked in for me.  By using xargs for parallelization,
we create x concurrent processes, and the kernel then schedules each one on a
different CPU core.
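
To make that concrete, something along these lines is the general shape of the
xargs approach (the find pattern, resize geometry, and output directory here
are illustrative assumptions, not my exact script):

  # Hand each JPG to its own convert process, two at a time, so the
  # kernel can schedule one on each CPU.  Raise -P on machines with
  # more cores.
  mkdir -p small
  find . -maxdepth 1 -name '*.jpg' -print0 |
      xargs -0 -P 2 -I {} convert {} -resize 1024x768 small/{}

xargs -P (--max-procs) keeps two convert processes alive at once, starting a
new one as soon as one finishes, and the kernel spreads them across the
available cores.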

-- 
Stan

