Re: need help making shell script use two CPUs/cores
Karl Vogel put forth on 1/9/2011 6:04 PM:
>>> On Sun, 09 Jan 2011 10:05:43 -0600, 
>>> Stan Hoeppner <stan@hardwarefreak.com> said:
> 
> S> #! /bin/sh
> S> for k in $(ls *.JPG); do convert $k -resize 1024 $k; done
> 
>    Someone was ragging on you to let the shell do the file expansion.  I
>    like your way better because most scripting shells aren't smart enough
>    to realize that when there aren't any .JPG files, I don't want the
>    script to echo '*.JPG' as if that's actually useful.
This doesn't matter to me as I only use this script on a single temp directory
after I dump the camera files into it.  The camera, a Fujifilm FinePix A820
8.3MP, saves its files in all upper case.
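
For anyone who does need the empty-directory case handled, a minimal sketch of
letting the shell expand the glob while skipping the unmatched literal pattern
might look like this.  The function name list_jpgs is made up for illustration
and is not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: print each .JPG in the given directory, one per line.
# Prints nothing when the directory holds no .JPG files.
list_jpgs() {
    for f in "$1"/*.JPG; do
        # With no matches, sh leaves the pattern as a literal string,
        # so skip anything that is not an existing file.
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
    done
}
```

Calling list_jpgs on an empty directory produces no output instead of the
literal '*.JPG' string.
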
> S> I use the above script to batch re-size digital camera photos after I
> S> dump them to my web server.  It takes a very long time with lots of new
> S> photos as the server is fairly old, even though it is a 2-way SMP,
> S> because the script only runs one convert process at a time serially,
> S> only taking advantage of one CPU.
> 
>    First things first: are you absolutely certain that running two parallel
>    jobs will exercise both CPUs?  I've seen SMP systems that don't exactly
>    live up to truth-in-advertising.  If you stuff two "convert" jobs in the
>    background and then run "top" (or the moral equivalent) do you SEE both
>    CPUs being worked?
See my response to Bob.  And see Bob's response to you. :)  The issue you
describe was resolved with a few patches many years ago, and only reared its
ugly head on processors with SMT (HT) enabled.  The kernel scheduler work lagged
behind the hardware releases of IBM's SMT and Intel's HT. The chips were on the
market a while before regular distro release cycles caught up.  So early
adopters of SMT chips saw the problem you describe.  As Bob noted, in most
situations, simply turning SMT off fixed the problem instantly.  For those who
don't know the acronyms, SMT stands for "Simultaneous Multithreading", which is
the textbook term for this technology.  Intel gave their SMT implementation a
catchy marketing name, Hyper-Threading, as they seem to do with every product, sadly.
>    Second: do you have "taskset" installed?  If the work isn't being
>    divided up the way you like, you can bind a process to a desired core:
>    http://planet.admon.org/how-to-bind-a-certain-process-to-specified-core/
CPU affinity (see also cpusets and cpumemsets), which is the kernel feature that
taskset manipulates, is overkill for managing process scheduling on a 2-way box
and wouldn't yield much, if any, benefit.  If I were to attempt using it with my
piddly workloads, I'd likely be far less efficient at manually scheduling tasks
than the kernel.  In fact, I can guarantee it. :)
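
Leaving placement to the kernel could look roughly like the sketch below: start
two convert jobs at a time and let the scheduler decide which CPU each lands on.
This is one possible approach, not necessarily the one settled on in this
thread.  It assumes a find and xargs that support -maxdepth, -print0, -0 and -P
(GNU and BSD both do, but these are not POSIX), and the names resize_all and
RESIZE_CMD are invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: resize every .JPG in a directory in place,
# running at most two jobs at once with no CPU pinning.
# RESIZE_CMD is an assumption of this sketch; set it to "echo" for a dry run.
resize_all() {
    dir=$1
    # NUL-separated names survive spaces; -I{} runs one command per file,
    # and -P 2 keeps two of those commands running concurrently.
    find "$dir" -maxdepth 1 -name '*.JPG' -print0 |
        xargs -0 -P 2 -I{} ${RESIZE_CMD:-convert} {} -resize 1024 {}
}
```

Running it with RESIZE_CMD=echo first prints the commands that would be run,
which is a cheap way to check the file list before overwriting any photos.
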
>    And last: if you're not using something like LVM, can you do anything to
>    make sure you're not hitting the same disk?  If all your new photos are
>    on the same drive, any CPU savings you get from parallel processing will
>    probably be erased by disk contention.  Better yet, do you have enough
>    memory to do the processing on a RAM-backed filesystem?
Apparently you've never used ImageMagick's convert utility, or any other image
manipulation tools, or not on an older ~550MHz machine with tiny L2 cache (by
today's standards).  Image manipulation programs are almost always CPU bound and
rarely, if ever, IO bound.  I'd say "never" but I'm sure there is a rare corner
case out there somewhere.
It's odd, isn't it, that I have pretty intimate knowledge of the things above,
yet am handicapped WRT shell scripting?  Nobody knows everything, and I'm sure
glad lists such as debian-user exist to fill in the knowledge gaps.  :)
-- 
Stan