RE: Slow Script

To: <debian-user@lists.debian.org>
Subject: RE: Slow Script
From: "Stackpole, Chris" <CStackpole@barbnet.com>
Date: Tue, 3 Feb 2009 12:27:26 -0600
Message-id: <CAA2386151CF504E958CEFB0801DFC560116DCF2@mx-mailstore.BassCompanies.Com>
In-reply-to: <20090203172437.GC1432@sherohman.org>
References: <03a801c98622$ee8874a0$cb995de0$@es> <20090203172437.GC1432@sherohman.org>

> From: Dave Sherohman [mailto:dave@sherohman.org]
> Sent: Tuesday, February 03, 2009 11:25 AM
> Subject: Re: Slow Script
> 
> On Tue, Feb 03, 2009 at 06:14:48PM +0100, Gorka wrote:
> > Hi! I've got a perl script with this for:
> >
> >   for (my $j=0;$j<=$#fichero1;$j++)
> >   {
> >     if ($fichero1[$j] eq $valor1)
> >     {
> >       $token = 1;
> >     }
> >   }
> >
> > The problem is that fichero1 has 32 million records and, moreover,
> > I've got to repeat this several million times, so this way it would
> > take years to finish.
> > Does anybody know a way to optimize this script? Is there any other
> > Linux programming language I could do this more quickly with?
> > Thank you!
> 
> Although the Perl could definitely be optimized (and you've already
> been shown one way to do so), your core issue is that you're doing
> several million passes over 32 million records.  That's not going to
> be fast in any language.  (Even if you can check a million records per
> second, that's 32 seconds per pass, or about 9 hours for 1,000 passes,
> or just over a year for a million passes.)
[snip]

I was just thinking that as well. Does the OP have multiple boxes he can
run this on? This could easily be broken down into a parallel process,
with the work assigned either manually or programmatically. Splitting up
a parallel task is pretty easy; there is even a shell script hosted on
Google Code, PPSS, for easy parallel processing [1].
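
To make the splitting idea concrete, here is a rough sketch in Python
rather than Perl. It is only an illustration under assumptions: the
function names (split_evenly, count_matches, found_in_parallel) are made
up, the records are assumed to fit in memory, and the "fork" start method
assumes a Linux box. Each worker runs the same equality test as the
original loop, just on its own slice of the records.

```python
from multiprocessing import get_context


def split_evenly(items, parts):
    """Split items into `parts` contiguous slices whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    slices, start = [], 0
    for i in range(parts):
        # The first `extra` slices each take one leftover item.
        end = start + base + (1 if i < extra else 0)
        slices.append(items[start:end])
        start = end
    return slices


def count_matches(job):
    """Worker: scan one slice of records with the same test as the original loop."""
    chunk, wanted = job
    return sum(1 for record in chunk if record == wanted)


def found_in_parallel(records, wanted, workers=4):
    """Return True if `wanted` occurs in `records`, scanning the slices in parallel."""
    jobs = [(chunk, wanted) for chunk in split_evenly(records, workers)]
    # "fork" is an assumption that this runs on Linux (e.g. a Debian box).
    with get_context("fork").Pool(workers) as pool:
        return any(pool.map(count_matches, jobs))
```

Spreading the slices over several boxes instead of several processes
follows the same shape: split_evenly decides who gets what, and each box
runs count_matches on its share. Note that this only divides the total
time by the number of workers; it does not reduce the amount of work.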

Of course, there are a fair number of ifs in this (if there are
resources, if the data can be split or shared easily, etc.).

If not, Dave's suggestion of a database is a good one too.
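
Since the database part of Dave's mail is snipped above, the following is
only a guess at its general shape, not his actual proposal: load the
records once into an indexed table, so that each of the several million
lookups becomes an index probe instead of a pass over 32 million records.
SQLite through Python's sqlite3 module, the table name "records", the
column name "valor", and both helper names are assumptions made for the
sake of the example.

```python
import sqlite3


def build_index(records, path=":memory:"):
    """Load the records once into a table whose primary key serves as the index."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS records (valor TEXT PRIMARY KEY)")
    # INSERT OR IGNORE skips duplicate records instead of failing on them.
    conn.executemany(
        "INSERT OR IGNORE INTO records (valor) VALUES (?)",
        ((r,) for r in records),
    )
    conn.commit()
    return conn


def is_present(conn, valor1):
    """One indexed probe replaces a full pass over the records."""
    row = conn.execute(
        "SELECT 1 FROM records WHERE valor = ? LIMIT 1", (valor1,)
    ).fetchone()
    return row is not None
```

With 32 million records you would probably pass a file path rather than
":memory:", build the table once, and then call is_present for each value
you need to check.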

~Stack~

[1] http://code.google.com/p/ppss/
Note: you will probably need to do a fair bit of tweaking for this, but
the ideas are what will be most useful to you anyway.

References:
- Slow Script
  - From: "Gorka" <gorkalinux@yahoo.es>
- Re: Slow Script
  - From: Dave Sherohman <dave@sherohman.org>
