
RE: Slow Script



> From: Dave Sherohman [mailto:dave@sherohman.org]
> Sent: Tuesday, February 03, 2009 11:25 AM
> Subject: Re: Slow Script
> 
> On Tue, Feb 03, 2009 at 06:14:48PM +0100, Gorka wrote:
> > Hi! I've got a Perl script with this for loop:
> >
> >   for (my $j=0;$j<=$#fichero1;$j++)
> >   {
> >     if ($fichero1[$j] eq $valor1)   # one element is $x[$j], not @x[$j]
> >     {
> >       $token = 1;
> >       last;   # stop scanning once a match is found
> >     }
> >   }
> >
> > The problem is that fichero1 has 32 million records, and moreover I've
> > got to repeat this several million times, so this way it would take
> > years to finish.
> > Does anybody know a way to optimize this script? Is there any other
> > Linux programming language I could do this more quickly with?
> > Thank you!
> 
> Although the Perl could definitely be optimized (and you've already been
> shown one way to do so), your core issue is that you're doing several
> million passes over 32 million records.  That's not going to be fast in
> any language.  (Even if you can check a million records per second,
> that's 32 seconds per pass, or about 9 hours for 1,000 passes, or just
> over a year for a million passes.)
[snip]

I was just thinking that as well. Does the OP have multiple boxes he can
run this on? This could easily be broken down into a parallel process,
with the work assigned either manually or programmatically. Splitting up
the parallel task is pretty easy; Google Code even hosts a shell script
for easy parallel processing, PPSS [1].
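To make the splitting idea concrete, here is a minimal single-machine Perl sketch using fork and pipes. It assumes the job is "count how many of a list of values occur in @fichero1"; the sub name parallel_count, the worker count, and the sample data are all hypothetical. Each child still runs the original linear scan, so this only divides wall-clock time across cores and does not reduce total work. Across several boxes, the same split would be done by hand or with a tool like the one in [1].

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: count how many entries of @$valores occur in
# @$fichero1, dealing the values round-robin over $workers child processes.
sub parallel_count {
    my ($fichero1, $valores, $workers) = @_;

    # Split the values into $workers chunks.
    my @chunks;
    push @chunks, [] for 1 .. $workers;
    for my $i (0 .. $#$valores) {
        push @{ $chunks[ $i % $workers ] }, $valores->[$i];
    }

    my (@pids, @readers);
    for my $chunk (@chunks) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {                     # child: same scan as the original loop
            close $reader;
            my $found = 0;
            for my $valor (@$chunk) {
                for my $record (@$fichero1) {
                    if ($record eq $valor) { $found++; last; }
                }
            }
            print {$writer} "$found\n";      # report the count to the parent
            close $writer;
            POSIX::_exit(0);                 # skip END blocks / parent's buffered output
        }

        close $writer;                       # parent keeps only the read end
        push @pids,    $pid;
        push @readers, $reader;
    }

    # Sum each child's count, then reap the children.
    my $total = 0;
    for my $reader (@readers) {
        my $line = <$reader>;
        close $reader;
        die "worker returned no result" unless defined $line;
        chomp $line;
        $total += $line;
    }
    waitpid($_, 0) for @pids;
    return $total;
}

my @fichero1 = qw(apple banana cherry);
my @valores  = qw(banana kiwi cherry apple);
my $count    = parallel_count(\@fichero1, \@valores, 2);
print "$count\n";    # prints 3
```

Round-robin dealing keeps the chunks within one item of each other in size, which matters when every item costs the same full scan.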

Of course, there are a fair number of ifs in this: if there are
resources, if the data can be split or shared easily, etc.

If not, Dave's suggestion of using a database is a good one too.
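The database suggestion comes down to building an index once and querying it many times, instead of rescanning every record per query. A minimal in-memory Perl version of that idea is sketched below; build_index and contains are hypothetical names, and it assumes the records fit in RAM (a Perl hash with 32 million keys may need several gigabytes, which is where a real database earns its keep).

```perl
use strict;
use warnings;

# One pass over the records builds the lookup table.
sub build_index {
    my ($records) = @_;
    my %index;
    $index{$_} = 1 for @$records;
    return \%index;
}

# Each membership test is then a single hash lookup, not a full pass.
sub contains {
    my ($index, $valor) = @_;
    return exists $index->{$valor} ? 1 : 0;
}

my @fichero1 = qw(apple banana cherry);
my $index    = build_index(\@fichero1);
for my $valor (qw(banana kiwi)) {
    my $token = contains($index, $valor);
    print "$valor: $token\n";    # banana: 1, then kiwi: 0
}
```

With the index built, several million lookups cost roughly several million hash probes rather than several million passes over 32 million records.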


~Stack~

[1] http://code.google.com/p/ppss/
Note: you will probably need to do a fair bit of tweaking for this, but
the ideas are what will be most useful to you anyway.

