RE: Slow Script
> From: Dave Sherohman [mailto:dave@sherohman.org]
> Sent: Tuesday, February 03, 2009 11:25 AM
> Subject: Re: Slow Script
>
> On Tue, Feb 03, 2009 at 06:14:48PM +0100, Gorka wrote:
> > Hi! I've got a Perl script with this for loop:
> >
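> > for (my $j = 0; $j <= $#fichero1; $j++)
> > {
> >     if ($fichero1[$j] eq $valor1)   # $fichero1[$j], not the slice @fichero1[$j]
> >     {
> >         $token = 1;
> >         last;    # no need to keep scanning after a match
> >     }
> > }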
> >
> > The problem is that fichero1 has 32 million records and, moreover, I've
> > got to repeat this several million times, so this way it would take
> > years to finish.
> > Does anybody know a way to optimize this script? Is there any other
> > programming language on Linux I could do this more quickly with?
> > Thank you!
>
> Although the Perl could definitely be optimized (and you've already been
> shown one way to do so), your core issue is that you're doing several
> million passes over 32 million records. That's not going to be fast in
> any language. (Even if you can check a million records per second,
> that's 32 seconds per pass, or about 9 hours for 1,000 passes, or just
> over a year for a million passes.)
[snip]
I was just thinking that as well. Does the OP have multiple boxes he can
run this on? This could easily be broken down into a parallel process,
with the work assigned either manually or programmatically. Splitting up
the parallel task is pretty easy; there is even a shell script hosted on
Google Code for easy parallel processing [1].
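On a single multi-core box, the same split can also be tried from Perl itself with fork(). This is a minimal sketch, assuming each value being looked up can be checked independently: the hypothetical split_work helper deals the queries round-robin into N chunks, and each child process runs the original linear scan (the hypothetical found helper, stopping at the first match) over only its own chunk. The sample data and the worker count of 2 are placeholders; across several machines you would hand out the chunks with something like the script in [1] instead.

```perl
use strict;
use warnings;

# Deal the work items round-robin into $n chunks, one per worker.
sub split_work {
    my ($n, @items) = @_;
    my @chunks;
    push @chunks, [] for 1 .. $n;
    for my $i (0 .. $#items) {
        push @{ $chunks[$i % $n] }, $items[$i];
    }
    return @chunks;
}

# The original linear scan, returning as soon as a match is found.
sub found {
    my ($records, $value) = @_;
    for my $record (@$records) {
        return 1 if $record eq $value;
    }
    return 0;
}

# Hypothetical sample data standing in for @fichero1 and the $valor1 values.
my @fichero1 = qw(alpha beta gamma delta);
my @queries  = qw(beta omega delta epsilon);

my @pids;
for my $chunk (split_work(2, @queries)) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: check only this chunk, then exit.
        for my $valor1 (@$chunk) {
            print "$valor1: ", found(\@fichero1, $valor1), "\n";
        }
        exit 0;
    }
    push @pids, $pid;    # Parent: remember the child and keep forking.
}

# Parent: wait for every worker to finish.
waitpid($_, 0) for @pids;
```

Output lines from different children can arrive in any order, so a real run would normally write each worker's results to its own file.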
Of course, there are quite a few ifs in this (if there are resources, if
the data can be split/shared easily, etc.).
If not, Dave's suggestion of using a database is a good one too.
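Dave also points out that the Perl itself can be optimized. The thread doesn't say which optimization was already shown, but a common one is to drop the repeated linear scan in favour of a hash: build it once from @fichero1, and every later check becomes a single lookup instead of a 32-million-element pass. A minimal sketch, where build_index, contains and the sample data are hypothetical names for illustration. One caveat: a Perl hash with 32 million keys can use a lot of memory (plausibly gigabytes), which is exactly when a database starts to look attractive.

```perl
use strict;
use warnings;

# Build the lookup table once: a single pass over the records.
# The values stay undef; only the keys matter for membership tests.
sub build_index {
    my ($records) = @_;    # array reference, so the big list isn't copied
    my %index;
    @index{@$records} = ();
    return \%index;
}

# Each check is now one hash lookup instead of a full scan.
sub contains {
    my ($index, $value) = @_;
    return exists $index->{$value} ? 1 : 0;
}

# Hypothetical sample data standing in for @fichero1 and the $valor1 values.
my @fichero1 = qw(alpha beta gamma delta);
my $index    = build_index(\@fichero1);

for my $valor1 (qw(beta omega delta)) {
    my $token = contains($index, $valor1);
    print "$valor1: $token\n";
}
```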
~Stack~
[1] http://code.google.com/p/ppss/
Note: you will probably need to do a fair bit of tweaking for this, but
the ideas are what will be most useful to you anyway.