Re: Slow Script
On Wed, Feb 04, 2009 at 06:17:43AM EST, Dave Sherohman wrote:
> On Tue, Feb 03, 2009 at 09:02:52PM -0500, Chris Jones wrote:
> > More seriously, when you are dealing with 32 million records, one major
> > avenue for optimization is to keep disk access to a minimum. Disk access
> > IIRC is measured in milliseconds, RAM access in nanoseconds and above..
> > Do the math..
> Given that the posted loop is operating entirely on Perl in-memory
> arrays, the OP is unlikely to be deliberately accessing the disk
> during this process.
> If it's a tied array, then it could have some magical disk
> interaction behind it, but the OP doesn't appear to have reached a state
> of Perl Enlightenment which would allow him to create or optimize magic
> that deep. The other possibility for disk access would be if the
> dataset is larger than available RAM..
Ay, there's the rub.
> ..and it's getting paged in and out from disk, which is just bad news
> for performance no matter how you slice it.
The worst possible scenario (as far as I understand it :-) because
you'll still have the I/O on the file _plus_ the I/O on your swap
partition/datasets _plus_ high system CPU usage due to the paging.
And the worst thing about this is that it is unpredictable.. there will
be times when it won't happen because enough memory is available.. and
other times when you get paged at four in the morning.
> Aside from those two cases, it looks very unlikely that I/O would be
> the bottleneck here.
Trust me. Whatever the machine or OS, when you're dealing with such
volumes, I/O always ends up being part of the equation.
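
To put rough numbers on the "do the math" remark, here is a
back-of-the-envelope sketch in Python. The latency figures are my own
assumed orders of magnitude (about 10 ms per random disk access, about
100 ns per RAM access), not measurements from the OP's machine, and it
assumes the worst case of one random access per record. Sequential
reads would be far kinder to the disk.

```python
# Back-of-the-envelope estimate. The latency figures below are assumed
# orders of magnitude, not measurements from the OP's machine.
RECORDS = 32_000_000       # record count mentioned in the thread
DISK_ACCESS_S = 10e-3      # assumed: ~10 ms per random disk access
RAM_ACCESS_S = 100e-9      # assumed: ~100 ns per RAM access


def total_seconds(records: int, per_access_s: float) -> float:
    """Time to touch every record once at the given per-access latency."""
    return records * per_access_s


disk_s = total_seconds(RECORDS, DISK_ACCESS_S)
ram_s = total_seconds(RECORDS, RAM_ACCESS_S)

print(f"disk: {disk_s / 3600:.1f} hours")
print(f"RAM:  {ram_s:.1f} seconds")
print(f"ratio: {disk_s / ram_s:,.0f}x")
```

Under those assumptions that is roughly 89 hours against about 3
seconds, which is why even a small fraction of accesses falling through
to disk or swap can dominate the run time.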
I don't know Perl. Thanks for the "tied array" hint.