
Re: Database performance



On Mon, May 17, 2004 at 12:36:24AM -0700, Karsten M. Self wrote:
> This is really old, but it's straight up my alley, so...
> 
> on Sat, Apr 03, 2004 at 07:08:39PM -0600, Christopher L. Everett (ceverett@ceverett.com) wrote:
> > I do a lot of database work.  Sometimes I must do massive batch jobs on 

> In general:
> 
>   - Load a small search/criteria set into memory, and use it to
>     sequentially scan a larger dataset.
> 
>   - Lose any data you don't need early on.
> 
>   - When querying remote data sources, if possible, *run the query*
>     remotely, and just return the result set.  This was the trick with
>     my 20 hours -> 5 minutes process.  I defined a view on the remote
>     database, populated a small (~20k rows) table on the database
>     server, and queried the view for my result set (returning ~20k
>     records).  Querying against a 40m row table, indexed.
> 
>   - Avoid disk processing by streaming / piping data between processes.
> 
>   - Use hashes rather than sorts or b-trees (or get your tools to use
>     them for you).
> 
>   - Think about what you're doing.
> 
>   - Do as little as possible.  That's been my gag answer to "what do you
>     do", but from an optimization standpoint, it's the goal.
> 
> It's both science and art.  Treat it that way.
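
A minimal sketch of the first, second, and fifth points above: a small
criteria set held in an in-memory hash, a larger dataset scanned once as a
stream, and unneeded columns dropped early. This is not Karsten's actual
code. The function names, the column names (id, amount, notes), and the
sample rows are all hypothetical, chosen only for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, Set


def load_criteria(lines: Iterable[str]) -> Set[str]:
    """Read the small criteria set into an in-memory hash set."""
    return {line.strip() for line in lines if line.strip()}


def filter_stream(rows: Iterable[dict], wanted: Set[str], key: str) -> Iterator[dict]:
    """Scan the large dataset once, in order, yielding only matching rows."""
    for row in rows:
        if row[key] in wanted:  # hash lookup, no sort or tree walk
            # Drop columns we do not need downstream ("lose data early").
            yield {key: row[key], "amount": row["amount"]}


# Hypothetical stand-ins: in practice the criteria would come from a small
# file or table, and the big dataset from a pipe or a large file.
criteria = load_criteria(["1001\n", "1003\n"])
big = io.StringIO("id,amount,notes\n1001,5,a\n1002,7,b\n1003,9,c\n")
matches = list(filter_stream(csv.DictReader(big), criteria, "id"))
```

Because filter_stream is a generator, the large input is never held in
memory as a whole; only the small criteria set is.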

Seriously Karsten, have you ever considered writing a book of monographs 
and epigrams? You could title it _Karsten Recommends: Because Karsten 
Knows Better than You_. That way, instead of compulsively saving even the 
e-mails that have nothing to do with my current situation, I could simply 
have them all in a lovely bound volume. 

Suddenly I feel the need to do heavy database work, just so I can put all 
my new knowledge to use. 

Cheers, 
Jason Whittle


