Re: Database performance
On Mon, May 17, 2004 at 12:36:24AM -0700, Karsten M. Self wrote:
> This is really old, but it's straight up my alley, so...
> on Sat, Apr 03, 2004 at 07:08:39PM -0600, Christopher L. Everett (firstname.lastname@example.org) wrote:
> > I do a lot of database work. Sometimes I must do massive batch jobs on
> In general:
> - Load a small search/criteria set into memory, and use it to
> sequentially scan a larger dataset.
> - Lose any data you don't need early on.
> - When querying remote data sources, if possible, *run the query*
> remotely, and just return the result set. This was the trick with
> my 20 hours -> 5 minutes process. I defined a view on the remote
> database, populated a small (~20k rows) table on the database
> server, and queried the view for my result set (returning ~20k
> records). Querying against a 40m row table, indexed.
> - Avoid disk processing by streaming / piping data between processes.
> - Use hashes rather than sorts or b-trees (or get your tools to use
> them for you).
> - Think about what you're doing.
> - Do as little as possible. That's been my gag answer to "what do you
> do", but from an optimization standpoint, it's the goal.
> It's both science and art. Treat it that way.
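
For anyone who wants the first few points in concrete form, here is a
minimal sketch of how I read them, not anything Karsten posted. It loads
a small criteria set into a hash set, streams the larger dataset through
once, and drops unneeded columns as early as possible. The column names
(id, name, amount), the sample data, and the function names are all
made up for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, Set, TextIO, Tuple


def load_wanted_ids(lines: Iterable[str]) -> Set[str]:
    """Load the small criteria set into an in-memory hash set."""
    return {line.strip() for line in lines if line.strip()}


def matching_rows(big: TextIO, wanted: Set[str]) -> Iterator[Tuple[str, str]]:
    """Scan the large dataset once, in order, keeping only what is needed."""
    for row in csv.DictReader(big):
        # Hash lookup per row: no sort and no index on the big side.
        if row["id"] in wanted:
            # Drop the unused "name" column before passing the row on.
            yield row["id"], row["amount"]


if __name__ == "__main__":
    # Hypothetical data: a tiny criteria set and a stand-in "large" file.
    wanted = load_wanted_ids(["2", "4"])
    big = io.StringIO("id,name,amount\n1,a,10\n2,b,20\n3,c,30\n4,d,40\n")
    for record in matching_rows(big, wanted):
        print(record)
```

Because matching_rows is a generator, nothing from the large side is
held in memory beyond the current row, so the same function could read
from sys.stdin in a pipeline rather than from a file on disk.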
Seriously Karsten, have you ever considered writing a book of monographs
and epigrams? You could title it _Karsten Recommends: Because Karsten
Knows Better than You_. That way, instead of compulsively saving even the
e-mails that have nothing to do with my current situation, I could simply
have them all in a lovely bound volume.
Suddenly I feel the need to do heavy database work, just so I can put all
my new knowledge to use.