Re: Large table search question

From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Large table search question
Date: 2004-06-01 16:22:14
Message-ID: 874qpv8czt.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Richard Huxton <dev(at)archonet(dot)com> writes:

> If you execute a lot of queries for last_name="Smith" AND first_name="John"
> then it might well help, there are a lot of "Smith"s to choose from.

I think this is the crux of the argument. Even for a common last name like
Smith, you're going to be talking about what, a few thousand records? Probably
selective enough that the extra column isn't really necessary and you pay the
cost for that extra column on every single update and even on other lookups.

On the other hand this logic applies best to DSS style databases where you're
looking for the fastest average throughput. For OLTP databases it may not
hold: if 'Smith' breaks your performance guarantee then the application could
break. For systems like that it may be worth paying a 1% penalty everywhere
that's within your time budget to avoid paying a 100% penalty on the rare
query that would cause failures, even if on average that means a performance
loss.

In practice I find two column indexes are not uncommon, especially on
many-to-many relationship tables. Three column indexes are rare but they do
happen. Offhand I don't ever recall defining any indexes with four or more
columns.

> There's really no alternative to testing. The statistics tables are very useful
> here. Unless you have good reason not to, always turn the statistics gathering
> on, and take snapshot regularly to keep an eye on where PG is exerting the most
> effort.

IMHO many people rely too heavily on empirical data rather than having a good
sense of what they want to be happening. Statistics can obscure situations
like what I described above.

I do have statistics on though, and have the same thinking about always
leaving it on, but I'm unclear how to make use of these data. What tools
should I be looking at to use them?

--
greg

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Greg Stark 2004-06-01 16:30:15 Problems with pgsql mail servers?
Previous Message zhicheng wang 2004-06-01 16:19:35 Re: after using pg_resetxlog, db lost