Re: Slow query - index not used

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: michael(at)synchronicity(dot)com
Cc: Pgsql-Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Slow query - index not used
Date: 2004-01-21 16:40:27
Message-ID: 1425.1074703227@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Brusser <michael(at)synchronicity(dot)com> writes:
> So 'rows' values are incorrect.

You sound like you are expecting them to be exact. They're just
estimates. They're all plenty close enough for planning purposes,
except maybe the one for 'KnowledgeBase' is a little further off
than I would have expected. That could just be a chance sampling
thing --- if you rerun ANALYZE and then check again, how much does that
estimate move around in a few tries?
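(A quick way to run that experiment, assuming the table is named knowledgebase --- substitute your real table name --- is to re-ANALYZE and then look at what got sampled into pg_stats each time:

```sql
-- Re-sample the table's statistics; repeat this a few times and watch
-- how much the sampled numbers jitter between runs.
ANALYZE knowledgebase;

SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'knowledgebase';
```

If the row estimate in EXPLAIN swings around a lot between ANALYZE runs, it's just sampling noise.)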

> I mean 'rows' has value of about 5000 records from the total of 75000
> records on the table. This ratio does not seem high enough to assume
> that index scan won't be beneficial.

You're mistaken. You didn't give any indication of how wide the rows
are in this table, but supposing for purposes of argument that they're
roughly 100 bytes apiece, there would be room for 80 rows on each 8K
page of the table. A scan that needs to visit 1/15th of the table is
statistically certain to read nearly every page of the table, many of
them multiple times. (This assumes the needed rows are randomly
scattered, which wouldn't be true if the table is "clustered" on the
index, but you didn't say anything about having done that.) So an
indexscan doesn't save any I/O, and may cost I/O if some of the pages
fall out of buffer cache between visits. Add to that the penalty for
random instead of sequential reads, and the cost to read the index
itself, and you come out way behind a seqscan.
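(To put rough numbers on that: with the assumed 100-byte rows, 75000 rows at 80 per page is about 940 pages. The expected fraction of a P-page table touched when fetching n randomly scattered rows is 1 - (1 - 1/P)^n, which you can evaluate right in psql:

```sql
-- Expected distinct pages touched when fetching 5000 random rows from
-- a 940-page table: 940 * (1 - (1 - 1/940)^5000), which comes out to
-- roughly 99.5% of the table.
SELECT round(940 * (1 - power(1 - 1.0/940, 5000))) AS pages_touched;
```

So the indexscan ends up reading almost every heap page anyway, plus the index.)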

This all assumes that I/O is far more expensive than CPU cycles.
If you have a database that's small enough to fit in RAM then the cost
of checking the WHERE clauses at each tuple could become the dominant
factor. If that's your situation you should look at altering the
planner's cost parameters --- in particular lower random_page_cost and
increase the various CPU cost settings.
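For instance --- and these particular numbers are only illustrative starting points for experimentation, not recommendations:

```sql
-- If the working set is fully cached, a random page fetch costs about
-- the same as a sequential one, so random_page_cost can come way down
-- from its default of 4; meanwhile the per-tuple CPU charges can go up.
SET random_page_cost = 1.5;
SET cpu_tuple_cost = 0.02;      -- default 0.01
SET cpu_operator_cost = 0.005;  -- default 0.0025
```

You'd want to tune these against timings from your own workload, not take them on faith.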

With the default cost settings, the planner will not pick an indexscan
if it thinks that more than a couple percent of the table needs to be
visited. (I think the breakeven point is dependent on table size, but
don't have the details in my head right now.) You can experiment with
EXPLAIN ANALYZE and the various enable_xxx settings to see if it's right
or not in your environment.
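A sketch of that experiment (the query and column are made up --- use one of your real slow queries):

```sql
-- First see what the planner picks on its own, with actual timings.
EXPLAIN ANALYZE SELECT count(*) FROM knowledgebase WHERE note_type = 'faq';

-- Then forbid the seqscan and compare: if the forced indexscan is
-- actually faster, the cost parameters need adjusting.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT count(*) FROM knowledgebase WHERE note_type = 'faq';
RESET enable_seqscan;
```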

See the pgsql-performance archives for more discussion.

regards, tom lane
