Re: Questions on 7.2.1 query plan choices

From: Curt Sampson <cjs(at)cynic(dot)net>
To: Ed Loehr <pggeneral(at)bluepolka(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Questions on 7.2.1 query plan choices
Date: 2002-04-18 05:30:04
Message-ID: Pine.NEB.4.43.0204181406300.457-100000@angelic.cynic.net
Lists: pgsql-general

On Wed, 17 Apr 2002, Ed Loehr wrote:

> 1) There are over 700,000 rows in the table below, but only about 1,300
> matching the where clause. How can I (and should I) get the planner to
> choose to traverse indexed_idx instead of a sequential scan? The following
> is immediately after calling 'analyze'...
>
> $ psql -c "explain select key, value from freetext where tobeindexed = 't'
> and isindexed = 'f'"
> NOTICE: QUERY PLAN:
>
> Seq Scan on freetext (cost=0.00..102114.21 rows=296161 width=1138)
...

> 2) Why does the planner choose to first scan freetext_pkey when choosing
> indexed_idx would narrow the 700K rows down to 1300 in the query below?

Well, I don't know postgres so well, but here's my guess:

Since tobeindexed and isindexed are boolean columns, there are only four
possible key combinations in the index. This means that the selectivity
of this index is going to be really, really poor.
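
As an aside, a quick query like this (just using the table and column
names from your message) should show how skewed the distribution
actually is:

    SELECT tobeindexed, isindexed, count(*)
      FROM freetext
     GROUP BY tobeindexed, isindexed;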

Now that's all right if the query optimizer has some idea that, for this
particular set of values, only a few thousand tuples will be retrieved.
But it has to know this in advance somehow, because doing the queries
above as an index lookup rather than a table scan or primary key index
scan would be totally stupid if 100,000 rows were going to be returned.
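
If I remember right that 7.2 has EXPLAIN ANALYZE, you can compare what
the planner guesses against what actually comes back:

    EXPLAIN ANALYZE
    SELECT key, value FROM freetext
     WHERE tobeindexed = 't' AND isindexed = 'f';

The estimated "rows=" figure sits next to the actual row count in the
output, so a big gap there points straight at a statistics problem.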

Now a query optimizer, lacking the right statistics on key distribution,
would probably guess that on average you're going to select 1/4 of the
rows [175,000 in this case] for any index value. (That would be my
personal guess, if you hadn't told me that the key distribution was
different.)

An analyze might help, though I don't know if postgres collects the
right statistics to fix this problem.
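
If it does turn out to be a statistics problem, I believe 7.2 lets you
raise the amount of statistics gathered per column and then re-run
ANALYZE. Something like the following, where 100 is just an arbitrary
target I picked, not a recommendation:

    -- 100 is an arbitrary statistics target I chose for illustration
    ALTER TABLE freetext ALTER COLUMN tobeindexed SET STATISTICS 100;
    ALTER TABLE freetext ALTER COLUMN isindexed SET STATISTICS 100;
    ANALYZE freetext;

No idea whether that helps much for a pair of booleans, but it's cheap
to try.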

But since you know yourself that you're going to have a relatively small
number of rows where tobeindexed = true, you're in a perfect position to
use one of the coolest features of postgres: a partial index. Create the
index for only those rows where tobeindexed = t (see the docs for syntax
details), and you end up with a roughly 1,300-row index rather than a
700,000-row one for a start. With any luck, the postgres query optimizer
will also realize that a scan using that index can never return more
rows than the index itself contains, which at about 1,300 rows is very
selective, and it will use it first.
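
Something along these lines is what I have in mind; the index name is
one I made up, and you'll want to check the CREATE INDEX docs for the
exact predicate syntax:

    -- hypothetical name; only rows with tobeindexed = 't' go into the index
    CREATE INDEX freetext_unindexed_idx
        ON freetext (isindexed)
     WHERE tobeindexed = 't';

    ANALYZE freetext;

Since your query's WHERE clause implies tobeindexed = 't', the planner
should (I hope) be able to prove the partial index applies and then use
isindexed = 'f' as the lookup key.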

Of course, I hope someone will correct me if I'm wrong here. At any
rate, please post and let us know how it turns out. Partial indexes seem
to me to be designed exactly for situations like the one above.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC
