From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Ed Loehr <pggeneral(at)bluepolka(dot)net> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Questions on 7.2.1 query plan choices |
Date: | 2002-04-18 05:49:05 |
Message-ID: | 7082.1019108945@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Ed Loehr <pggeneral(at)bluepolka(dot)net> writes:
> $ psql -c "explain select key, value from freetext where tobeindexed = 't'
> and isindexed = 'f'
> NOTICE: QUERY PLAN:
> Seq Scan on freetext (cost=0.00..102114.21 rows=296161 width=1138)
> $ psql -c "select count(key) from freetext"
> count
> --------
> 728868
> (1 row)
> $ psql -c "select count(key) from freetext where tobeindexed = 't' and
> isindexed = 'f'"
> count
> -------
> 1319
> (1 row)
The problem here is that the planner is estimating 296161 rows retrieved
instead of 1319. If it were right, then a seqscan would be the right
choice. My guess is that there is a strong correlation between the
tobeindexed and isindexed columns --- but the current statistical model
has no clue about cross-column correlations, so you get an estimate
that's just based on the product of the frequencies independently.
Curt Sampson's nearby remarks about partial indexes are not a bad
suggestion. An even more direct attack is to combine these two
columns into a single column with four states (you could use smallint
or "char"-with-the-quotes). The 7.2 planner *would* have a pretty
good idea about the relative frequencies of the different states, and
would make the right seqscan-vs-indexscan choice depending on which
state you were scanning for.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Curt Sampson | 2002-04-18 06:52:09 | Re: Questions on 7.2.1 query plan choices |
Previous Message | Curt Sampson | 2002-04-18 05:30:04 | Re: Questions on 7.2.1 query plan choices |