Quick Links

Why does the query planner use two full indexes, when a dedicated partial index exists?

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	Richard Neill <rn214(at)richardneill(dot)org>
Cc:	"pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Why does the query planner use two full indexes, when a dedicated partial index exists?
Date:	2012-12-27 16:17:16
Message-ID:	CAMkU=1wGV4KZnDbcqdELyaxE4sMvza05KurjA0VH7m9cTn5J_A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Thursday, December 20, 2012, Jeff Janes wrote:

> On Thursday, December 20, 2012, Richard Neill wrote:
>
>>
>>
>> -> Bitmap Index Scan on tbl_tracker_exit_state_idx
>> (cost=0.00..8.36 rows=151 width=0) (actual time=7.946..7.946 rows=20277
>> loops=1)
>>
>
> This is finding 100 times more rows than it thinks it will. If that could
> be fixed, surely this plan would not look as good. But then, it would
> probably just switch to another plan that is not the one you want, either.
>

I guess the issue here is that the histogram postgres uses to estimate the
number of rows that will be found is based on visible rows, and it is
correctly estimating the number of visible rows that will be found. And
that is the relevant thing to pass up to a higher join for its estimation.
But for estimating the number of blocks a given index scan will access,
the right thing would be the number of tuples visited, not the number of
them found to be visible. So that is where this plan goes systematically
wrong.

I guess the correct thing would be for postgres to keep two histograms, one
of all tuples and one of all visible tuples, and to produce different
selectivity estimates for different purposes. But I don't see that change
getting made. It is only meaningful in cases where there is a fundamental
skew in distribution between visible tuples and
invisible-but-as-yet-unvacuumed tuples.

I think that that fundamental skew is the source of both the
underestimation of the bitmap scan cost, and overestimation of the partial
index scan (although I can't get it to overestimate that be anywhere near
the amount you were seeing).

I still think your best bet is to get rid of the partial index and trade
the full one on (parcel_id_code) for one on (parcel_id_code,exit_state). I
think that will be much less fragile than reindexing in a cron job.

Cheers,

Jeff

In response to

Re: Why does the query planner use two full indexes, when a dedicated partial index exists? at 2012-12-21 05:15:16 from Jeff Janes

Responses

Re: Why does the query planner use two full indexes, when a dedicated partial index exists? at 2012-12-27 16:43:38 from Richard Neill

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Richard Neill	2012-12-27 16:43:38	Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
Previous Message	Charles Gomes	2012-12-27 16:16:26	Re: Performance on Bulk Insert to Partitioned Table