Why does the query planner use two full indexes, when a dedicated partial index exists?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Richard Neill <rn214(at)richardneill(dot)org>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Why does the query planner use two full indexes, when a dedicated partial index exists?
Date: 2012-12-27 16:17:16
Message-ID: CAMkU=1wGV4KZnDbcqdELyaxE4sMvza05KurjA0VH7m9cTn5J_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thursday, December 20, 2012, Jeff Janes wrote:

> On Thursday, December 20, 2012, Richard Neill wrote:
>
>>
>>
>> -> Bitmap Index Scan on tbl_tracker_exit_state_idx
>> (cost=0.00..8.36 rows=151 width=0) (actual time=7.946..7.946 rows=20277
>> loops=1)
>>
>
> This is finding 100 times more rows than it thinks it will. If that could
> be fixed, surely this plan would not look as good. But then, it would
> probably just switch to another plan that is not the one you want, either.
>

I guess the issue here is that the histogram postgres uses to estimate the
number of rows that will be found is based on visible rows, and it is
correctly estimating the number of visible rows that will be found. And
that is the relevant thing to pass up to a higher join for its estimation.
But for estimating the number of blocks a given index scan will access,
the right thing would be the number of tuples visited, not the number of
them found to be visible. So that is where this plan goes systematically
wrong.

I guess the correct thing would be for postgres to keep two histograms, one
of all tuples and one of all visible tuples, and to produce different
selectivity estimates for different purposes. But I don't see that change
getting made. It is only meaningful in cases where there is a fundamental
skew in distribution between visible tuples and
invisible-but-as-yet-unvacuumed tuples.

I think that that fundamental skew is the source of both the
underestimation of the bitmap scan cost, and overestimation of the partial
index scan (although I can't get it to overestimate that be anywhere near
the amount you were seeing).

I still think your best bet is to get rid of the partial index and trade
the full one on (parcel_id_code) for one on (parcel_id_code,exit_state). I
think that will be much less fragile than reindexing in a cron job.

Cheers,

Jeff

>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Richard Neill 2012-12-27 16:43:38 Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
Previous Message Charles Gomes 2012-12-27 16:16:26 Re: Performance on Bulk Insert to Partitioned Table