Re: Slight change in query leads to unexpected change in query plan

From: Jack Orenstein <jack(dot)orenstein(at)hds(dot)com>
To: Sam Mason <sam(at)samason(dot)me(dot)uk>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Slight change in query leads to unexpected change in query plan
Date: 2009-06-23 13:54:32
Message-ID: 4A40DE98.2040902@hds.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Sam Mason wrote:
> On Mon, Jun 22, 2009 at 05:55:28PM -0400, Jack Orenstein wrote:
>> ris-# select *
>> ris-# from T
>> ris-# where pk > 1000000000
>> ris-# and value = 'asdf'::bytea
>> ris-# order by pk
>> ris-# limit 100;
>
> PG thinks that you're going to get 16 rows back matching those
> conditions, bitmap heap scans are faster in some cases and this is
> likely to be one of those cases so PG is optimizing things correctly.

But that's 16 rows at the end. I'm concerned about the earlier processing.
Here's the plan:

Limit (cost=78352.20..78352.24 rows=16 width=451)
-> Sort (cost=78352.20..78352.24 rows=16 width=451)
Sort Key: pk
-> Bitmap Heap Scan on t (cost=2091.60..78351.88 rows=16 width=451)
Recheck Cond: (pk > 1000000000)
Filter: (value = 'asdf'::bytea)
-> Bitmap Index Scan on t_pkey (cost=0.00..2091.60 rows=91088
width=0)
Index Cond: (pk > 1000000000)

If I'm reading this right, the optimizer estimates that the bitmap index scan
will identify 91088 rows, which all have to be retrieved from the heap for
evaluation of the restriction on value. The row width is 451, so even if the
rows are packed into 8k pages perfectly, that's ~5000 pages that have to be read.

If there is an index scan (non-bitmap) with the value restriction checked at the
end, we know the optimizer thinks that is cheaper. Here is the plan without the
value restriction:

Limit (cost=0.00..324.99 rows=100 width=451)
-> Index Scan using t_pkey on t (cost=0.00..296027.98 rows=91088 width=451)
Index Cond: (pk > 1000000000)

Adding the value restriction at the top of this query plan wouldn't increase the
cost very much.

So given that estimate of 91088 rows coming out of the bitmap index scan, I
still don't understand the optimizer's reasoning. Am I reading the plan incorrectly?

Jack

>
>> Limit (cost=78352.20..78352.24 rows=16 width=451)
>
>> ris-# select *
>> ris-# from T
>> ris-# where pk > 1000000000
>> ris-# order by pk
>> ris-# limit 100;
>
> With this query, PG thinks that you may get 91088 rows back but because
> you've got a LIMIT in there you only needs the first 100 of them. It
> will therefore prefer a plan that will stop short and thus is preferring
> an index scan.
>
>> Limit (cost=0.00..324.99 rows=100 width=451)
>> -> Index Scan using t_pkey on t (cost=0.00..296027.98 rows=91088 width=451)
>
>
>> Why does adding the value restriction so radically change the execution
>> plan?
>
> PG doesn't have any cross column statistics and hence it assumes that pk
> and value are uncorrelated. You may get better results with increasing
> the statistics target[1] for those columns as that will give PG more
> information, but if the columns are indeed correlated then that's not
> going to help.
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2009-06-23 14:04:32 Re: Explaining functions.
Previous Message Hartman, Matthew 2009-06-23 13:48:14 Re: Explaining functions.