Quick Links

Re: Query performance with disabled hashjoin and mergejoin

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Ivan Voras <ivoras(at)freebsd(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: Query performance with disabled hashjoin and mergejoin
Date:	2011-03-08 21:56:51
Message-ID:	AANLkTi=0PCO2DHVS2Qy0xr93s3AhiHjL3URavFvu+diN@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Tue, Mar 8, 2011 at 2:57 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Mar 7, 2011 at 3:40 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> On Tue, Feb 22, 2011 at 9:07 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Fri, Feb 4, 2011 at 7:08 AM, Ivan Voras <ivoras(at)freebsd(dot)org> wrote:
>>>> -> BitmapAnd (cost=1282.94..1282.94
>>>> rows=1430 width=0) (actual time=5.508..5.508 rows=0 loops=1)
>>>> -> Bitmap Index Scan on
>>>> news_index_layout_id_state (cost=0.00..150.14 rows=2587 width=0) (actual
>>>> time=0.909..0.909 rows=3464 loops=1)
>>>> Index Cond: ((layout_id = 8980)
>>>> AND (state = 2))
>>>> -> BitmapOr (cost=1132.20..1132.20
>>>> rows=20127 width=0) (actual time=4.136..4.136 rows=0 loops=1)
>>>> -> Bitmap Index Scan on
>>>> news_visible_from (cost=0.00..1122.09 rows=19976 width=0) (actual
>>>> time=3.367..3.367 rows=19932 loops=1)
>>>> Index Cond: (visible_from
>>>> IS NULL)
>>>> -> Bitmap Index Scan on
>>>> news_visible_to (cost=0.00..9.40 rows=151 width=0) (actual
>>>> time=0.766..0.766 rows=43 loops=1)
>>>> Index Cond: (1296806570 <=
>>>> visible_to)
>>>
>>> I think this part of the query is the problem. Since the planner
>>> doesn't support cross-column statistics, it can't spot the correlation
>>> between these different search conditions, resulting in a badly broken
>>> selectivity estimate.
>>>
>>> Sometimes you can work around this by adding a single column, computed
>>> with a trigger, that contains enough information to test the whole
>>> WHERE-clause condition using a single indexable test against the
>>> column value. Or sometimes you can get around it by partitioning the
>>> data into multiple tables, say with the visible_from IS NULL rows in a
>>> different table from the rest.
>>
>> Why should you need cross column statistics for this case? You should
>> be able to multiple selectivity from left to right as long as you are
>> doing equality comparisons, yes?
>>
>> Right now the planner is treating
>> select * from foo where (a,b,c) between (1,1,1) and (9,9,9) the same
>> (using selectivity on a) as
>> select * from foo where (a,b,c) between (1,1,5) and (1,1,7)
>>
>> but they are not the same. since in the second query terms a,b are
>> equal, shouldn't you able to multiply the selectivity through?
>
> I'm not quite following that...
>
> The reason I thought cross-column correlations might be relevant is
> that the bitmap index scan on news_visible_from is quite accurate
> (19976 estimated vs. 19932 actual) and the bitmap index scan on
> news_visible_to is tolerably accurate (151 estimated vs. 41 actual)
> but the estimate on the BitmapOr is somehow totally wrong (20127
> estimated vs. 0 actual). But on further reflection that doesn't make
> much sense. How can the BitmapOr produce fewer rows than the sum of
> its constituent inputs?
>
> /me scratches head.

my fault -- the point i was making I think was valid but didn't apply
to the op's question: I mistakenly where expression could be converted
to row wise comparison type operation but that wasn't the case...

merlin

In response to

Re: Query performance with disabled hashjoin and mergejoin at 2011-03-08 20:57:46 from Robert Haas

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Robert Haas	2011-03-09 16:48:41	Re: Query performance with disabled hashjoin and mergejoin
Previous Message	Tom Lane	2011-03-08 21:24:25	Re: Query performance with disabled hashjoin and mergejoin