Re: Bitmap scan is undercosted?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Vitaliy Garnashevich <vgarnashevich(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Bitmap scan is undercosted?
Date: 2017-12-03 01:27:51
Message-ID: CAMkU=1yKK0M4W+bNc31PZhH15dzPakSmpr6TH9exT8OUZvmg_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Sat, Dec 2, 2017 at 3:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> > On Fri, Dec 1, 2017 at 11:08 PM, Vitaliy Garnashevich <
> > vgarnashevich(at)gmail(dot)com> wrote:
> >> # x4 tuple/operator costs - bitmap scan still a bit cheaper
> >> set seq_page_cost = 1.0;
> >> set random_page_cost = 1.0;
> >> set cpu_tuple_cost = 0.04;
> >> set cpu_index_tuple_cost = 0.02;
> >> set cpu_operator_cost = 0.01;
>
> > If you really want to target the plan with the BitmapAnd, you should
> > increase cpu_index_tuple_cost and/or cpu_operator_cost but not increase
> > cpu_tuple_cost. That is because the unselective bitmap index scan does
> > not incur any cpu_tuple_cost, but does incur index_tuple and operator
> > costs. Unfortunately all other index scans in the system will also be
> > skewed by such a change if you make the change system-wide.
>
> I think it'd be a serious error to screw around with your cost settings
> on the basis of a single case in which the rowcount estimates are so
> far off. It's really those estimates that are the problem AFAICS.
>
> The core issue in this example is that, the way the test data is set up,
> the "flag = true" condition actually adds no selectivity at all, because
> every row with "num = 1" is certain to have "flag = true". If the planner
> realized that, it would certainly not bother with BitmapAnd'ing the flag
> index onto the results of the num index. But it doesn't know that those
> columns are correlated, so it supposes that adding the extra index will
> give a 10x reduction in the number of heap rows that have to be visited
> (since it knows that only 1/10th of the rows have "flag = true").
> *That* is what causes the overly optimistic cost estimate for the
> two-index bitmapscan, and no amount of fiddling with the cost parameters
> will make that better.
>

But he also tested with num=2 and num=39, which reverses the situation so
the bitmap is 100% selective rather than the 90% the planner thinks it will
be.

But it is still slower for him (I am having trouble replicating that exact
behavior), so building the bitmap to rule out 100% of the rows is
empirically not worth it, I don't see how building it to rule out 90%, as
the planner things, would be any better.

> I tried creating multiple-column statistics using the v10 facility for
> that:
>
> regression=# create statistics s1 on num, flag from aaa;
> CREATE STATISTICS
> regression=# analyze aaa;
> ANALYZE
>
> but that changed the estimate not at all, which surprised me because
> dependency statistics are supposed to fix exactly this type of problem.
> I suspect there may be something in the extended-stats code that causes it
> not to work right for boolean columns --- this wouldn't be excessively
> surprising because of the way the planner messes around with converting
> "flag = true" to just "flag" and sometimes back again. But I've not
> looked closer yet.
>

I think the non-extended stats code also has trouble with booleans.
pg_stats gives me a correlation of 0.8 or higher for the flag column.

Due to that, when I disable bitmapscans and seqscans, I start getting slow
index scans on the wrong index, i2 rather than i1. I don't know why he
doesn't see that in his example.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleg Bartunov 2017-12-03 02:52:13 Re: [HACKERS] Custom compression methods
Previous Message Tom Lane 2017-12-02 23:44:50 Re: Bitmap scan is undercosted?

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2017-12-03 04:04:30 Re: Bitmap scan is undercosted? - boolean correlation
Previous Message Tom Lane 2017-12-02 23:44:50 Re: Bitmap scan is undercosted?