Re: Bitmap scan is undercosted?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Vitaliy Garnashevich <vgarnashevich(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Bitmap scan is undercosted?
Date: 2017-12-02 23:44:50
Message-ID: 18188.1512258290@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> On Fri, Dec 1, 2017 at 11:08 PM, Vitaliy Garnashevich <
> vgarnashevich(at)gmail(dot)com> wrote:
>> # x4 tuple/operator costs - bitmap scan still a bit cheaper
>> set seq_page_cost = 1.0;
>> set random_page_cost = 1.0;
>> set cpu_tuple_cost = 0.04;
>> set cpu_index_tuple_cost = 0.02;
>> set cpu_operator_cost = 0.01;

> If you really want to target the plan with the BitmapAnd, you should
> increase cpu_index_tuple_cost and/or cpu_operator_cost but not increase
> cpu_tuple_cost. That is because the unselective bitmap index scan does
> not incur any cpu_tuple_cost, but does incur index_tuple and operator
> costs. Unfortunately all other index scans in the system will also be
> skewed by such a change if you make the change system-wide.

I think it'd be a serious error to screw around with your cost settings
on the basis of a single case in which the rowcount estimates are so
far off. It's really those estimates that are the problem AFAICS.

The core issue in this example is that, the way the test data is set up,
the "flag = true" condition actually adds no selectivity at all, because
every row with "num = 1" is certain to have "flag = true". If the planner
realized that, it would certainly not bother with BitmapAnd'ing the flag
index onto the results of the num index. But it doesn't know that those
columns are correlated, so it supposes that adding the extra index will
give a 10x reduction in the number of heap rows that have to be visited
(since it knows that only 1/10th of the rows have "flag = true").
*That* is what causes the overly optimistic cost estimate for the
two-index bitmapscan, and no amount of fiddling with the cost parameters
will make that better.

I tried creating multiple-column statistics using the v10 facility for
that:

regression=# create statistics s1 on num, flag from aaa;
CREATE STATISTICS
regression=# analyze aaa;
ANALYZE

but that changed the estimate not at all, which surprised me because
dependency statistics are supposed to fix exactly this type of problem.
I suspect there may be something in the extended-stats code that causes it
not to work right for boolean columns --- this wouldn't be excessively
surprising because of the way the planner messes around with converting
"flag = true" to just "flag" and sometimes back again. But I've not
looked closer yet.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2017-12-03 01:27:51 Re: Bitmap scan is undercosted?
Previous Message Tomas Vondra 2017-12-02 21:53:10 Re: [HACKERS] Custom compression methods

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2017-12-03 01:27:51 Re: Bitmap scan is undercosted?
Previous Message Jeff Janes 2017-12-02 21:17:47 Re: Bitmap scan is undercosted?