Re: Bitmap scan is undercosted? - boolean correlation

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Vitaliy Garnashevich <vgarnashevich(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Bitmap scan is undercosted? - boolean correlation
Date: 2017-12-03 23:21:56
Message-ID: CAMkU=1xQs6=fB+W+=3suB0ZeGsNi8ekBqEqXQ8FriTPw5kcuVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> On Sat, Dec 02, 2017 at 05:27:51PM -0800, Jeff Janes wrote:
> > I think the non-extended stats code also has trouble with booleans.
> > pg_stats gives me a correlation of 0.8 or higher for the flag column.
>
> It's not due to the boolean though; you see the same thing if you do:
> CREATE INDEX aaa_f ON aaa((flag::text));
> ANALYZE aaa;
> correlation | 0.81193
>
> or:
> ALTER TABLE aaa ADD flag2 int; UPDATE aaa SET flag2= flag::int
> correlation | 0.81193
>
> I think it's caused by having so few (2) values to correlate.
>
> most_common_vals | {f,t}
> most_common_freqs | {0.9014,0.0986}
> correlation | 0.822792
>
> It thinks there's somewhat-high correlation since it gets a list of x and y
> values (integer positions by logical and physical sort order) and 90% of
> the x
> list (logical value) are the same value ('t'), and the CTIDs are in order
> on
> the new index, so 90% of the values are 100% correlated.
>

But there is no index involved (except in the case of the functional
index). The correlation of table columns to physical order of the table
doesn't depend on the existence of an index, or the physical order within
an index.

But I do see that ties within the logical order of the column values are
broken to agree with the physical order. That is wrong, right? Is there
any argument that this is desirable?

It looks like it could be fixed with a few extra double calcs per distinct
value. Considering we already sorted the sample values using SQL-callable
collation dependent comparators, I doubt a few C-level double calcs is
going to be meaningful.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-12-03 23:31:40 Re: Bitmap scan is undercosted? - boolean correlation
Previous Message Ashutosh Bapat 2017-12-03 23:15:15 Re: [HACKERS] Constraint exclusion for partitioned tables

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2017-12-03 23:31:40 Re: Bitmap scan is undercosted? - boolean correlation
Previous Message Tom Lane 2017-12-03 23:08:46 Re: Bitmap scan is undercosted?