Re: BUG #17545: Incorrect selectivity for IS NOT DISTINCT FROM and NULLs

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: rpsalmi(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17545: Incorrect selectivity for IS NOT DISTINCT FROM and NULLs
Date: 2022-07-12 04:35:48
Message-ID: CAMbWs4-qTRsTeNyHz0ouKkuSs8ct=rxO26pL1CNG1aGk2znvow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Jul 11, 2022 at 10:36 PM PG Bug reporting form <
noreply(at)postgresql(dot)org> wrote:

> Hi,
>
> It appears that selectivity for "IS NOT DISTINCT FROM" clauses regard the
> proportion of NULLs in the same way as "=", leading to polar opposite row
> count
> estimates.
>
> In practice this causes nested loop joins to be used when that is not
> appropriate. Not sure if this is a known issue, but I was unable to find
> any
> previous mentions of it.

This can also happen to 'IS DISTINCT FROM'.

# EXPLAIN SELECT FROM test x, test y WHERE x.a IS DISTINCT FROM y.a;
QUERY PLAN
----------------------------------------------------------------------
Nested Loop (cost=0.00..15030.50 rows=1000000 width=0)
Join Filter: (x.a IS DISTINCT FROM y.a)
-> Seq Scan on test x (cost=0.00..14.00 rows=1000 width=4)
-> Materialize (cost=0.00..19.00 rows=1000 width=4)
-> Seq Scan on test y (cost=0.00..14.00 rows=1000 width=4)
(5 rows)

DistinctExpr has the same representation as OpExpr with operator "=". So
we estimate its selectivity just as if it is 'x.a = y.a', and then
invert the result.

For OpExpr 'x.a = y.a', because no MCV lists are available, we estimate
its selectivity as MIN(1/nd1,1/nd2)*(1-nullfrac1)*(1-nullfrac2). Note
that both nullfrac1 and nullfrac2 are 100% in the case, so the
selectivity is calculated as zero, and the inverted result is 1, which
is for DistinctExpr.

The comment in the codes has explained this behavior, as it says:

/*
* DistinctExpr has the same representation as OpExpr, but the
* contained operator is "=" not "<>", so we must negate the result.
* This estimation method doesn't give the right behavior for nulls,
* but it's better than doing nothing.
*/
if (IsA(clause, DistinctExpr))
s1 = 1.0 - s1;

So maybe this is a known issue?

Thanks
Richard

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2022-07-12 04:41:46 Re: BUG #17545: Incorrect selectivity for IS NOT DISTINCT FROM and NULLs
Previous Message Tom Lane 2022-07-11 18:56:43 Re: BUG #17546: power() function - value is distorted via automatic type cast