BUG #17545: Incorrect selectivity for IS NOT DISTINCT FROM and NULLs

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: rpsalmi(at)gmail(dot)com
Subject: BUG #17545: Incorrect selectivity for IS NOT DISTINCT FROM and NULLs
Date: 2022-07-11 14:18:44
Message-ID: 17545-a0ca4de888953169@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17545
Logged by: Roope Salmi
Email address: rpsalmi(at)gmail(dot)com
PostgreSQL version: 14.4
Operating system: Ubuntu 22.04
Description:

Hi,

It appears that selectivity for "IS NOT DISTINCT FROM" clauses regard the
proportion of NULLs in the same way as "=", leading to polar opposite row
count
estimates.

In practice this causes nested loop joins to be used when that is not
appropriate. Not sure if this is a known issue, but I was unable to find
any
previous mentions of it.

Here I create a table with one column and 1000 rows of NULL. A cartesian
product
with "WHERE x.a = y.a" correctly estimates that there are around zero
matching
rows. "x.a IS NOT DISTINCT FROM y.a" incorrectly gives the same estimate,
whereas "x.a = y.a OR (x.a IS NULL AND y.a IS NULL)", which should be
equivalent, gives the correct 1000000.

postgres=# CREATE TABLE test(a INTEGER);
CREATE TABLE
postgres=# INSERT INTO test(a) SELECT NULL FROM generate_series(1, 1000);
INSERT 0 1000
postgres=# ANALYZE test;
ANALYZE
postgres=# EXPLAIN SELECT FROM test x, test y WHERE x.a = y.a;
QUERY PLAN
----------------------------------------------------------------------
Merge Join (cost=127.66..137.67 rows=1 width=0)
Merge Cond: (x.a = y.a)
-> Sort (cost=63.83..66.33 rows=1000 width=4)
Sort Key: x.a
-> Seq Scan on test x (cost=0.00..14.00 rows=1000 width=4)
-> Sort (cost=63.83..66.33 rows=1000 width=4)
Sort Key: y.a
-> Seq Scan on test y (cost=0.00..14.00 rows=1000 width=4)
(8 rows)

postgres=# EXPLAIN SELECT FROM test x, test y
postgres=# WHERE x.a IS NOT DISTINCT FROM y.a;
QUERY PLAN
----------------------------------------------------------------------
Nested Loop (cost=0.00..15030.50 rows=1 width=0)
Join Filter: (NOT (x.a IS DISTINCT FROM y.a))
-> Seq Scan on test x (cost=0.00..14.00 rows=1000 width=4)
-> Materialize (cost=0.00..19.00 rows=1000 width=4)
-> Seq Scan on test y (cost=0.00..14.00 rows=1000 width=4)
(5 rows)

postgres=# EXPLAIN SELECT FROM test x, test y

postgres-# WHERE x.a = y.a OR (x.a IS NULL AND y.a IS NULL);
QUERY PLAN
----------------------------------------------------------------------
Nested Loop (cost=0.00..15030.50 rows=1000000 width=0)
Join Filter: ((x.a = y.a) OR ((x.a IS NULL) AND (y.a IS NULL)))
-> Seq Scan on test x (cost=0.00..14.00 rows=1000 width=4)
-> Materialize (cost=0.00..19.00 rows=1000 width=4)
-> Seq Scan on test y (cost=0.00..14.00 rows=1000 width=4)
(5 rows)

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2022-07-11 15:20:21 BUG #17546: power() function - value is distorted via automatic type cast
Previous Message Jordan Lewis 2022-07-11 13:18:34 Re: BUG #17542: tsquery returns incorrect results with nested, conjuncted followed-by operators