From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | rpsalmi(at)gmail(dot)com |
Subject: | BUG #17545: Incorrect selectivity for IS NOT DISTINCT FROM and NULLs |
Date: | 2022-07-11 14:18:44 |
Message-ID: | 17545-a0ca4de888953169@postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 17545
Logged by: Roope Salmi
Email address: rpsalmi(at)gmail(dot)com
PostgreSQL version: 14.4
Operating system: Ubuntu 22.04
Description:
Hi,
It appears that selectivity for "IS NOT DISTINCT FROM" clauses regard the
proportion of NULLs in the same way as "=", leading to polar opposite row
count
estimates.
In practice this causes nested loop joins to be used when that is not
appropriate. Not sure if this is a known issue, but I was unable to find
any
previous mentions of it.
Here I create a table with one column and 1000 rows of NULL. A cartesian
product
with "WHERE x.a = y.a" correctly estimates that there are around zero
matching
rows. "x.a IS NOT DISTINCT FROM y.a" incorrectly gives the same estimate,
whereas "x.a = y.a OR (x.a IS NULL AND y.a IS NULL)", which should be
equivalent, gives the correct 1000000.
postgres=# CREATE TABLE test(a INTEGER);
CREATE TABLE
postgres=# INSERT INTO test(a) SELECT NULL FROM generate_series(1, 1000);
INSERT 0 1000
postgres=# ANALYZE test;
ANALYZE
postgres=# EXPLAIN SELECT FROM test x, test y WHERE x.a = y.a;
QUERY PLAN
----------------------------------------------------------------------
Merge Join (cost=127.66..137.67 rows=1 width=0)
Merge Cond: (x.a = y.a)
-> Sort (cost=63.83..66.33 rows=1000 width=4)
Sort Key: x.a
-> Seq Scan on test x (cost=0.00..14.00 rows=1000 width=4)
-> Sort (cost=63.83..66.33 rows=1000 width=4)
Sort Key: y.a
-> Seq Scan on test y (cost=0.00..14.00 rows=1000 width=4)
(8 rows)
postgres=# EXPLAIN SELECT FROM test x, test y
postgres=# WHERE x.a IS NOT DISTINCT FROM y.a;
QUERY PLAN
----------------------------------------------------------------------
Nested Loop (cost=0.00..15030.50 rows=1 width=0)
Join Filter: (NOT (x.a IS DISTINCT FROM y.a))
-> Seq Scan on test x (cost=0.00..14.00 rows=1000 width=4)
-> Materialize (cost=0.00..19.00 rows=1000 width=4)
-> Seq Scan on test y (cost=0.00..14.00 rows=1000 width=4)
(5 rows)
postgres=# EXPLAIN SELECT FROM test x, test y
postgres-# WHERE x.a = y.a OR (x.a IS NULL AND y.a IS NULL);
QUERY PLAN
----------------------------------------------------------------------
Nested Loop (cost=0.00..15030.50 rows=1000000 width=0)
Join Filter: ((x.a = y.a) OR ((x.a IS NULL) AND (y.a IS NULL)))
-> Seq Scan on test x (cost=0.00..14.00 rows=1000 width=4)
-> Materialize (cost=0.00..19.00 rows=1000 width=4)
-> Seq Scan on test y (cost=0.00..14.00 rows=1000 width=4)
(5 rows)
From | Date | Subject | |
---|---|---|---|
Next Message | PG Bug reporting form | 2022-07-11 15:20:21 | BUG #17546: power() function - value is distorted via automatic type cast |
Previous Message | Jordan Lewis | 2022-07-11 13:18:34 | Re: BUG #17542: tsquery returns incorrect results with nested, conjuncted followed-by operators |