Suspicious Estimated Number of Returned Rows

From: Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu>
To: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Cc: Manuel Rigger <rigger(at)nus(dot)edu(dot)sg>
Subject: Suspicious Estimated Number of Returned Rows
Date: 2023-05-30 03:19:28
Message-ID: SG2PR06MB2810355511BEA44B808B32178A4B9@SG2PR06MB2810.apcprd06.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi everyone,

Consider the two queries below:

```sql
CREATE TABLE t0(c0 INT);
EXPLAIN SELECT * FROM t0 WHERE t0.c0 IN (t0.c0); --2537
QUERY PLAN
------------------------------------------------------
Seq Scan on t0 (cost=0.00..35.50 rows=2537 width=4)
Filter: (c0 IS NOT NULL)
(2 rows)

EXPLAIN SELECT * FROM t0 WHERE (t0.c0 IN (t0.c0))OR(t0.c0 > 0); --858
QUERY PLAN
-----------------------------------------------------
Seq Scan on t0 (cost=0.00..48.25 rows=858 width=4)
Filter: ((c0 = c0) OR (c0 > 0))
(2 rows)
```

Based on my understanding, the second query should logically return a superset of the results obtained from the first query. However, when examining the query plans, I noticed that the estimated number of rows returned in the second query is significantly lower than that in the first query (2537 vs. 858, respectively). This issue can also be reproduced on a table with data after executing `ANALYZE`, but with a smaller estimate. I think this is because in the first query, `t0.c0 IN (t0.c0)` is optimized to a `NULL` check, while the optimization is not performed in the subexpression of the subquery. Is this expected, or is this a missed optimization that could be addressed?

I used this version of PostgreSQL:
Commit: 503b0556d96f2c8df6ed91c5a8cf11b23f37ce6d

Best regards,
Jinsheng Ba

Browse pgsql-bugs by date

  From Date Subject
Next Message Jaime Casanova 2023-05-30 04:56:09 Re: Server crash with parallel workers with Postgres 14.7
Previous Message Thomas Munro 2023-05-30 01:29:09 Re: BUG #17949: Adding an index introduces serialisation anomalies.