Unclamped row estimates whith OR-ed subplans

From: "Benjamin Coutu" <ben(dot)coutu(at)zeyos(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Unclamped row estimates whith OR-ed subplans
Date: 2020-06-19 15:12:31
Message-ID: 20200619151232.7F3285FB26@mx.zeyos.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

please consider the following SQL query:

SELECT * FROM "transactions" WHERE
"account" IN (SELECT "ID" FROM "accounts" WHERE "name" ~~* '%test%') OR
"contract" IN (SELECT "ID" FROM "contracts" WHERE "name" ~~* '%test%')

This yields the following plan on Postgres 11:

Seq Scan on transactions (cost=67.21..171458.03 rows=1301316 width=1206)
Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
SubPlan 1
-> Bitmap Heap Scan on accounts (cost=33.36..61.16 rows=46 width=4)
Recheck Cond: ((name)::text ~~* '%test%'::text)
-> Bitmap Index Scan on s_accounts (cost=0.00..33.35 rows=46 width=0)
Index Cond: ((name)::text ~~* '%test%'::text)
SubPlan 2
-> Seq Scan on contracts (cost=0.00..5.93 rows=5 width=4)
Filter: ((name)::text ~~* '%test%'::text)

So the where clause of this query has just two subplans OR-ed together, one is estimated to yield 46 rows and one is estimated to yield 5 rows.
I'd expect the total rows for the seqscan to be estimated at 46 then, following the logic that rows_seqscan = max(rows_subplan1, rows_subplan2). As you can see, the optimizer estimates a whopping 1301316 rows instead.

I am absolutely aware that those are hashed sub plans below a seqscan and that Postgres therefore has to scan all tuples of the table. But the problem is that upper nodes (which are excluded from this example for simplicity) think they will receive 1301316 rows from the seqscan, when in fact they will probably only see a hand full, which the planner could have (easily?) deduced by taking the greater of the two subplan row estimates.

What am I missing, or is this perhaps a shortfall of the planner?

Thanks,

Ben

--

Bejamin Coutu
ben(dot)coutu(at)zeyos(dot)com

ZeyOS GmbH & Co. KG
http://www.zeyos.com

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Laurenz Albe 2020-06-19 15:55:27 Re: Unclamped row estimates whith OR-ed subplans
Previous Message Laurenz Albe 2020-06-17 09:23:52 Re: simple query running for ever