Re: Unclamped row estimates whith OR-ed subplans

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Benjamin Coutu <ben(dot)coutu(at)zeyos(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Unclamped row estimates whith OR-ed subplans
Date: 2020-06-19 15:55:27
Message-ID: 7334b30adf19881431a344be6372d270327739ba.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, 2020-06-19 at 17:12 +0200, Benjamin Coutu wrote:
> please consider the following SQL query:
>
> SELECT * FROM "transactions" WHERE
> "account" IN (SELECT "ID" FROM "accounts" WHERE "name" ~~* '%test%') OR
> "contract" IN (SELECT "ID" FROM "contracts" WHERE "name" ~~* '%test%')
>
> This yields the following plan on Postgres 11:
>
> Seq Scan on transactions (cost=67.21..171458.03 rows=1301316 width=1206)
> Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
> SubPlan 1
> -> Bitmap Heap Scan on accounts (cost=33.36..61.16 rows=46 width=4)
> Recheck Cond: ((name)::text ~~* '%test%'::text)
> -> Bitmap Index Scan on s_accounts (cost=0.00..33.35 rows=46 width=0)
> Index Cond: ((name)::text ~~* '%test%'::text)
> SubPlan 2
> -> Seq Scan on contracts (cost=0.00..5.93 rows=5 width=4)
> Filter: ((name)::text ~~* '%test%'::text)
>
> So the where clause of this query has just two subplans OR-ed together, one is estimated to yield 46 rows and one is estimated to yield 5 rows.
> I'd expect the total rows for the seqscan to be estimated at 46 then, following the logic that rows_seqscan = max(rows_subplan1, rows_subplan2). As you can see, the optimizer estimates a whopping
> 1301316 rows instead.
>
> I am absolutely aware that those are hashed sub plans below a seqscan and that Postgres therefore has to scan all tuples of the table. But the problem is that upper nodes (which are excluded from
> this example for simplicity) think they will receive 1301316 rows from the seqscan, when in fact they will probably only see a hand full, which the planner could have (easily?) deduced by taking the
> greater of the two subplan row estimates.
>
> What am I missing, or is this perhaps a shortfall of the planner?

The subplans are executed *fpr each row* found in "transactions",
and the estimate on the subplans is *per execution".

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2020-06-19 16:04:47 Re: Unclamped row estimates whith OR-ed subplans
Previous Message Benjamin Coutu 2020-06-19 15:12:31 Unclamped row estimates whith OR-ed subplans