From: "Joe Wildish" <joe(at)lateraljoin(dot)com>
To: "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Plans with bad estimates/paths
Date: 2021-11-18 17:37:03
Message-ID: 56fa1c0c-feb5-441f-af48-792c4164ad06@www.fastmail.com
Lists: pgsql-general
Hi Tomas,
On Tue, 16 Nov 2021, at 22:03, Tomas Vondra wrote:
> It sure smells like a case of correlated data for some clients but not
> others, but who knows ... Hard to say without seeing the nodes below the
> join. If the lower nodes are estimated accurately, then it's the join
> selectivity that is estimated poorly, and there's not much we can do
> about it :-(
Here is a link to a segment of a plan where the estimations degrade.
I've also pasted the plan segment at the end of this message.
The nodes beneath the Merge Join seem to be estimated accurately (within 2x of the actual)? But the Merge Join itself is a 10x under-estimate, and the nodes above it are even further out. You can see in the plan that this particular execution is for multiple clients (144, 145, 146 & 155). In my experiments I get better results with a single client, although I don't know if that is down to the actual data size being smaller, or if estimation over multiple values is inherently less accurate. Anyway, is this an example of a join selectivity problem?
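One thing I've been meaning to try is extended statistics on the correlated column pair. As I understand the docs they only improve estimates for restriction clauses on a single relation, not join selectivity, but they are cheap to try. A sketch, using column names taken from the plan below:

```sql
-- Tell the planner that client_id and order_item_id are correlated
-- within order_items (multi-tenant key prefix).
CREATE STATISTICS order_items_client_stats (dependencies, ndistinct)
  ON client_id, order_item_id FROM order_items;

ANALYZE order_items;
```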
> Do the "good" plans have the same underestimate? Maybe there's just much
> less data for those clients, and the "poor" plan ends up being fast anyway?
I think that may be happening but haven't been able to capture a plan yet that confirms it.
>> I am assuming this underestimation is the source of the planner
>> choosing the "wrong" path; in production, we have had to resort to
>> setting the join and from collapse limits to 1 to force a naive plan
>> to be generated. This is giving us execution times in the 10/20
>> second range vs. >45m in some cases.
>
> That may be happening, yes. So is it the join order that ends up being
> wrong, or the join methods?
I've seen both. For example, in the worst-performing plans, the root node has its first input estimated to produce 1 row and its second input estimated to produce c.40,000 rows. The second input is a SeqScan, presumably because of the single-row estimate of its sibling. Of course, the estimate of 1 turns out to be wildly inaccurate, the join produces c.2BN rows, and most are then filtered out.
In other (better) plans, the troublesome SeqScan doesn't exist: the relation in question gets joined lower down the tree, and it is not traversed by a SeqScan.
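For reference, the forcing we do in production is simply this (shown session-level here; we actually set it per-query):

```sql
-- With both collapse limits at 1, the planner keeps the textual
-- join order of the FROM clause rather than reordering joins,
-- so a bad selectivity estimate can't reshuffle the whole tree.
SET join_collapse_limit = 1;
SET from_collapse_limit = 1;
```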
> Have you tried increasing the collapse limit
> instead? Although, if it works for some queries but not others, that's
> likely not going to help.
Yes, but it often produces poorer plans rather than better ones. In fact, I increase those limits locally when testing, to get the planner to consistently produce what it thinks is the best plan. (I find that without this, sub-paths can be materialised, presumably because one of the collapse limits has been hit; I'd rather remove that source of variability when debugging.)
> The first thing I'd do is reduce the query size as much as possible. In
> this case I'd try removing as many joins as possible until the issue
> disappears. The simpler the query, the easier it is to investigate.
>
> And yes, replacing parts of a query with a temporary table is a common
> solution, because it's possible to collect statistics on it, build
> indexes etc. That usually solves estimation issues in multi-tenancy.
> Sometimes even a CTE with materialization is enough.
Thank you. It seems, then, that the solution lies in simplifying the queries so that the chances of poor estimation are reduced or removed. (I have had some success with this today: one of the queries was pulling in a view that resulted in needless self-joins.) However, such a solution raises the question of which parts of the query should be pre-computed, and whether that work will survive further changes in the underlying data distributions.
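To make that concrete, the shape of the fix I've been testing looks roughly like this. Table and join keys are taken from the plan below, but the aggregate and any column not visible in the plan (e.g. the discount amount) are illustrative:

```sql
-- Pre-compute the troublesome per-item aggregate, so the planner
-- has real statistics to work from instead of a join-selectivity guess.
CREATE TEMP TABLE tmp_item_discounts AS
SELECT oi.client_id, oi.order_item_id
FROM order_items oi
LEFT JOIN discount_allocations da USING (client_id, order_item_id)
WHERE oi.client_id = ANY ('{144,145,146,155}'::integer[])
GROUP BY oi.client_id, oi.order_item_id;

ANALYZE tmp_item_discounts;

-- ...then join tmp_item_discounts into the main query in place of
-- the original subquery. Alternatively, keep it inline but fence it
-- off from the rest of the plan:
--   WITH item_discounts AS MATERIALIZED (...)
```

The ANALYZE is the important part: it is what gives the planner accurate row counts for the pre-computed relation, which a plain subquery never gets.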
Thanks,
-Joe
Plan segment:
Nested Loop Left Join (cost=2.29..348095.52 rows=3 width=93) (actual time=828.619..3368.362 rows=517367 loops=1)
-> Nested Loop (cost=1.86..348094.00 rows=3 width=81) (actual time=828.609..2655.136 rows=517367 loops=1)
Join Filter: (clients.id = order_items.client_id)
-> Nested Loop (cost=1.43..347875.73 rows=400 width=60) (actual time=828.603..1890.900 rows=517367 loops=1)
Join Filter: (clients.id = order_items_1.client_id)
-> Merge Join (cost=1.00..322712.24 rows=50370 width=48) (actual time=828.584..1224.993 rows=517367 loops=1)
Merge Cond: (order_items_2.client_id = clients.id)
-> GroupAggregate (cost=0.85..290718.67 rows=2518498 width=44) (actual time=0.040..1126.298 rows=1856351 loops=1)
Group Key: order_items_2.client_id, order_items_2.order_item_id
-> Merge Left Join (cost=0.85..240348.71 rows=2518498 width=18) (actual time=0.033..535.466 rows=1858076 loops=1)
Merge Cond: ((order_items_2.client_id = discount_allocations.client_id) AND (order_items_2.order_item_id = discount_allocations.order_item_id))
-> Index Only Scan using pk_order_items on order_items order_items_2 (cost=0.43..131145.90 rows=2518498 width=12) (actual time=0.022..150.068 rows=1856352 loops=1)
Heap Fetches: 0
-> Index Scan using pk_discount_allocations on discount_allocations (cost=0.42..89823.14 rows=931889 width=18) (actual time=0.008..110.223 rows=677531 loops=1)
-> Index Only Scan using pk_clients on clients (cost=0.14..8.64 rows=4 width=4) (actual time=0.003..0.011 rows=4 loops=1)
Index Cond: (id = ANY ('{144,145,146,155}'::integer[]))
Heap Fetches: 0
-> Index Only Scan using pk_order_items on order_items order_items_1 (cost=0.43..0.49 rows=1 width=12) (actual time=0.001..0.001 rows=1 loops=517367)
Index Cond: ((client_id = order_items_2.client_id) AND (order_item_id = order_items_2.order_item_id))
Heap Fetches: 0
-> Index Scan using pk_order_items on order_items (cost=0.43..0.53 rows=1 width=29) (actual time=0.001..0.001 rows=1 loops=517367)
Index Cond: ((client_id = order_items_1.client_id) AND (order_item_id = order_items_1.order_item_id))
-> Index Scan using pk_order_item_variants on order_item_variants (cost=0.43..0.51 rows=1 width=20) (actual time=0.001..0.001 rows=1 loops=517367)
Index Cond: ((client_id = order_items_1.client_id) AND (order_item_id = order_items_1.order_item_id))