From: "Joe Wildish" <joe(at)lateraljoin(dot)com>
To: "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Plans with bad estimates/paths
Date: 2021-11-18 17:37:03
Message-ID: 56fa1c0c-feb5-441f-af48-792c4164ad06@www.fastmail.com
Lists: pgsql-general
Hi Tomas,
On Tue, 16 Nov 2021, at 22:03, Tomas Vondra wrote:
> It sure smells like a case of correlated data for some clients but not
> others, but who knows ... Hard to say without seeing the nodes below the
> join. If the lower nodes are estimated accurately, then it's the join
> selectivity that is estimated poorly, and there's not much we can do
> about it :-(
Here is a link to a segment of a plan where the estimations degrade.
I've also pasted the plan segment at the end of this message.
The nodes beneath the Merge Join seem to be estimated accurately (within 2x of the actual)? But the Merge Join itself is a 10x under-estimate, and the nodes above it are even further out. You can see in the plan that this particular execution is for multiple clients (144, 145, 146 & 155). In my experiments I get better results with a single client, although I don't know if that is down to the actual data size being smaller, or if estimation over multiple values is inherently less accurate. Anyway, is this an example of a join selectivity problem?
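One thing I've been meaning to try is extended statistics on the correlated column pair. As I understand the docs they only improve estimates for restriction clauses on a single relation, not join selectivity, but they are cheap to try. A sketch, using column names taken from the plan below:

```sql
-- Tell the planner that client_id and order_item_id are correlated
-- within order_items (multi-tenant key prefix).
CREATE STATISTICS order_items_client_stats (dependencies, ndistinct)
  ON client_id, order_item_id FROM order_items;

ANALYZE order_items;
```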
> Do the "good" plans have the same underestimate? Maybe there's just much
> less data for those clients, and the "poor" plan ends up being fast anyway?
I think that may be happening but haven't been able to capture a plan yet that confirms it.
>> I am assuming this underestimation is the source of the planner
>> choosing the "wrong" path; in production, we have had to resort to
>> setting the join and from collapse limits to 1 to force a naive plan
>> to be generated. This is giving us execution times in the 10/20
>> second range vs. >45m in some cases.
>
> That may be happening, yes. So is it the join order that ends up being
> wrong, or the join methods?
I've seen both. For example, in the worst-performing plans, the root node has its first input estimated to produce 1 row and its second input estimated to produce c.40,000 rows. The second input is a SeqScan, presumably because of the single-row estimate of its sibling. Of course, the estimate of 1 turns out to be wildly inaccurate, the join produces c.2BN rows, and most are then filtered out.
In other (better) plans, the troublesome SeqScan doesn't exist: the relation in question gets joined lower down the tree, and it is not traversed by a SeqScan.
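For reference, the forcing we do in production is simply this (shown session-level here; we actually set it per-query):

```sql
-- With both collapse limits at 1, the planner keeps the textual
-- join order of the FROM clause rather than reordering joins,
-- so a bad selectivity estimate can't reshuffle the whole tree.
SET join_collapse_limit = 1;
SET from_collapse_limit = 1;
```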
> Have you tried increasing the collapse limit
> instead? Although, if it works for some queries but not others, that's
> likely not going to help.
Yes, but it often produces poorer plans rather than better ones. In fact, I increase those limits locally when testing, to get the planner to consistently produce what it thinks is the best plan. (I find that without this, sub-paths can be materialised, presumably because one of the collapse limits has been hit; I'd rather remove that source of variability when debugging.)
> The first thing I'd do is reduce the query size as much as possible. In
> this case I'd try removing as many joins as possible until the issue
> disappears. The simpler the query, the easier it is to investigate.
>
> And yes, replacing parts of a query with a temporary table is a common
> solution, because it's possible to collect statistics on it, build
> indexes etc. That usually solves estimation issues in multi-tenancy.
> Sometimes even a CTE with materialization is enough.
Thank you. It seems, then, that the solution lies in simplifying the queries so that the chances of poor estimation are reduced or removed. (I have had some success with this today: one of the queries was pulling in a view that resulted in needless self-joins.) However, such a solution raises the question of which parts of the query should be pre-computed, and whether that work will survive further changes in the underlying data distributions.
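To make that concrete, the shape of the fix I've been testing looks roughly like this. Table and join keys are taken from the plan below, but the aggregate and any column not visible in the plan (e.g. the discount amount) are illustrative:

```sql
-- Pre-compute the troublesome per-item aggregate, so the planner
-- has real statistics to work from instead of a join-selectivity guess.
CREATE TEMP TABLE tmp_item_discounts AS
SELECT oi.client_id, oi.order_item_id
FROM order_items oi
LEFT JOIN discount_allocations da USING (client_id, order_item_id)
WHERE oi.client_id = ANY ('{144,145,146,155}'::integer[])
GROUP BY oi.client_id, oi.order_item_id;

ANALYZE tmp_item_discounts;

-- ...then join tmp_item_discounts into the main query in place of
-- the original subquery. Alternatively, keep it inline but fence it
-- off from the rest of the plan:
--   WITH item_discounts AS MATERIALIZED (...)
```

The ANALYZE is the important part: it is what gives the planner accurate row counts for the pre-computed relation, which a plain subquery never gets.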
Thanks,
-Joe
Plan segment:
Nested Loop Left Join (cost=2.29..348095.52 rows=3 width=93) (actual time=828.619..3368.362 rows=517367 loops=1)
-> Nested Loop (cost=1.86..348094.00 rows=3 width=81) (actual time=828.609..2655.136 rows=517367 loops=1)
Join Filter: (clients.id = order_items.client_id)
-> Nested Loop (cost=1.43..347875.73 rows=400 width=60) (actual time=828.603..1890.900 rows=517367 loops=1)
Join Filter: (clients.id = order_items_1.client_id)
-> Merge Join (cost=1.00..322712.24 rows=50370 width=48) (actual time=828.584..1224.993 rows=517367 loops=1)
Merge Cond: (order_items_2.client_id = clients.id)
-> GroupAggregate (cost=0.85..290718.67 rows=2518498 width=44) (actual time=0.040..1126.298 rows=1856351 loops=1)
Group Key: order_items_2.client_id, order_items_2.order_item_id
-> Merge Left Join (cost=0.85..240348.71 rows=2518498 width=18) (actual time=0.033..535.466 rows=1858076 loops=1)
Merge Cond: ((order_items_2.client_id = discount_allocations.client_id) AND (order_items_2.order_item_id = discount_allocations.order_item_id))
-> Index Only Scan using pk_order_items on order_items order_items_2 (cost=0.43..131145.90 rows=2518498 width=12) (actual time=0.022..150.068 rows=1856352 loops=1)
Heap Fetches: 0
-> Index Scan using pk_discount_allocations on discount_allocations (cost=0.42..89823.14 rows=931889 width=18) (actual time=0.008..110.223 rows=677531 loops=1)
-> Index Only Scan using pk_clients on clients (cost=0.14..8.64 rows=4 width=4) (actual time=0.003..0.011 rows=4 loops=1)
Index Cond: (id = ANY ('{144,145,146,155}'::integer[]))
Heap Fetches: 0
-> Index Only Scan using pk_order_items on order_items order_items_1 (cost=0.43..0.49 rows=1 width=12) (actual time=0.001..0.001 rows=1 loops=517367)
Index Cond: ((client_id = order_items_2.client_id) AND (order_item_id = order_items_2.order_item_id))
Heap Fetches: 0
-> Index Scan using pk_order_items on order_items (cost=0.43..0.53 rows=1 width=29) (actual time=0.001..0.001 rows=1 loops=517367)
Index Cond: ((client_id = order_items_1.client_id) AND (order_item_id = order_items_1.order_item_id))
-> Index Scan using pk_order_item_variants on order_item_variants (cost=0.43..0.51 rows=1 width=20) (actual time=0.001..0.001 rows=1 loops=517367)
Index Cond: ((client_id = order_items_1.client_id) AND (order_item_id = order_items_1.order_item_id))