Re: Planning performance problem (67626.278ms)

From: Jeremy Schneider <schnjere(at)amazon(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Krzysztof Plocharz <plocharz(at)9livesdata(dot)com>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Planning performance problem (67626.278ms)
Date: 2019-04-08 23:10:17
Message-ID: b145b428-3351-ec86-d522-100a5bf8ded7@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 4/8/19 07:42, Justin Pryzby wrote:
> On Mon, Apr 08, 2019 at 04:33:34PM +0200, Pavel Stehule wrote:
>> po 8. 4. 2019 v 16:11 odesílatel Krzysztof Plocharz <plocharz(at)9livesdata(dot)com> napsal:
>>
>>> We have some very strange query planning problem. Long story short it
>>> takes 67626.278ms just to plan. Query execution takes 12ms.
>>>
>>> Query has 7 joins and 2 subselects.
>>> It looks like the issue is not deterministic, sometimes is takes few ms
>>> to plan the query.
>>>
>>> One of the tables has 550,485,942 live tuples and 743,504,012 dead
>>> tuples. Running ANALYZE on that tables solves the problem only temporarily.
>>>
>>> Question is how can we debug what is going on?
>>
>> please check your indexes against bloating. Planner get min and max from
>> indexes and this operation is slow on bloat indexes.
>
> I think that's from get_actual_variable_range(), right ?

For what it's worth, I have seen a similar issue on Aurora PG 9.6 where
query planning took a very long time (multiple minutes). In this
particular case, there wasn't anything Aurora-specific about the call to
get_actual_variable_range. We weren't able to distinctly identify the
root cause or build a reproducible test case -- but we suspect that an
inefficiency might exist in community PostgreSQL code.

For debugging, a few ideas:

1) capture a stack with pstack or perf record --call-graph

2) capture the execution plan of the SQL w slow planning

3) capture detailed stats for all relations and objects involved

4) capture the usual info for bug reporting (preface section in docs)

A reproducible test case is the gold standard; I'm keeping my eyes open
for another case too.

For the slow planning case that I saw, the slow process was almost
entirely in this call stack (captured with perf record --call-graph):
...
index_fetch_heap
index_getnext
get_actual_variable_range
ineq_histogram_selectivity
scalarineqsel
mergejoinscansel
initial_cost_mergejoin
try_mergejoin_path
add_paths_to_joinrel
make_join_rel
join_search_one_level
standard_join_search
make_one_rel
query_planner
...

-Jeremy

--
Jeremy Schneider
Database Engineer
Amazon Web Services

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Andres Freund 2019-04-08 23:26:44 Re: Planning performance problem (67626.278ms)
Previous Message David G. Johnston 2019-04-08 22:42:57 Re: Oracle to postgres migration