Re: Slow query and wrong row estimates for CTE

From: Dane Foster <studdugie(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: psql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow query and wrong row estimates for CTE
Date: 2021-02-17 16:51:10
Message-ID: CA+WxinJ=_9JeGq=+am9yrX6Xdcc+uKQhgL-bzs4n-NtP-GFksg@mail.gmail.com
Lists: pgsql-performance

A small update (see below/inline).

On Tue, Feb 16, 2021 at 2:11 PM Dane Foster <studdugie(at)gmail(dot)com> wrote:

> Short conclusion:
> Switching from CTEs to temporary tables and analyzing reduced the runtime
> from 15 minutes to about 1.5 minutes.
>
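For anyone following along, a minimal sketch of the CTE-to-temp-table rewrite
described above; the table and column names are placeholders, not the actual
schema from this thread:

    CREATE TEMPORARY TABLE tmp_spi_scores AS
    SELECT student_id, assignment_id, score
    FROM assignment_scores                          -- hypothetical table name
    WHERE score_name = 'student_performance_index';

    -- Unlike a CTE, the temp table can be analyzed, so the planner gets real
    -- row counts and statistics before the rest of the query runs.
    ANALYZE tmp_spi_scores;

The later steps of the query then join against tmp_spi_scores instead of
referencing the CTE.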
>
> Longer conclusion:
>
> @Justin Pryzby <pryzby(at)telsasoft(dot)com>
>
>    - I experimented w/ materializing the CTEs (see the sketch after this
>    list) and it helped at the margins but did not significantly contribute
>    to a reduction in runtime.
> - No clustering was required because once I switched to temporary
> tables the new plan no longer used the for_upsert index.
>
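For reference, "materializing the CTEs" means forcing PostgreSQL 12+ to
evaluate a CTE once instead of inlining it into the parent query, roughly like
this (names are placeholders, not the real query):

    WITH spi_scores AS MATERIALIZED (
        SELECT student_id, score
        FROM assignment_scores                      -- hypothetical table name
        WHERE score_name = 'student_performance_index'
    )
    SELECT student_id, avg(score) AS avg_score
    FROM spi_scores
    GROUP BY student_id;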
> @Michael Lewis <mlewis(at)entrata(dot)com>
>
>    - Increasing work_mem to 100MB (up from 4MB) helped at the margins
>    (i.e., a few hundred milliseconds of improvement) but did not represent
>    a significant reduction in the runtime.
>    - It wasn't obvious to me which window function would be appropriate
>    for the problem I was trying to solve, so I didn't experiment w/ that
>    approach.
>
I want to update/correct this statement:

>    - The selectivity of score_name='student_performance_index' was not
>    enough for the planner to choose an index over doing a FTS.

I added a partial index (WHERE score_name='student_performance_index'::citext)
and that had a *dramatic* impact. That part of the query went from ~12 seconds
to ~1 second.
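The exact index definition isn't quoted here, but it would look something like
the following (the table name and included column are placeholders; the WHERE
clause is the one mentioned above):

    CREATE INDEX scores_spi_partial_idx
        ON assignment_scores (student_id)           -- hypothetical table/column
        WHERE score_name = 'student_performance_index'::citext;

Because the index predicate matches the query's filter, the planner can use
this much smaller index instead of scanning all ~9.3 million rows.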

> Finally, thank you both for helping me bring this poorly performing query to
> heel. Your insights were helpful and greatly appreciated.
>
> Sincerely,
>
> Dane
>
>
> On Tue, Feb 16, 2021 at 10:25 AM Dane Foster <studdugie(at)gmail(dot)com> wrote:
>
>>
>> On Tue, Feb 16, 2021 at 10:13 AM Michael Lewis <mlewis(at)entrata(dot)com>
>> wrote:
>>
>>>> Sort Method: external merge  Disk: 30760kB
>>>> Worker 0:  Sort Method: external merge  Disk: 30760kB
>>>> Worker 1:  Sort Method: external merge  Disk: 30760kB
>>>>
>>>
>>> If you can increase work_mem, even setting it temporarily higher for the
>>> session or transaction, that may dramatically change the plan.
>>>
>> I will try increasing work_mem for the session later today.
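For anyone trying the same thing, the session- and transaction-scoped variants
look like this:

    -- Session scope: stays in effect until RESET or the connection closes.
    SET work_mem = '100MB';

    -- Transaction scope: reverts automatically at COMMIT or ROLLBACK.
    BEGIN;
    SET LOCAL work_mem = '100MB';
    -- run the query here
    COMMIT;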
>>
>>> The advice given by Justin particularly about row estimates would be
>>> wise to pursue.
>>>
>>
>>
>>> I'd wonder how selective that condition of score_name =
>>> 'student_performance_index' is in filtering out many of the 9.3 million
>>> tuples in that table and if an index with that as the leading column, or
>>> just an index on that column would be helpful.
>>>
>> There are 1,206,355 rows where score_name='student_performance_index'.
>>
>>> You'd need to look at pg_stats for the table and see how many distinct
>>> values, and if student_performance_index is relatively high or low (or not
>>> present) in the MCVs list.
>>>
>> I will look into that.
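A query along these lines shows the relevant statistics; the table name below
is a placeholder, the column is the one from this thread:

    SELECT n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'assignment_scores'           -- hypothetical table name
      AND attname = 'score_name';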
>>
>>
>>> I am not sure if your query does what you want it to do as I admit I
>>> didn't follow your explanation of the desired behavior. My hunch is that
>>> you want to make use of a window function and get rid of one of the CTEs.
>>>
>> If you could tell me what part(s) are unclear I would appreciate it so
>> that I can write a better comment.
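Since the full query isn't quoted in this message, the following is only a
generic illustration of the kind of rewrite Michael may have in mind: replacing
a "pick one row per group" CTE with a window function (all names are
placeholders):

    SELECT student_id, score, attempt_at
    FROM (
        SELECT student_id, score, attempt_at,
               row_number() OVER (PARTITION BY student_id
                                  ORDER BY attempt_at DESC) AS rn
        FROM assignment_scores                      -- hypothetical table name
        WHERE score_name = 'student_performance_index'
    ) ranked
    WHERE rn = 1;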
>>
>> Thank you sooo much for all the feedback. It is *greatly* appreciated!
>> Sincerely,
>>
>> Dane
>>
>>
