Re: Slow query and wrong row estimates for CTE

From: Dane Foster <studdugie(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: psql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow query and wrong row estimates for CTE
Date: 2021-02-16 19:11:07
Message-ID: CA+Wxin+srgWYi0n=vnO3hfov9DMNxb2PbSQvYuQPEGHBcrCiCw@mail.gmail.com
Lists: pgsql-performance

Short conclusion:
Switching from CTEs to temporary tables and analyzing reduced the runtime
from 15 minutes to about 1.5 minutes.
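For readers landing on this thread later, the change had roughly this shape (the table, column, and temp-table names below are placeholders, not the actual schema):

```sql
-- Sketch only: replace a CTE with a temporary table, then ANALYZE it.
-- PostgreSQL keeps no statistics for CTE results, so the planner must
-- guess their row counts; a temp table can be ANALYZEd, giving the
-- planner real row estimates for the rest of the query.
CREATE TEMPORARY TABLE spi_scores AS
SELECT student_id, score
FROM   assessment_scores                      -- placeholder table name
WHERE  score_name = 'student_performance_index';

ANALYZE spi_scores;

-- The main query then joins against spi_scores instead of the CTE.
```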

Longer conclusion:

@Justin Pryzby <pryzby(at)telsasoft(dot)com>

- I experimented w/ materializing the CTEs and it helped at the margins
but did not significantly contribute to a reduction in runtime.
- No clustering was required because once I switched to temporary tables
the new plan no longer used the for_upsert index.

@Michael Lewis <mlewis(at)entrata(dot)com>

- Increasing work_mem to 100MB (up from 4MB) helped at the margins
(i.e., a few hundred milliseconds' improvement) but did not significantly
reduce the runtime.
- It wasn't obvious to me which window function would be appropriate for
the problem I was trying to solve, so I didn't experiment w/ that
approach.
- The selectivity of score_name = 'student_performance_index' was not
high enough for the planner to choose an index over a full table scan.
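The work_mem experiment and the selectivity check can both be reproduced from psql; the table name below is a placeholder:

```sql
-- Raise work_mem for the current session only (sketch; the 100MB value
-- is the one tried above, not a general recommendation).
SET work_mem = '100MB';

-- Inspect the planner's statistics for the filtered column: how many
-- distinct score_name values exist, and whether
-- 'student_performance_index' appears in the most-common-values list.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename = 'assessment_scores'        -- placeholder table name
  AND  attname   = 'score_name';
```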

Finally, thank you both for helping me bring this poorly performing query
to heel. Your insights were helpful and greatly appreciated.

Sincerely,

Dane

On Tue, Feb 16, 2021 at 10:25 AM Dane Foster <studdugie(at)gmail(dot)com> wrote:

>
> On Tue, Feb 16, 2021 at 10:13 AM Michael Lewis <mlewis(at)entrata(dot)com> wrote:
>
>>> Sort Method: external merge  Disk: 30760kB
>>> Worker 0:  Sort Method: external merge  Disk: 30760kB
>>> Worker 1:  Sort Method: external merge  Disk: 30760kB
>>>
>>
>> If you can increase work_mem, even setting it temporarily higher for the
>> session or transaction, that may dramatically change the plan.
>>
> I will try increasing work_mem for the session later today.
>
>> The advice given by Justin particularly about row estimates would be wise
>> to pursue.
>>
>
>
>> I'd wonder how selective that condition of score_name =
>> 'student_performance_index' is in filtering out many of the 9.3 million
>> tuples in that table and if an index with that as the leading column, or
>> just an index on that column would be helpful.
>>
> There are 1,206,355 rows where score_name = 'student_performance_index'.
>
>> You'd need to look at pg_stats for the table and see how many distinct
>> values, and if student_performance_index is relatively high or low (or not
>> present) in the MCVs list.
>>
> I will look into that.
>
>
>> I am not sure if your query does what you want it to do as I admit I
>> didn't follow your explanation of the desired behavior. My hunch is that
>> you want to make use of a window function and get rid of one of the CTEs.
>>
> If you could tell me what part(s) are unclear I would appreciate it so
> that I can write a better comment.
>
> Thank you sooo much for all the feedback. It is *greatly* appreciated!
> Sincerely,
>
> Dane
>
>
