From: Dane Foster <studdugie(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow query and wrong row estimates for CTE
Date: 2021-02-16 19:11:07
Message-ID: CA+Wxin+srgWYi0n=vnO3hfov9DMNxb2PbSQvYuQPEGHBcrCiCw@mail.gmail.com
Lists: pgsql-performance
Short conclusion:
Switching from CTEs to temporary tables, then running ANALYZE on them,
reduced the runtime from 15 minutes to about 1.5 minutes.
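For anyone skimming the archives later, the change was shaped roughly
like the sketch below. The table and column names (other than
score_name) are stand-ins, not our actual schema; the idea is that each
former CTE becomes a temporary table that gets ANALYZEd, so the planner
works from real row counts and statistics instead of CTE guesses:

    -- One temp table per former CTE (hypothetical names):
    CREATE TEMPORARY TABLE spi_scores AS
        SELECT student_id, score
        FROM assessment_scores
        WHERE score_name = 'student_performance_index';
    ANALYZE spi_scores;

    -- The final query then reads from the analyzed temp table(s)
    -- instead of referencing the CTEs.
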
Longer conclusion:
@Justin Pryzby <pryzby(at)telsasoft(dot)com>
- I experimented w/ materializing the CTEs; it helped at the margins but
did not significantly reduce the runtime. (A sketch of that experiment
follows this list.)
- No clustering was required because, once I switched to temporary
tables, the new plan no longer used the for_upsert index.
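
The materialization experiment was just PostgreSQL 12's explicit
keyword, along these lines (the CTE body here is a stand-in, not the
real one):

    WITH spi_scores AS MATERIALIZED (
        SELECT student_id, score
        FROM assessment_scores
        WHERE score_name = 'student_performance_index'
    )
    SELECT count(*) FROM spi_scores;
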
@Michael Lewis <mlewis(at)entrata(dot)com>
- Increasing work_mem to 100MB (up from 4MB) helped at the margins
(i.e., a few hundred milliseconds of improvement) but did not represent
a significant reduction in the runtime. (The snippet after this list
shows the session-level setting.)
- It wasn't obvious to me which window function would be appropriate for
the problem I was trying to solve, so I didn't experiment w/ that
approach.
- The selectivity of score_name = 'student_performance_index' was not
high enough for the planner to choose an index scan over a full table
scan; the same snippet shows the pg_stats query for checking that.
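
Concretely, the work_mem and selectivity checks looked something like
this (the table name below is a stand-in for ours):

    -- Raise work_mem for just the current session; SET LOCAL inside a
    -- transaction scopes it even more narrowly.
    SET work_mem = '100MB';

    -- See how score_name is distributed: n_distinct and the
    -- most-common-values list are what drive the planner's
    -- selectivity estimate for the equality predicate.
    SELECT n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'assessment_scores'  -- hypothetical name
      AND attname = 'score_name';
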
Finally, thank you both for helping me bring this poorly performing
query to heel. Your insights were helpful and greatly appreciated.
Sincerely,
Dane
On Tue, Feb 16, 2021 at 10:25 AM Dane Foster <studdugie(at)gmail(dot)com> wrote:
>
> On Tue, Feb 16, 2021 at 10:13 AM Michael Lewis <mlewis(at)entrata(dot)com> wrote:
>
>>> Sort Method: external merge  Disk: 30760kB
>>> Worker 0:  Sort Method: external merge  Disk: 30760kB
>>> Worker 1:  Sort Method: external merge  Disk: 30760kB
>>>
>>
>> If you can increase work_mem, even setting it temporarily higher for the
>> session or transaction, that may dramatically change the plan.
>>
> I will try increasing work_mem for the session later today.
>
>> The advice given by Justin, particularly about row estimates, would be
>> wise to pursue.
>>
>
>
>> I'd wonder how selective that condition of score_name =
>> 'student_performance_index' is in filtering out many of the 9.3 million
>> tuples in that table, and whether an index with that as the leading
>> column, or just an index on that column, would be helpful.
>>
> There are 1,206,355 rows where score_name = 'student_performance_index'
> (roughly 13% of the ~9.3 million-row table).
>
>> You'd need to look at pg_stats for the table and see how many distinct
>> values there are, and whether student_performance_index is relatively
>> high or low (or not present) in the MCVs list.
>>
> I will look into that.
>
>
>> I am not sure if your query does what you want it to do, as I admit I
>> didn't follow your explanation of the desired behavior. My hunch is that
>> you want to make use of a window function and get rid of one of the CTEs.
>>
> If you could tell me what part(s) are unclear I would appreciate it so
> that I can write a better comment.
>
> Thank you sooo much for all the feedback. It is *greatly* appreciated!
> Sincerely,
>
> Dane
>
>