Re: Query Performance / Planner estimate off

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Mats Julian Olsen <mats(at)duneanalytics(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query Performance / Planner estimate off
Date: 2020-10-20 10:50:05
Message-ID: CAFj8pRDj9hbhPKh6xF4s4Bas-dy8-EUC0hzaThConbBjYQ=QwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

út 20. 10. 2020 v 11:59 odesílatel Mats Julian Olsen <mats(at)duneanalytics(dot)com>
napsal:

> On Tue, Oct 20, 2020 at 9:50 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
>> On Tue, 20 Oct 2020 at 22:38, Mats Julian Olsen <mats(at)duneanalytics(dot)com>
>> wrote:
>> >
>> > The crux of our issue is that the query planner chooses a nested loop
>> join for this query. Essentially making this query (and other queries) take
>> a very long time to complete. In contrast, by toggling `enable_nestloop`
>> and `enable_seqscan` off we can take the total runtime down from 16 minutes
>> to 2 minutes.
>> >
>> > 1) Vanilla plan (16 min) : https://explain.depesz.com/s/NvDR
>> > 2) enable_nestloop=off (4 min): https://explain.depesz.com/s/buKK
>> > 3) enable_nestloop=off; enable_seqscan=off (2 min):
>> https://explain.depesz.com/s/0WXx
>> >
>> > How can I get Postgres not to loop over 12M rows?
>>
>> You'll likely want to look at what random_page_cost is set to. If the
>> planner is preferring nested loops then it may be too low. You'll
>> also want to see if effective_cache_size is set to something
>> realistic. Higher values of that will prefer nested loops like this.
>>
>
> random_page_cost is 1.1 and effective_cache_size is '60GB' (listed in the
> gist). random_page_cost may be too low?
>

random_page_cost 2 is safer - the value 1.5 is a little bit aggressive for
me.

>
>> You may also want to reduce max_parallel_workers_per_gather. It looks
>> like you're not getting your parallel workers as often as you'd like.
>> If the planner chooses a plan thinking it's going to get some workers
>> and gets none, then that plan may be inferior the one that the planner
>> would have chosen if it had known the workers would be unavailable.
>>
>
> Interesting, here are the values for those:
> max_parallel_workers = 8
> max_parallel_workers_per_gather = 4
>
>
>>
>> > Let me know if there is anything I left out here that would be useful
>> for further debugging.
>>
>> select name,setting from pg_Settings where category like 'Query
>> Tuning%' and source <> 'default';
>> select version();
>>
>
> default_statistics_target = 500
> effective_cache_size = 7864320
> random_page_cost = 1.1
>
> PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on x86_64-pc-linux-gnu,
> compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, 64-bit
>
>>
>> would be useful.
>>
>> David
>>
>
> Thanks David, see above for more information.
>
> --
> Mats
> CTO @ Dune Analytics
> We're hiring: https://careers.duneanalytics.com
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mats Julian Olsen 2020-10-20 11:09:13 Re: Query Performance / Planner estimate off
Previous Message Mats Julian Olsen 2020-10-20 09:59:05 Re: Query Performance / Planner estimate off