Quick Links

Re: Query Performance / Planner estimate off

From:	Sushant Pawar <sushantxp(at)gmail(dot)com>
To:	Mats Julian Olsen <mats(at)duneanalytics(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Query Performance / Planner estimate off
Date:	2020-10-20 13:40:08
Message-ID:	CAF2=t-rkpraVWex94L-=W2OpiBBiOt22FccsMy1Az3Nay3VQhA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Looping in the main group ID.

Regards
Sushant

On Tue, Oct 20, 2020 at 6:49 PM Sushant Pawar <sushantxp(at)gmail(dot)com> wrote:

>
> On Tue, Oct 20, 2020 at 3:08 PM Mats Julian Olsen <mats(at)duneanalytics(dot)com>
> wrote:
>
>> Dear Postgres community,
>>
>> I'm looking for some help to manage queries against two large tables.
>>
>> Context:
>> We run a relatively large postgresql instance (5TB, 32 vCPU, 120GB RAM)
>> with a hybrid transactional/analytical workload. Data is written in batches
>> every 15 seconds or so, and the all queryable tables are append-only (we
>> never update or delete). Our users can run analytical queries on top of
>> these tables.
>>
>> We recently came across a series of troublesome queries one of which I'll
>> dive into here.
>>
>> Please see the following gist for both the query we run and the \d+
>> output: https://gist.github.com/mewwts/9f11ae5e6a5951593b8999559f5418cf.
>>
>> The tables in question are:
>> - `ethereum.transactions`: 833M rows, partitioned, 171M rows after WHERE
>> - `uniswap_v2."Pair_evt_Swap": 12M rows, not partitioned, 12M rows after
>> WHERE
>>
>> The crux of our issue is that the query planner chooses a nested loop
>> join for this query. Essentially making this query (and other queries) take
>> a very long time to complete. In contrast, by toggling `enable_nestloop`
>> and `enable_seqscan` off we can take the total runtime down from 16 minutes
>> to 2 minutes.
>>
>> 1) Vanilla plan (16 min) : https://explain.depesz.com/s/NvDR
>> 2) enable_nestloop=off (4 min): https://explain.depesz.com/s/buKK
>> 3) enable_nestloop=off; enable_seqscan=off (2 min):
>> https://explain.depesz.com/s/0WXx
>>
>
> The cost of a query while using the default Vanila plan is very less
> compared to the 3rd plan with nested loop and seqscan being set to off.
> As the JIT is enabled, it seems the planner tries to select the plan with
> the least cost and going for the plan which is taking more time of
> execution. Can you try running this query with JIT=off in the session and
> see if it selects the plan with the least time for execution?
>
>>
>> How can I get Postgres not to loop over 12M rows?
>>
>> Let me know if there is anything I left out here that would be useful for
>> further debugging.
>>
>> --
>> Regards
>>
> Sushant
>

In response to

Query Performance / Planner estimate off at 2020-10-20 09:37:42 from Mats Julian Olsen

Responses

Re: Query Performance / Planner estimate off at 2020-10-20 17:02:09 from Mats Olsen

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Mats Olsen	2020-10-20 14:50:02	Re: Query Performance / Planner estimate off
Previous Message	Victor Yegorov	2020-10-20 13:22:41	Re: Query Performance / Planner estimate off