Re: Query Performance / Planner estimate off

From: Mats Olsen <mats(at)duneanalytics(dot)com>
To: Sushant Pawar <sushantxp(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query Performance / Planner estimate off
Date: 2020-10-20 17:02:09
Message-ID: b424307b-013d-7256-0198-6fed5c00edca@duneanalytics.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


On 10/20/20 3:40 PM, Sushant Pawar wrote:
> Looping in the main group ID.
>
> Regards
> Sushant
>
> On Tue, Oct 20, 2020 at 6:49 PM Sushant Pawar <sushantxp(at)gmail(dot)com
> <mailto:sushantxp(at)gmail(dot)com>> wrote:
>
>
> On Tue, Oct 20, 2020 at 3:08 PM Mats Julian Olsen
> <mats(at)duneanalytics(dot)com <mailto:mats(at)duneanalytics(dot)com>> wrote:
>
> Dear Postgres community,
>
> I'm looking for some help to manage queries against two large
> tables.
>
> Context:
> We run a relatively large postgresql instance (5TB, 32 vCPU,
> 120GB RAM) with a hybrid transactional/analytical workload.
> Data is written in batches every 15 seconds or so, and the all
> queryable tables are append-only (we never update or delete).
> Our users can run analytical queries on top of these tables.
>
> We recently came across a series of troublesome queries one of
> which I'll dive into here.
>
> Please see the following gist for both the query we run and
> the \d+ output:
> https://gist.github.com/mewwts/9f11ae5e6a5951593b8999559f5418cf
> <https://gist.github.com/mewwts/9f11ae5e6a5951593b8999559f5418cf>.
>
>
> The tables in question are:
> - `ethereum.transactions`: 833M rows, partitioned, 171M rows
> after WHERE
> - `uniswap_v2."Pair_evt_Swap": 12M rows, not partitioned, 12M
> rows after WHERE
>
> The crux of our issue is that the query planner chooses a
> nested loop join for this query. Essentially making this query
> (and other queries) take a very long time to complete. In
> contrast, by toggling `enable_nestloop` and `enable_seqscan`
> off we can take the total runtime down from 16 minutes to 2
> minutes.
>
> 1) Vanilla plan (16 min) : https://explain.depesz.com/s/NvDR
> <https://explain.depesz.com/s/NvDR>
> 2) enable_nestloop=off (4 min):
> https://explain.depesz.com/s/buKK
> <https://explain.depesz.com/s/buKK>
> 3) enable_nestloop=off; enable_seqscan=off (2 min):
> https://explain.depesz.com/s/0WXx
> <https://explain.depesz.com/s/0WXx>
>
>
> The cost of a query while using the default Vanila plan is very
> less compared to the 3rd plan with nested loop and seqscan  being
> set to off.  As the JIT is enabled, it seems the planner tries to
> select the plan with the least cost and going for the plan which
> is taking more time of execution. Can you try running this query
> with JIT=off in the session and see if it selects the plan with
> the least time for execution?
>
Thank you for your reply. Here's the result using set jit=off;
https://explain.depesz.com/s/rpKc. It's essentially the same plan as the
initial one.

>
> How can I get Postgres not to loop over 12M rows?
>
> Let me know if there is anything I left out here that would be
> useful for further debugging.
>
> --
> Regards
>
>     Sushant
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Mats Olsen 2020-10-20 17:40:40 Re: Query Performance / Planner estimate off
Previous Message Victor Yegorov 2020-10-20 16:51:39 Re: Query Performance / Planner estimate off