Re: Query Performance / Planner estimate off

From: Mats Olsen <mats(at)duneanalytics(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query Performance / Planner estimate off
Date: 2020-10-20 16:45:36
Message-ID: 60498453-2278-5f62-7ff6-343eea7bfd3b@duneanalytics.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 10/20/20 11:37 AM, Mats Julian Olsen wrote:

> Dear Postgres community,
>
> I'm looking for some help to manage queries against two large tables.
>
> Context:
> We run a relatively large postgresql instance (5TB, 32 vCPU, 120GB
> RAM) with a hybrid transactional/analytical workload. Data is written
> in batches every 15 seconds or so, and the all queryable tables are
> append-only (we never update or delete). Our users can run analytical
> queries on top of these tables.
>
> We recently came across a series of troublesome queries one of which
> I'll dive into here.
>
> Please see the following gist for both the query we run and the \d+
> output:
> https://gist.github.com/mewwts/9f11ae5e6a5951593b8999559f5418cf
> <https://gist.github.com/mewwts/9f11ae5e6a5951593b8999559f5418cf>.
>
> The tables in question are:
> - `ethereum.transactions`: 833M rows, partitioned, 171M rows after WHERE
> - `uniswap_v2."Pair_evt_Swap": 12M rows, not partitioned, 12M rows
> after WHERE

The query plans I submitted was querying the table
`uniswap_v2."Pair_evt_Mint"`which has 560k rows before and after WHERE.
Also not partitioned. Apologies for the inconsistency, but as I
mentioned the same performance problem holds when using
`uniswap_v2."Pair_evt_Swap" (even worse due to it's size).

>
> The crux of our issue is that the query planner chooses a nested loop
> join for this query. Essentially making this query (and other queries)
> take a very long time to complete. In contrast, by toggling
> `enable_nestloop` and `enable_seqscan` off we can take the total
> runtime down from 16 minutes to 2 minutes.
>
> 1) Vanilla plan (16 min) : https://explain.depesz.com/s/NvDR
> <https://explain.depesz.com/s/NvDR>
> 2) enable_nestloop=off (4 min): https://explain.depesz.com/s/buKK
> <https://explain.depesz.com/s/buKK>
> 3) enable_nestloop=off; enable_seqscan=off (2 min):
> https://explain.depesz.com/s/0WXx <https://explain.depesz.com/s/0WXx>
>
> How can I get Postgres not to loop over 12M rows?
>
> Let me know if there is anything I left out here that would be useful
> for further debugging.
>
> --
> Mats
> CTO @ Dune Analytics
> We're hiring: https://careers.duneanalytics.com
> <https://careers.duneanalytics.com>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Victor Yegorov 2020-10-20 16:51:39 Re: Query Performance / Planner estimate off
Previous Message Mats Olsen 2020-10-20 14:50:02 Re: Query Performance / Planner estimate off