Re: Query takes around 15 to 20 min over 20Lakh rows

From: Michael Lewis <mlewis(at)entrata(dot)com>
To: Shubham Mittal <mittalshubham30(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Query takes around 15 to 20 min over 20Lakh rows
Date: 2021-09-07 15:27:07
Message-ID: CAHOFxGpLa+BnSrfCb1xA9xKubu213W-r2bBBfszKAAH8kgZHzA@mail.gmail.com
Lists: pgsql-general

Have you ever used this site to visualize the explain plan and spot bad
estimates and slow nodes? https://explain.depesz.com/s/WE1R

This stands out to me:

Subquery Scan on npiactionjoin  (cost=10,165,289.40..10,167,192.01 rows=293 width=16) (actual time=118,413.432..118,806.684 rows=446,782 loops=1)
  Filter: (npiactionjoin.rn = 1)

That estimate is far off (293 rows estimated vs. 446,782 actual), and this node and the final node above it are the biggest slowdowns. If you filtered task_history down to the records you want BEFORE the join, you might get quicker results. I would try a materialized CTE, or even an analyzed temp table if that option is available to you, so the planner can make informed decisions.
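A minimal sketch of the analyzed-temp-table idea (the table and column names task_history, tasks, task_id, and action_time are assumptions here, since the original query wasn't posted):

```sql
-- Sketch only: materialize the pre-filtered rows first so the planner
-- sees real row counts instead of a bad estimate on the subquery scan.
CREATE TEMP TABLE th_earliest AS
SELECT DISTINCT ON (task_id)
       task_id, action_time, action
FROM   task_history
ORDER  BY task_id, action_time;   -- keeps the earliest action per task

ANALYZE th_earliest;              -- collect statistics before joining

SELECT t.*, e.action, e.action_time
FROM   tasks t
JOIN   th_earliest e USING (task_id);
```

The same shape works as a materialized CTE (WITH th_earliest AS MATERIALIZED (...)), but the temp table plus ANALYZE gives the planner actual statistics, which a CTE does not.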

By the way, the ORDER BY on that row_number suggests you are getting the OLDEST activity related to each task, which could perhaps be cached rather than re-calculated every day this query runs.
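For reference, with ROW_NUMBER() it is the direction of the ORDER BY that decides which row gets rn = 1 (again a sketch with assumed names):

```sql
SELECT task_id, action_time,
       ROW_NUMBER() OVER (PARTITION BY task_id
                          ORDER BY action_time ASC) AS rn  -- ASC: rn = 1 is the oldest
FROM   task_history;
-- Filtering on rn = 1 then keeps the earliest action per task;
-- ORDER BY action_time DESC would keep the newest instead.
```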

*Michael Lewis | Database Engineer*
*Entrata*
