Re: Query performance issue

From: yudhi s <learnerdatabase99(at)gmail(dot)com>
To: Greg Sabino Mullane <htamfids(at)gmail(dot)com>, jross(at)openvistas(dot)net
Cc: pgsql-general(at)lists(dot)postgresql(dot)org, hjp-pgsql(at)hjp(dot)at
Subject: Re: Query performance issue
Date: 2024-10-22 20:36:18
Message-ID: CAEzWdqc4jA3D4RX4u3fgHYc37W5h2q=3ucef0yT3ppeB-9MBTg@mail.gmail.com
Lists: pgsql-general

On Wed, Oct 23, 2024 at 12:32 AM Greg Sabino Mullane <htamfids(at)gmail(dot)com>
wrote:

> To be frank, there is so much wrong with this query that it is hard to
> know where to start. But a few top items:
>
> * Make sure all of the tables involved have been analyzed. You might want
> to bump default_statistics_target up and see if that helps.
>
> * As mentioned already, increase work_mem, as you have things spilling to
> disk (e.g. external merge Disk: 36280kB)
>
> * Don't use the "FROM table1, table2, table3" syntax but use "FROM table1
> JOIN table2 ON (...) JOIN table3 ON (...)"
>
> * Try not to use subselects. Things like WHERE x IN (SELECT ...) are
> expensive and hard to optimize.
>
> * You have useless GROUP BY clauses in there. Remove to simplify the query
>
> * There is no LIMIT. Does the client really need all 135,214 rows?
>
>
>
I tried running the query after removing both GROUP BY clauses from the inner
subqueries (I believe the original intent was that they would return distinct
rows to the outer query and thus help), adding LIMIT 500 at the end, and
setting work_mem to 2GB for that session before running the query. However,
the response time increased to ~5 seconds (from ~3.1 seconds earlier). I have
added the updated execution plan to the last section of the gist below.

https://gist.github.com/databasetech0073/746353a9e76d5e29b2fc6abdc80cdef8
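For clarity, the session-level change I made was along these lines (the
table and column names below are placeholders, not the real query):

```sql
-- Applies only to the current session; reverts when it ends
SET work_mem = '2GB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT t1.id, t2.val
FROM table1 t1
JOIN table2 t2 ON t2.t1_id = t1.id  -- GROUP BY removed from the inner subqueries
LIMIT 500;
```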

Also, I am not able to clearly understand the point you made below about
subselects; can you please clarify a bit more? Do you mean we should write
the query with EXISTS/NOT EXISTS rather than IN/NOT IN, and that this will
improve performance? I hope this point doesn't matter much here, since all
the joins are equijoins. Correct me if I'm wrong.

"Try not to use subselects. Things like WHERE x IN (SELECT ...) are
expensive and hard to optimize."
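To make sure I follow, is the idea something like the rewrite below? (This
is only an illustration; the orders/customers names are made up, not from
our actual query.)

```sql
-- Original pattern:
SELECT o.*
FROM orders o
WHERE o.customer_id IN (SELECT c.id FROM customers c WHERE c.region = 'APAC');

-- Rewritten with EXISTS (often planned as a semi-join):
SELECT o.*
FROM orders o
WHERE EXISTS (
    SELECT 1
    FROM customers c
    WHERE c.id = o.customer_id
      AND c.region = 'APAC'
);

-- Or as a plain join, when c.id is unique:
SELECT o.*
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.region = 'APAC';
```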

Additionally, regarding the plan that MySQL produces, which shows the
highest response time: is it slow because of differences in the speed of the
underlying I/O/storage, or simply because of optimization features that are
available in Postgres but not in MySQL? I am trying to understand whether
this can be identified from the execution plan itself.
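On the Postgres side, my understanding is that something like the following
would at least show how much of the runtime is spent on I/O, which might
help separate the two causes (just a sketch; the query itself is elided):

```sql
-- track_io_timing may require superuser privileges to change
SET track_io_timing = on;

-- BUFFERS shows shared/temp block reads; with track_io_timing on,
-- the plan also reports I/O read/write times per node
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;  -- the query in question
```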
