Re: Slow Query

From: Venkata Balaji N <nag1010(at)gmail(dot)com>
To: robbyc <robcampbell73(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow Query
Date: 2015-08-12 06:08:46
Message-ID: CAEyp7J9SCCpZEbR0y6Xq-rq5yVALAnaOF=aJzOqv=Vj4C8vxYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Aug 12, 2015 at 3:29 PM, robbyc <robcampbell73(at)gmail(dot)com> wrote:

> Hi Venkata,
>
> work_mem was set to 72MB, increased to 144MB, no change.
>

Increasing work_mem depends on various other factors like Table size
(amount of data being sorted), available memory etc.

> Added an index of type varchar_pattern_ops to Vacancy.JobTitle, this did
> not help either.
>

Sorry, I did not mean to say that an Index must be added straight away. The
column must be eligible to have an Index. Meaning, Index will be
beneficial if created on a column with high number of distinct values.

If either of the above does not help, then options to rewrite the query
must be explored.

Thanks,
Venkata Balaji N

Fujitsu Australia

>
> On Wed, Aug 12, 2015 at 2:09 PM, Venkata Balaji N [via PostgreSQL] <[hidden
> email] <http:///user/SendEmail.jtp?type=node&node=5861850&i=0>> wrote:
>
>> On Wed, Aug 12, 2015 at 12:34 PM, robbyc <[hidden email]
>> <http:///user/SendEmail.jtp?type=node&node=5861839&i=0>> wrote:
>>
>>> Hi,
>>>
>>> I am new to optimizing queries and i'm getting a slow running time
>>> (~1.5secs) with the following SQL:
>>>
>>> SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
>>> "Vacancy"."DateCreated", "Vacancy"."CustomAccess"
>>> , "Department"."Name" as "Department", list("Occupation"."Name") as
>>> "Occupation", "Vacancy"."PositionNo"
>>> , "Vacancy"."Template" from
>>> "Vacancy"
>>> LEFT JOIN "CategoryOption_TableRow" as "c_22"
>>> ON ("c_22"."RowID" = "Vacancy"."ID"
>>> and "c_22"."Category_TableID" = 22)
>>> LEFT JOIN "CategoryOption" as "Occupation"
>>> ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
>>> LEFT JOIN "TableRow_TableRow" as "t_33"
>>> ON ("t_33"."Table1RowID" = "Vacancy"."ID"
>>> and "t_33"."Table_TableID" = 33 )
>>> LEFT JOIN "Department"
>>> ON ("Department"."ID" = "t_33"."Table2RowID" and
>>> "Department"."Active" = 't' and "Department"
>>> ."ClientID" = 263)
>>> JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"=
>>> 50
>>> and "c_50"."RowID" = "Vacancy"
>>> ."ID" and "c_50"."CategoryOptionID"=19205)
>>> WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
>>> DISTINCT("Vacancy"."ID")
>>> FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
>>> ("ct126"."Category_TableID" = 126
>>> and "RowID" = "Vacancy"."ID")
>>> left join "Workflow" on ("Workflow"."VacancyID" =
>>> "Vacancy"."ID"
>>> and "Workflow"."Level"
>>> = 1)
>>> left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
>>> "Workflow"."ID" and "c30"."Category_TableID"
>>> = 30 and "c30"."CategoryOptionID" = 21923)
>>> WHERE "Template" AND "ct126"."CategoryOptionID"
>>> IN(34024,35254,35255,35256)) and "Vacancy"
>>> ."Template" = 't'
>>> GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
>>> "Vacancy"."CustomAccess", "Department"
>>> ."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
>>> UNION
>>> SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
>>> "Vacancy"."DateCreated", "Vacancy"."CustomAccess"
>>> , "Department"."Name" as "Department", list("Occupation"."Name") as
>>> "Occupation", "Vacancy"."PositionNo"
>>> , "Vacancy"."Template" from
>>> "Vacancy"
>>> LEFT JOIN "CategoryOption_TableRow" as "c_22"
>>> ON ("c_22"."RowID" = "Vacancy"."ID"
>>> and "c_22"."Category_TableID" = 22)
>>> LEFT JOIN "CategoryOption" as "Occupation"
>>> ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
>>> LEFT JOIN "TableRow_TableRow" as "t_33"
>>> ON ("t_33"."Table1RowID" = "Vacancy"."ID"
>>> and "t_33"."Table_TableID" = 33 )
>>> LEFT JOIN "Department"
>>> ON ("Department"."ID" = "t_33"."Table2RowID" and
>>> "Department"."Active" = 't' and "Department"
>>> ."ClientID" = 263)
>>> JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"=
>>> 50
>>> and "c_50"."RowID" = "Vacancy"
>>> ."ID" and "c_50"."CategoryOptionID"=19205)
>>> WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
>>> DISTINCT("Vacancy"."ID")
>>> FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
>>> ("ct126"."Category_TableID" = 126
>>> and "RowID" = "Vacancy"."ID")
>>> left join "Workflow" on ("Workflow"."VacancyID" =
>>> "Vacancy"."ID"
>>> and "Workflow"."Level"
>>> = 1)
>>> left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
>>> "Workflow"."ID" and "c30"."Category_TableID"
>>> = 30 and "c30"."CategoryOptionID" = 21923)
>>> WHERE "Template" AND "ct126"."CategoryOptionID"
>>> IN(34024,35254,35255,35256)) and "Vacancy"
>>> ."Template" <> 't' AND "Vacancy"."Level" = 1
>>> GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
>>> "Vacancy"."CustomAccess", "Department"
>>> ."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
>>> ORDER BY "JobTitle"
>>>
>>> Running explain analyze gives me the following information:
>>> http://explain.depesz.com/s/pdC <http://explain.depesz.com/s/pdC>
>>
>>
>>> For a total runtime: 2877.157 ms
>>>
>>> If i remove the left joins on Department and TableRow_TableRow this
>>> reduces
>>> the run time by about a third.
>>> Additionally removing CategoryOption and CategoryOption_TableRow joins
>>> further reduces by a about a third.
>>>
>>> Given that i need both these joins for the information retrieved by them,
>>> what would be the best way to re-factor this query so it runs faster?
>>>
>>> Looking at the output of explain analyze the hash aggregates and sort
>>> seem
>>> to be the primary issue.
>>>
>>
>> The query has got a distinct and group-by/order-by clauses which seems to
>> be taking time. Without looking at much details of the query code and Table
>> size etc, did you try increasing the work_mem and then execute the query
>> and see if that helps ? This will reduce the on-disk IO for sorting. Also,
>> Vacancy.JobTitle seems to be a non-index column.
>>
>> Regards,
>> Venkata Balaji
>>
>> Fujitsu Australia
>>
>>
>>
>> ------------------------------
>> If you reply to this email, your message will be added to the discussion
>> below:
>> http://postgresql.nabble.com/Slow-Query-tp5861835p5861839.html
>> To unsubscribe from Slow Query, click here.
>> NAML
>> <http://postgresql.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>>
>
>
>
> --
> Regards
>
> Robert Campbell
> +61412062971
> [hidden email] <http:///user/SendEmail.jtp?type=node&node=5861850&i=1>
>
> ------------------------------
> View this message in context: Re: Slow Query
> <http://postgresql.nabble.com/Slow-Query-tp5861835p5861850.html>
>
> Sent from the PostgreSQL - performance mailing list archive
> <http://postgresql.nabble.com/PostgreSQL-performance-f2050081.html> at
> Nabble.com.
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Pietro Pugni 2015-08-12 07:00:45 Re: Query Plan Performance on Partitioned Table
Previous Message robbyc 2015-08-12 05:29:45 Re: Slow Query