Re: Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: "Erdmann, Markus (at) Bellevue" <Markus(dot)Erdmann(at)cbre(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?
Date: 2016-06-17 18:08:43
Message-ID: CAKFQuwZ5yyz0b-9EhSM4p0qyRFwAFt8qRR7UPVAHtBTnJ1YvJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, Jun 16, 2016 at 8:40 PM, Erdmann, Markus @ Bellevue <
Markus(dot)Erdmann(at)cbre(dot)com> wrote:

> Hello,
>
> We’re trying to debug a performance issue affecting our staging database,
> and we’ve narrowed it down to a difference in the query optimizer in 9.5.2.
> Upgrading to 9.5 is important for us because we need the ability to import
> foreign schemas.
>
> This is the query we’re running:
>
> CREATE TEMP TABLE tmp_joined_transactions_9gkgptn5xcp9 ( transaction_id
> integer PRIMARY KEY );
>
> INSERT INTO tmp_joined_transactions_9gkgptn5xcp9 (transaction_id)
>

​This...

> SELECT DISTINCT ON ("transactions_transaction"."id")
> "transactions_transaction"."id"
>

​​DISTINCT is a code smell. DISTINCT ON less so - it helps to avoid
self-joins - but your inclusion of ON here is pointless since the only
output column is "id".

​As written there should be no way to get duplicate "id"s into the output
result. Or, if the tmp_joined_transactions relationship is 1-to-many you
should instead use a semi-join instead of an inner join.

FROM "transactions_transaction" , "tmp_joined_transactions_75chlsokrsev"
> WHERE
>

​Here...​

(NOT ("transactions_transaction"."id"
> IN (SELECT U0."id" AS Col1
> FROM "transactions_transaction" U0
> LEFT OUTER JOIN "transactions_commission" U1
> ON ( U0."id" = U1."transaction_id" )
> WHERE U1."id" IS NULL))
>

​Making this an anti-join (NOT EXISTS instead of NOT IN) stands out as an
easy improvement to try:

​It also makes the logic clearer since you seem to have a double-negative
here which means you really want a semi-join (which I wrote below)

WHERE EXISTS (SELECT 1 FROM transactions_commission WHERE
transactions_transaction.id = transactions_commission.transaction_id)

I won't promise this gives the same answer...I don't have enough spare
brain power or the ability to test it...but its seems correct.

AND "transactions_transaction"."date_created" >= '2010-01-01'::date
> AND "transactions_transaction"."date_created" <= '2015-12-31'::date
> AND "transactions_transaction"."deal_status" IN (1)
>

​Also...

> AND (transactions_transaction.id =
> tmp_joined_transactions_75chlsokrsev.transaction_id))
>

​This is style but I'm really a fan of using ANSI JOIN syntax...turning the
above into a <JOIN tmp_* ON​>

ORDER BY "transactions_transaction"."id" ASC;
>
>
​The regression itself someone else would need to comment on.

David J.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Paul Jungwirth 2016-06-17 18:09:28 Re: PostgresSQL and HIPAA compliance
Previous Message Durgamahesh Manne 2016-06-17 18:07:35 Re: Re: regarding schema only migration from sqlserver to postgres with runmtk.sh