Re: BUG #13112: Catastrophic performance degradation without DISTINCT ON statement

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: oyvind(dot)harboe(at)zylin(dot)com
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13112: Catastrophic performance degradation without DISTINCT ON statement
Date: 2015-04-21 15:28:20
Message-ID: CAKFQuwaCMGeeLQzikV2F55dec4Wf9EAZrDaEAx-O48XpiAQnrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, Apr 21, 2015 at 12:16 AM, <oyvind(dot)harboe(at)zylin(dot)com> wrote:

> The following bug has been logged on the website:
>
> Bug reference: 13112
> Logged by: Øyvind Harboe
> Email address: oyvind(dot)harboe(at)zylin(dot)com
> PostgreSQL version: 9.1.0
> Operating system: Ubuntu
> Description:
>
> I've been testing out PostgreSQL vs. Derby/MS SQL for our application when
> I
> ran into a problem where the performance of PostgreSQL went from great to
> abysmal for no apparent reason.
>
> After a bit of digging, I've found that the problem is with the SQL
> statement that Apache Cayenne generates.
>
> Apache Cayenne generates statements of the following form which yields bad
> performance on PostgreSQL with complicated WHERE statements and numerous
> columns:
>
> 1) SELECT DISTINCT a,b,c,d,e,f ... WHERE somecomplicatedstatement
>
> If I rewrite this statement to the form below using the 'DISTINCT ON()'
> syntax(which is PostgreSQL specific dialect), then I get great performance
> again:
>
> 2) SELECT DISTINCT ON(a) a,b,c,d,e,f ... WHERE somecomplicatedstatement
>
> Numbers on my machine:
>
> 1) 44000ms
>
> 2) 4300ms
>
> Here's where I read up on the DISTINCT ON syntax:
> http://www.postgresql.org/docs/9.4/static/sql-select.html

So what - the two queries are not equivalent.

There may be room for improvement here but you've given insufficient
information to help anyone who cares to dig deeper.

see ​https://wiki.postgresql.org/wiki/SlowQueryQuestions

Distinct requires sorting - the ON clause just limits how much - so using
it in place of writing a correct query will always yield sub-optimal
results.

This is not a bug even if PostgreSQL could be taught to handle these kinds
of ORM queries more effectively.

David J.

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Kevin Grittner 2015-04-21 21:16:01 Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Previous Message oyvind.harboe 2015-04-21 07:16:22 BUG #13112: Catastrophic performance degradation without DISTINCT ON statement