Re: select distinct runs slow on pg 10.6

From: Dinesh Somani <dinesh(at)opsveda(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: yash mehta <yash215(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: select distinct runs slow on pg 10.6
Date: 2019-09-11 16:38:19
Message-ID: CAGcTZwVzzN9dC+97z-HxpHXCgAHdjEPg+Q9M3gAArX3wTgSrJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I think Merlin has outlined pretty much all the options and very neatly.
(As an asides Merlin could you possibly elaborate on the "C Hack" how that
might be accomplished.)

To OP, I am curious if the performance changes were the query rewritten
such that all timestamp columns were listed first in the selection. I
understand it might not be feasible to make this change in your real
application without breaking the contract.

Regards
Dinesh

On Wed, Sep 11, 2019 at 8:54 AM Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Mon, Sep 9, 2019 at 3:55 AM yash mehta <yash215(at)gmail(dot)com> wrote:
> >
> > We have a query that takes 1min to execute in postgres 10.6 and the same
> executes in 4 sec in Oracle database. The query is doing 'select distinct'.
> If I add a 'group by' clause, performance in postgres improves
> significantly and fetches results in 2 sec (better than oracle). But
> unfortunately, we cannot modify the query. Could you please suggest a way
> to improve performance in Postgres without modifying the query.
>
> Well, here's the bad news. Postgres doesn't optimize this specific
> formulation as well as oracle does. Normally tweaking the query along
> with some creativity would get the expected result; it's pretty rare
> that I can't coerce the planner to do something fairly optimally. I'm
> guessing this is an Oracle conversion app, and we do not have the
> ability to change the underlying source code? Can you elaborate why
> not?
>
> In lieu of changing the query in the application, we have high level
> strategies to consider.
> *) Eat the 20 seconds, and gripe to your oracle buddies (they will
> appreciate this)
>
> *) Mess around with with planner variables to get a better plan.
> Unfortunately, since we can't do tricks like SET before running the
> query, the changes will be global, and I'm not expecting this to bear
> fruit, unless we can have this query be separated from other queries
> at the connection level (we might be able to intervene on connect and
> set influential non-global planner settings there)
>
> *) Experiment with pg11/pg12 to see if upcoming versions can handle
> this strategy better. pg12 is in beta obviously, but an upgrade
> strategy would be the easiest out.
>
> *) Attempt to intervene with views. I think this is out, since all
> the tables are schema qualified. To avoid a global change, the typical
> strategy is to tuck some views into a private schema and manipulate
> search_path to have them resolve first, but that won't work if you
> don't have control of the query string.
>
> *) Try to change the query string anyways. Say, this is a compiled
> application for which you don't have the code, we might be able to
> locate the query text within the compiled binary and modify it. This
> is actually a pretty effective trick (although in many scenarios we'd
> want the query string to be the same length as before but you have
> plenty of whitespace to play with) although in certain
> legal/regulatory contexts we might not be able to do it.
>
> *) Hack some C to adjust the query in flight. This is *SUPER* hacky,
> but let's say that the application was dynamically linked against the
> libpq driver, but with some C change and a fearless attitude we could
> adjust the query after it leaves the application but before it hits
> the database. Other candidate interventions might be in the database
> itself or in pgbouncer. We could also do this in jdbc if your
> application connects via that driver. This is would be 'absolutely
> last resort' tactics, but sometimes you simply must find a solution.
>
> merlin
>
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rick Otten 2019-09-11 17:57:25 Re: select distinct runs slow on pg 10.6
Previous Message Merlin Moncure 2019-09-11 15:54:01 Re: select distinct runs slow on pg 10.6