Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Gunther <raj(at)gusw(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices
Date: 2017-11-02 08:30:28
Message-ID: 1509611428.3268.5.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Gunther wrote:
> But there
> is one thing that keeps bothering me both with Oracle and PgSQL. And
> that is the preference for Nested Loops.

[...]

> But the issue is bulk searches, reports, and any analytic queries
> scenarios. In those queries Nested Loops are almost always a bad choice,
> even if there is an index. In over 20 years of working with RDBMs this
> has been my unfailing heuristics. A report runs slow? Look at plan, is
> there a Nested Loop? Yes? Squash it! And the report runs 10x faster
> instantaneously.

[...]

> If you can set enable_nestloop off and the Hash Join is chosen and the
> performance goes from 1 hour of 100% CPU to 10 seconds completion time,
> then something is deadly wrong.

[...]

> The point is that Nested Loops should never be chosen except in index
> lookup situations or may be memory constraints.
>
> How can I prevent it on a query by query scope? I cannot set
> enable_nestloop = off because one query will be for a full report, wile
> another one might have indexed constraints running in the same session,
> and I don't want to manage side effects and remember to set
> enable_nestloop parameter on and off.
>
> There must be a way to tell the optimizer to penalize nested loops to
> make them the last resort. In Oracle there are those infamous hints, but
> they don't always work either (or it is easy to make mistakes that you
> get no feedback about).
>
> Is there any chance PgSQL can get something like a hint feature?

PostgreSQL doesn't have a way to tell if a query is an OLAP query
running against a star schema or a regular OLTP query, it will treat
both in the same fashion.

I also have had to deal with wrongly chosen nested loop joins, and
testing a query with "enable_nestloop=off" is one of the first things
to try in my experience.

However, it is not true that PostgreSQL "perfers nested loops".
Sometimes a nested loop join is the only sane and efficient way to
process a query, and removing that capability would be just as
bad a disaster as you are experiencing with your OLAP queries.

Bad choices are almost always caused by bad estimates.
Granted, there is no way that estimates can ever be perfect.

So what could be done?

One pragmatic solution would be to wrap every query that you know
to be an OLAP query with

BEGIN;
SET LOCAL enable_nestloop=off;
SELECT ...
COMMIT;

Looking deeper, I would say that wrongly chosen nested loop joins
often come from an underestimate that is close to zero.
PostgreSQL already clamps row count estimates to 1, that is, it will
choose an estimate of 1 whenever it thinks fewer rows will be returned.

Perhaps using a higher clamp like 2 would get rid of many of your
problems, but it is a difficult gamble as it will also prevent some
nested loop joins that would have been the best solution.

Finally, even though the official line of PostgreSQL is to *not* have
query hints, and for a number of good reasons, this is far from being
an unanimous decision. The scales may tip at some point, though I
personally hope that this point is not too close.

Yours,
Laurenz Albe

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2017-11-02 12:35:54 Re: proposal: schema variables
Previous Message ldh@laurent-hasson.com 2017-11-02 02:44:06 Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices