OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices

From: Gunther <raj(at)gusw(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices
Date: 2017-11-02 00:28:41
Message-ID: f1b85c37-e4b5-c158-5f67-a780a2191e20@gusw.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi, this is Gunther, have been with PgSQL for decades, on an off this
list. Haven't been on for a long time making my way just fine. But there
is one thing that keeps bothering me both with Oracle and PgSQL. And
that is the preference for Nested Loops.

Over the years the archives have questions about Nested Loops being
chosen over Hash Joins. But the responses seem too specific to the
people's queries, ask many questions, make them post the query plans,
and often end up frustrating with suggestions to change the data model
or to add an index and stuff like that.

One should not have to go into that personal detail.

There are some clear boundaries that a smart database should just never
cross.

Especially with OLAP queries. Think a database that is fine for OLTP,
has indexes and the index based accesses for a few records joined with a
dozen other tables all with indexes is no problem. If you fall into a
Seq Scan scenario or unwanted Hash Join, you usually forgot to add an
index or forgot to put index columns into your join or other
constraints. Such are novice questions and we should be beyond that.

But the issue is bulk searches, reports, and any analytic queries
scenarios. In those queries Nested Loops are almost always a bad choice,
even if there is an index. In over 20 years of working with RDBMs this
has been my unfailing heuristics. A report runs slow? Look at plan, is
there a Nested Loop? Yes? Squash it! And the report runs 10x faster
instantaneously.

So, all the more troublesome is if any database system (here PgSQL)
would ever fall into a Nested Loop trap with CPU spinning at 100% for
several minutes, with a Nested Loop body of anything from a Seq Scan or
worse with a cardinality of anything over 10 or 100. Nested Loops of
Nested Loops or Nested Loops of other complex query plan fragments
should be a no-no and chosen only as an absolute last resort when the
system cannot find enough memory, even then disk based merge sort should
be better, i.e., Nested Loops should never be chosen. Period.

If you can set enable_nestloop off and the Hash Join is chosen and the
performance goes from 1 hour of 100% CPU to 10 seconds completion time,
then something is deadly wrong. And it doesn't matter to me if I should
have re-written my query in some funny ways or tweaked my data model,
these are all unacceptable options when you have a complex system with
hybrid OLTP/OLAP uses. Don't tell me to de-normalize. I know I can
materialize joins in tables which I can then use again in joins to save
time. But that is not the point here.

And I don't think tweaking optimizer statistics is the solution either.
Because optimizer statistics quickly become worthless when your criteria
get more complex.

The point is that Nested Loops should never be chosen except in index
lookup situations or may be memory constraints.

How can I prevent it on a query by query scope? I cannot set
enable_nestloop = off because one query will be for a full report, wile
another one might have indexed constraints running in the same session,
and I don't want to manage side effects and remember to set
enable_nestloop parameter on and off.

There must be a way to tell the optimizer to penalize nested loops to
make them the last resort. In Oracle there are those infamous hints, but
they don't always work either (or it is easy to make mistakes that you
get no feedback about).

Is there any chance PgSQL can get something like a hint feature? Or is
there a way to use postgresql.conf to penalize nested loops so that they
would only ever be chosen in the most straight-forward situations as
with query parameters that are indexed? I know I need to have sufficient
work_mem, but if you can set enable_nestloop = off and you get the
desired Hash Join, there is obviously sufficient work_mem, so that isn't
the answer either.

Thanks for listening to my rant.

regards,
-Gunther

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message ldh@laurent-hasson.com 2017-11-02 02:44:06 Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices
Previous Message Merlin Moncure 2017-11-01 16:28:12 Re: Cursor vs Set Operation