Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices

From: "ldh(at)laurent-hasson(dot)com" <ldh(at)laurent-hasson(dot)com>
To: Gunther <raj(at)gusw(dot)net>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices
Date: 2017-11-02 02:44:06
Message-ID: BN6PR15MB142620448377EAA48AA8650D855C0@BN6PR15MB1426.namprd15.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org [mailto:pgsql-performance-
> owner(at)postgresql(dot)org] On Behalf Of Gunther
> Sent: Wednesday, November 01, 2017 20:29
> To: pgsql-performance(at)postgresql(dot)org
> Subject: [PERFORM] OLAP/reporting queries fall into nested loops over seq
> scans or other horrible planner choices
>
> Hi, this is Gunther, have been with PgSQL for decades, on an off this list.
> Haven't been on for a long time making my way just fine. But there is one thing
> that keeps bothering me both with Oracle and PgSQL. And that is the
> preference for Nested Loops.
>
> Over the years the archives have questions about Nested Loops being chosen
> over Hash Joins. But the responses seem too specific to the people's queries,
> ask many questions, make them post the query plans, and often end up
> frustrating with suggestions to change the data model or to add an index and
> stuff like that.
>
> One should not have to go into that personal detail.
>
> There are some clear boundaries that a smart database should just never cross.
>
> Especially with OLAP queries. Think a database that is fine for OLTP, has
> indexes and the index based accesses for a few records joined with a dozen
> other tables all with indexes is no problem. If you fall into a Seq Scan scenario
> or unwanted Hash Join, you usually forgot to add an index or forgot to put index
> columns into your join or other constraints. Such are novice questions and we
> should be beyond that.
>
> But the issue is bulk searches, reports, and any analytic queries scenarios. In
> those queries Nested Loops are almost always a bad choice, even if there is an
> index. In over 20 years of working with RDBMs this has been my unfailing
> heuristics. A report runs slow? Look at plan, is there a Nested Loop? Yes?
> Squash it! And the report runs 10x faster instantaneously.
>
> So, all the more troublesome is if any database system (here PgSQL) would
> ever fall into a Nested Loop trap with CPU spinning at 100% for several
> minutes, with a Nested Loop body of anything from a Seq Scan or worse with a
> cardinality of anything over 10 or 100. Nested Loops of Nested Loops or Nested
> Loops of other complex query plan fragments should be a no-no and chosen
> only as an absolute last resort when the system cannot find enough memory,
> even then disk based merge sort should be better, i.e., Nested Loops should
> never be chosen. Period.
>
> If you can set enable_nestloop off and the Hash Join is chosen and the
> performance goes from 1 hour of 100% CPU to 10 seconds completion time,
> then something is deadly wrong. And it doesn't matter to me if I should have
> re-written my query in some funny ways or tweaked my data model, these are
> all unacceptable options when you have a complex system with hybrid
> OLTP/OLAP uses. Don't tell me to de-normalize. I know I can materialize joins
> in tables which I can then use again in joins to save time. But that is not the
> point here.
>
> And I don't think tweaking optimizer statistics is the solution either.
> Because optimizer statistics quickly become worthless when your criteria get
> more complex.
>
> The point is that Nested Loops should never be chosen except in index lookup
> situations or may be memory constraints.
>
> How can I prevent it on a query by query scope? I cannot set enable_nestloop =
> off because one query will be for a full report, wile another one might have
> indexed constraints running in the same session, and I don't want to manage
> side effects and remember to set enable_nestloop parameter on and off.
>
> There must be a way to tell the optimizer to penalize nested loops to make
> them the last resort. In Oracle there are those infamous hints, but they don't
> always work either (or it is easy to make mistakes that you get no feedback
> about).
>
> Is there any chance PgSQL can get something like a hint feature? Or is there a
> way to use postgresql.conf to penalize nested loops so that they would only ever
> be chosen in the most straight-forward situations as with query parameters
> that are indexed? I know I need to have sufficient work_mem, but if you can set
> enable_nestloop = off and you get the desired Hash Join, there is obviously
> sufficient work_mem, so that isn't the answer either.
>
> Thanks for listening to my rant.
>
> regards,
> -Gunther
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

[Laurent Hasson]
Hello Gunther,

Just adding to your voice. I recently experienced the same issue with a complex multi-table view, including pivots, and was surprised to see all the nested loops everywhere in spite of indices being available. I spent a lot of time optimizing the query and went from about 1h to about 3mn, but penalizing nested loops in favor of other "joining" techniques seem to make sense as a strategy. Either that, or there is something I really don't understand here either and would love to be educated :)

Laurent.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Laurenz Albe 2017-11-02 08:30:28 Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices
Previous Message Gunther 2017-11-02 00:28:41 OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices