Re: Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices

From: Gunther <raj(at)gusw(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices
Date: 2017-11-03 14:36:57
Message-ID: 9a76f13e-cdb1-1d8d-2178-67e6dcf169bc@gusw.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Nov 3, 2017 at 5:28 AM, Thomas Kellerer <spam_eater(at)gmx(dot)net> wrote:
>> I do like Oracle's approach with SQL profiles, where you can force the
>> optimizer to try harder to find a good execution plan. I _think_ it even
>> runs the statement with multiple plans and compares the expected outcome
>> with the actual values. Once a better plan is found that plan can be
>> attached to that query and the planner will use that plan with subsequent
>> executions.
I have used that approach with Oracle. I didn't like it. It is too
difficult, too complicated. Requires all sorts of DBA privileges.
Nothing that would help a lowly user trying his ad-hoc queries.

I think a "dynamic feedback plan optimization" would be more innovative
and ultimately deliver better on the original RDBMS vision. The RDBMS
should exert all intelligence that it can to optimize the query
execution. (I know that means: no reliance on hints.)

There is so much more that could be done, such as materialized and
potentially indexed partial results. (I know Oracle as materialized
partial results).

But the dynamic feedback plan would be even cooler.  So that means the
outer relation should be built or sampled to estimate the selectivity,
the inner relation should be built completely, and if it is too large,
it should be thrown back to the optimizer to change the plan.

Or may be the planner needs some second look pattern matching
criticizer: Any pattern of Nested Loop I would re-check and possibly
sample a few rows. And Nested Loop with costly inner loop should almost
always be avoided. Nested Loop of Seq Scan is a no-no unless it can be
proven that the cardinality of the inner relation to scan is less than 100.

But even more, once you have the inner and outer table of a Nested Loop
built or sampled, there should be no reason not to run the Hash Join. I
guess I still don't get why the optimizer even today would EVER consider
a Nested Loop over a Hash Join, unless there is some clear indication
that the query will be used to just get the FIRST ROWS (Oracle hint) and
that those first rows will actually exist (user waits 30 minutes at 100%
CPU only to be informed that the query has no results!), and that the
results are likely to come out early in the Nested Loop! So many
constraints to make that Nested Loop plan a successful strategy. Why
ever choose it???

I guess, hints or no hints, I think Nested Loops should not be used by
the optimizer unless it has positive indication  that it meets all the
criteria for being a good strategy, i.e., that there is a continuous
path of indexed columns starting with constant query parameters. This is
the usual OLTP query. And that is what Nested Loops are for. But in all
other cases, and if space allows at all, always use Hash Joins. It is
even cheaper to do a trial and error! Assume that space will allow, and
quit if it doesn't, rather than being sheepish and going to a 1 hour CPU
bound operation. Because if space does not allow, the chance for Nested
Loop being a good idea is also close to nil! So if space doesn't allow,
it would be Sort-Merge on Disk. Especially if the query has a DISTINCT
or ORDER BY clause anyway! Why is that not always a better strategy?

And yes, until all this is figured out: by all means include the
pg_hint_plan.c -- pretty please!

regards,
-Gunther

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Gunther 2017-11-03 14:51:31 Re: Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices
Previous Message Chris Travers 2017-11-03 12:58:02 Re: proposal: schema variables