Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

From: Noah Misch <noah(at)leadboat(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, david(at)fetter(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, jgh(at)wizmail(dot)org, itparanoia(at)gmail(dot)com, GavinFlower(at)archidevsys(dot)co(dot)nz, david(dot)g(dot)johnston(at)gmail(dot)com
Subject: Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)
Date: 2016-02-09 14:45:55
Message-ID: 20160209144555.GA187927@tornado.leadboat.com

On Wed, Dec 30, 2015 at 08:16:28PM +1300, David Rowley wrote:
> In [1] I suggested an idea to make improvements to the planner around the
> Equivalence Class code. Later in [2] Tom raised concerns with this adding
> too many planning cycles for a perhaps not common enough situation. I
> don't want to discuss that particular patch here; I want to discuss more
> generally the dilemma of adding more smarts to the planner to allow it to
> generate a more optimal plan in order to save on execution time.

It's an important topic.

> A number of ideas were suggested on the other thread about how we might go
> about solving this problem. In [3] Simon talked about perhaps enabling
> extra optimisations when the planner sees that the plan will cost more than
> some given threshold. That's perhaps an option, but may not work well for
> optimisations which must take place very early in planning, for example [4].

Yeah. A slew of steps precede us assembling a notion of total cost. I bet
most controversial proposed steps will happen in that early period. We'd need
a rough cost earlier than we get it today, and I don't know what achieving
that would look like. Simon, did you have specifics in view?

> Another idea came from Evgeniy [5], and was more of a request not to do
> it this way, but nevertheless the idea was basically to add lots of GUCs
> to enable/disable each extra planner feature.

This and subsequent ideas each envision some kind of optimization step
blacklist. Suppose it's a bitmap with one bit per optional optimizer step.
Each idea chooses blacklist membership differently, but the planner acts on
the blacklist about the same way. I paraphrase the ideas in those terms
below, and I offer a couple more. For this idea, the blacklist is simple:

1. User defines the blacklist fully.

It's essentially handing the hard part back to the user. While I sympathize
with allergic reactions to this, I like it far better than rejecting
optimizations based on thinking the average case wants them disabled.
work_mem likewise isn't the ideal knob for any user, but it has a simple
implementation and beats "1MB per hash table is okay for everyone."
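
To make the bitmap idea concrete, here is a minimal sketch in C. All of
the names below are hypothetical; none are existing PostgreSQL
identifiers:

    #include <stdbool.h>
    #include <stdint.h>

    typedef enum OptionalStep
    {
        STEP_EC_FILTERS = 0,        /* e.g. equivalence-class filters */
        STEP_JOIN_REMOVAL_EXTRA,
        STEP_PREDICATE_PROOFS,
        NUM_OPTIONAL_STEPS
    } OptionalStep;

    typedef uint64_t StepBlacklist; /* one bit per optional step */

    static inline bool
    step_enabled(StepBlacklist bl, OptionalStep step)
    {
        return (bl & ((StepBlacklist) 1 << step)) == 0;
    }

Each speculative step would guard itself with one step_enabled() test, so
a blacklisted step costs next to nothing per plan.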

> Another option which I've thought about previously was a planner_strength
> GUC, at which various additional optimisations are enabled at various
> predefined strength levels, so that databases which tend to spend a great
> deal more execution time compared to planning time can turn this up a bit
> to see if that helps change that ratio a bit. This idea is far from

2. User selects from predefined blacklists like "maximum", "default", etc.

> perfect though, as who's to say that planner feature X should "kick in"
> before planner feature Y? I've also often thought that it might be nice to

Yep. It will be essentially impossible to build an argument to move a planner
feature from one strength to another. If the feature's committer has
personally experienced the problem in the field, the optimization will end up
active at default planner strength.
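
For illustration, a strength GUC would boil down to a fixed table of
blacklists (reusing the hypothetical types from the sketch above), and
the arguments would be over which bit belongs in which row:

    static const StepBlacklist strength_blacklists[] = {
        ~(StepBlacklist) 0,             /* minimum: all optional steps off */
        (StepBlacklist) 1 << STEP_PREDICATE_PROOFS,    /* default */
        0                               /* maximum: nothing blacklisted */
    };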

> have it so the planner does not modify the Parse object, so that the
> planner has the option to throw away what it's done so far and start
> planning all over again with the "planner_strength" knob turned up to the
> maximum, if the cost happened to indicate that the query was going to take
> a long time to execute.

3. System first plans with a specific predefined blacklist that omits
speculative (low probability, high payoff) steps. If that yields a
high-cost plan, it repeats planning with an empty blacklist.

I agree that the planner's modification of the Query tree is a substantial
roadblock. The design needs to work for one-shot plans, and we're unlikely to
recoup the cost of copying every one-shot Query tree.
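
In C, with plan_query() and the blacklist constants hypothetical
(copyObject() and Plan's total_cost are the real things), the control
flow and its cost problem look roughly like:

    Plan *
    plan_with_escalation(Query *query, double cost_threshold)
    {
        /* Must copy up front: the planner scribbles on its Query input.
         * For one-shot queries that never replan, this copy is pure
         * overhead we are unlikely to recoup. */
        Query *backup = copyObject(query);
        Plan  *plan = plan_query(query, CONSERVATIVE_BLACKLIST);

        if (plan->total_cost > cost_threshold)
            plan = plan_query(backup, EMPTY_BLACKLIST);
        return plan;
    }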

> So here I'd very much like to kick off discussion on an acceptable way to
> solve this problem, in a realistic way which we're all happy with.

It's bad to add 10us per plan when each client plans 3000 times per second
(a steady 3% of a CPU per client), but the same waste once per minute per
client is no cause for concern. I advise accepting speculative planner
optimizations, enabled by default when similar in cost to comparable
existing steps. At the same time, I encourage developing automation to cap
the waste when a client elicits millions of planner runs that don't benefit
from a certain optimization. Specific lines of inquiry:

4. Derive a conservative blacklist for each rewritten Query when first
planning it, and use that blacklist for subsequent plans.

Some prepared statements use a custom plan every time. (Constraint exclusion
is one driver of such cases.) Many facets of a Query's planning problem don't
depend on parameter values, so a particular optimization will apply to all the
custom plans or to none of them. Let the first plan build a blacklist of
provably-irrelevant optimizations, which the plan cache stores and furnishes
to later runs. The first plan after an invalidation recomputes the blacklist.
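
A sketch of the plan cache hook-in; the blacklist fields and helper
functions are hypothetical additions, not existing CachedPlanSource
members:

    Plan *
    plan_custom(CachedPlanSource *psrc, ParamListInfo params)
    {
        if (!psrc->blacklist_valid)
        {
            /* First planning after creation or invalidation: record
             * which optional steps provably cannot apply to this Query. */
            psrc->step_blacklist = derive_irrelevant_steps(psrc->query);
            psrc->blacklist_valid = true;
        }
        return plan_query_params(psrc->query, params, psrc->step_blacklist);
    }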

5. Add a facility that profiles a workload to generate blacklist data.

Think along the lines of gcc -fprofile-generate/-fprofile-use. Start a
profile; run your application load test; the server collects data sufficient
to match each query with its optimal blacklist. Issue "ALTER USER appuser SET
optimizer_profile = 'profname'" to adopt profile-driven blacklisting. This is
the automated equivalent of (1).

I recommend starting with (1), full user control over the blacklist via GUC.
Specifically, I recommend one GUC expressing the list rather than one Boolean
GUC per optional step. The core work of enumerating optional steps and making
the planner obey a blacklist applies toward all five ideas. A GUC to set that
blacklist is cheap, and it would be worth having as an escape hatch even if we
add sophistication later.
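
For instance, mapping a single list-valued GUC onto the bitmap could look
like this (GUC name, step names, and parser all hypothetical; types
reused from the earlier sketch):

    /* e.g.  SET optimizer_blacklist = 'ec_filters, predicate_proofs' */
    #include <stdlib.h>
    #include <string.h>

    static const char *step_names[NUM_OPTIONAL_STEPS] = {
        "ec_filters", "join_removal_extra", "predicate_proofs"
    };

    static StepBlacklist
    parse_blacklist(const char *value)
    {
        StepBlacklist bl = 0;
        char *copy = strdup(value);

        for (char *tok = strtok(copy, ", "); tok; tok = strtok(NULL, ", "))
            for (int i = 0; i < NUM_OPTIONAL_STEPS; i++)
                if (strcmp(tok, step_names[i]) == 0)
                    bl |= (StepBlacklist) 1 << i;
        free(copy);
        return bl;
    }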

(2) is a dead end for the reason you gave. (3) has potential, but each of
(3), (4) and (5) is a big project with great uncertainty. They shouldn't
block introducing optimizer steps.

Thanks,
nm
