read-only planner input

From: Neil Conway <neilc(at)samurai(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: read-only planner input
Date: 2005-03-18 04:50:31
Message-ID: 423A5E17.9080506@samurai.com
Lists: pgsql-hackers

I've been taking a look at how to stop the planner from scribbling on
its input. This is my first modification of any significance to the
planner, so don't hesitate to tell me what I've gotten wrong :)

I think the planner modifies the input Query for two reasons: (a) to
rewrite the Query in ways that improve planning, and (b) to use it as a
convenient place to store planner working state. Some examples of the
former include
transforming IN clauses to joins, transforming simple FROM-clause
subselects into joins, preprocessing expressions, and so forth. Examples
of the latter are mostly the "internal to planner" fields denoted in the
Query struct definition.

(b) should be pretty easy to solve; we can create a per-Query PlanState
struct that contains this information, as well as holding a pointer to
the Query (and perhaps the Plan tree under construction).
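
To make (b) concrete, here is a minimal sketch of what such a struct might look like. The types are simplified stand-ins, not the real PostgreSQL definitions, and the field `rels_considered` and the helpers `make_plan_state` and `note_rel_considered` are hypothetical names used only for illustration. The point is that the Query is reached through a `const` pointer, so working state has to live in the PlanState instead.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the parser output; NOT the real Query struct. */
typedef struct Query
{
    int command_type;   /* hypothetical field */
    int num_rtable;     /* hypothetical field */
} Query;

typedef struct Plan Plan;   /* opaque here; only a pointer is stored */

/* Per-Query planner state, as proposed in (b). */
typedef struct PlanState
{
    const Query *parse;     /* read-only planner input */
    Plan *plan;             /* Plan tree under construction, or NULL */
    int rels_considered;    /* hypothetical piece of working state */
} PlanState;

/* Build a fresh PlanState for one Query without touching the Query. */
static PlanState
make_plan_state(const Query *parse)
{
    PlanState ps;

    assert(parse != NULL);
    ps.parse = parse;
    ps.plan = NULL;
    ps.rels_considered = 0;
    return ps;
}

/* Working state is updated in the PlanState, never in the Query. */
static void
note_rel_considered(PlanState *ps)
{
    ps->rels_considered++;
}
```

With this layout, an accidental write such as `ps->parse->num_rtable = 0` is rejected by the compiler, which is one cheap way to find the places that still scribble on the input.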

I'm still trying to figure out how to handle (a). Perhaps we can create
an additional plan node that always sits at the top of the plan tree.
This would hold derivations of data from the input Query. A lot of the
code that implements (a) is actually already applicative in nature, but
any code that modifies a Query destructively would need to be changed.
In other words, rather than

    query->jointree = pull_up_subqueries(parse, query->jointree);

we'd have:

    top_plan_node->jointree = pull_up_subqueries(plan_state,
                                                 query->jointree);

(Possibly passing PlanState rather than `parse', which is a Query, if
needed. The example is also somewhat simplified.)
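
As a rough illustration of the applicative style, the sketch below uses a toy join list rather than the real jointree, and a hypothetical function `pull_up_subqueries_copy` that stands in for the real `pull_up_subqueries`. All types and field names here are assumptions. The function builds a new list with every item marked as flattened and leaves its input untouched, so the result can be stored in the top plan node while the Query stays as the caller passed it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy join-list item; NOT the real jointree node types. */
typedef struct JoinNode
{
    int is_subquery;            /* 1 if this item is a simple subselect */
    struct JoinNode *next;
} JoinNode;

typedef struct Query
{
    const JoinNode *jointree;   /* read-only input */
} Query;

typedef struct TopPlanNode
{
    JoinNode *jointree;         /* derived copy owned by the plan */
} TopPlanNode;

/*
 * Applicative "pull up": return a freshly allocated list in which every
 * item is marked as pulled up. The input list is never modified.
 */
static JoinNode *
pull_up_subqueries_copy(const JoinNode *in)
{
    JoinNode *head = NULL;
    JoinNode **tail = &head;

    for (; in != NULL; in = in->next)
    {
        JoinNode *n = malloc(sizeof *n);

        if (n == NULL)
            abort();
        n->is_subquery = 0;     /* toy stand-in for flattening */
        n->next = NULL;
        *tail = n;
        tail = &n->next;
    }
    return head;
}

/* Release a list returned by pull_up_subqueries_copy. */
static void
free_jointree(JoinNode *n)
{
    while (n != NULL)
    {
        JoinNode *next = n->next;

        free(n);
        n = next;
    }
}
```

The cost of this style is the extra copy; the benefit is that the same Query can be handed to the planner again later without first being copied by the caller.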

BTW, I wonder whether it would be possible to move some preprocessing
from the early stages of the planner to a "preprocessing" phase that
would run after the rewriter but before the planner proper. The
preprocessor would maintain the essential properties of the input Query,
but it wouldn't need to be re-run when the query is replanned due to a
modification to a dependent database object. For example, the decision
about whether to pull up a subquery could be done once and not redone in
subsequent invocations of the planner on the same Query. On the other
hand, I'm not sure how much preprocessing could be rearranged like this,
and since replanning ought to be relatively rare, I'm not sure it's
worth spending a whole lot of time trying to optimize it...
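
For what it's worth, the shape I have in mind is roughly the following. This is only a sketch under assumptions: `PreprocessedQuery`, `preprocess_once` and `plan_query` are hypothetical names, and the "decision" is reduced to a single flag. The preprocessing result is cached alongside the Query, so replanning reuses it instead of recomputing it.

```c
#include <assert.h>

/* Toy Query; NOT the real struct. */
typedef struct Query
{
    int has_pullable_subquery;  /* hypothetical field */
} Query;

/* Result of the one-time preprocessing phase, kept across replans. */
typedef struct PreprocessedQuery
{
    const Query *parse;     /* read-only input */
    int pull_up_decided;    /* has the decision been made yet? */
    int pull_up;            /* cached decision */
    int preprocess_runs;    /* counter, for illustration only */
} PreprocessedQuery;

/* Run preprocessing at most once per PreprocessedQuery. */
static void
preprocess_once(PreprocessedQuery *pq)
{
    if (pq->pull_up_decided)
        return;
    pq->pull_up = pq->parse->has_pullable_subquery;
    pq->pull_up_decided = 1;
    pq->preprocess_runs++;
}

/* Planner entry point: may be called again when the query is replanned. */
static int
plan_query(PreprocessedQuery *pq)
{
    preprocess_once(pq);
    return pq->pull_up;     /* the planner proper would start here */
}
```

Calling `plan_query` twice on the same PreprocessedQuery runs the preprocessing step only the first time.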

Comments welcome.

-Neil
