From: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
---|---|
To: | Gregory Stark <stark(at)enterprisedb(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Recursive query syntax ambiguity |
Date: | 2007-02-05 16:18:23 |
Message-ID: | 20070205161823.GC4811@svana.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jan 29, 2007 at 01:38:02PM +0000, Gregory Stark wrote:
> Instead I suggest we create one type of reentrant node, which memoizes its
> output. It would be a kind of on-demand Materialize. It would be paired with a
> second node type which would be the only type of node which can invoke it.
> This RecursiveCall node would invoke the Memoize node using a special
> interface in which it passes enough state for the Memoize node to seek to the
> correct place in its output.
That I beleive is the right approach. I think an equivalent way of
looking at it is a "Loop" node with an InitPlan and a StepPlan.
Initially, the Loop node executes the InitPlan to get it's initial set
of tuples, storing them like a Materialize node does. After that it
keeps calling the StepNode, where somewhere inside the it has a node
that extracts a row from the aforementioned tuplestore. It stores these
returned tuples in the tuplestore also, thus giving you recursion.
<snip>
> (I've convinced myself that that's true but I should probably work out
> a good proof of it before I make all this depend on it.)
Yeah, proving it is going to be tricky, I'm not sure what the standard
says about infinite recursion.
> There are three general cases of the Memoize node. Breadth-first, Depth-first,
> and non-linearly-recursive.
I think the the only difference between depth and bredth-first searches
is (if you consider the tuplesort to be a queue) whether the new tuples
go to the front or the back of the list. But a data structures and
algorithms book will know this.
> There are a few open issues to consider. Namely, how to cost a RecursiveCall
> node.
One note: if you know that if you get p tuples out for every tuple in
(where p<1) then the asymptotic result of 1 + p + p*p+ ... is 1/(1-p)
However, I don't know it matters. You only need to cost the plan if
there are alternate paths and given the plan structure is strongly
constrained, I'm not sure how much it matters.
> Also, if a subplan has exactly one call-site we really ought to inline it as
> it will get much more reliable plans. Similarly if there are two call sites we
> should consider inlining it if the cost of materializing the results (and
> reading them back) is more than n-call-sites x the cost of executing the
> query. I would expect That would happen with plain sequential scans for
> example.
In the case where the subplan has side-effects, you can't optimise at
all. In the case of read-committed mode, will two seq-scans always
return the same result?
Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2007-02-05 16:23:58 | Re: Recursive query syntax ambiguity |
Previous Message | Gregory Stark | 2007-02-05 16:15:21 | Re: VC2005 build and pthreads |