Quick Links

Re: Parameterized-path cost comparisons need some work

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parameterized-path cost comparisons need some work
Date:	2012-04-17 18:46:23
Message-ID:	CA+TgmoZfOxNrnMJp8NgL08Y78-fh7TMCxDtCxDpoF6VqLFAXpg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Apr 17, 2012 at 12:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I've been hacking away on a patch to do this, and attached is something
> that I think is pretty close to committable. It needs another going-over
> and some new regression test cases, but it seems to work, and it fixes a
> number of things besides the above-mentioned issue. In particular, this
> has a much more principled approach than HEAD does to the problem of where
> to place parameterizable join clauses in the plan tree; that can be seen
> in the one change in the existing regression tests, where we no longer
> generate a redundant upper-level copy of an OR join clause that the old
> code wasn't bright enough to get rid of.
>
> The patch is a bit large because I chose to revise the data representation.
> Instead of each Path having its own required_outer, rows, and
> param_clauses fields, now a parameterized Path has a pointer to a
> ParamPathInfo struct that it shares with other Paths for the same rel and
> the same parameterization. This guarantees that such paths will have the
> same estimated rowcount, because we only compute that once per
> parameterization, which should save some work as well as making the world
> safe for add_path_precheck.

Seems reasonable.

> The only place where this approach proved a bit tricky was in handling
> AppendPaths and MergeAppendPaths, which didn't surprise me because that
> was a rough spot for the old way too (and indeed they aren't handled
> completely correctly in HEAD). A parameterized path is now *required*
> to enforce all clauses that the join clause movement rules assign to it;
> but Append and MergeAppend don't do qual checking, and I didn't feel like
> changing that. The method that I have settled on is to require all child
> paths of a parameterized append to have the exact same parameterization,
> IOW we push the qual checks down below the append. Now the interesting
> point about that is that we want to support Appends wherein some children
> are seqscans and some are indexscans (consider a partitioned table where
> the parent is a dummy empty table with no indexes). The "raw" situation
> there is that we'll have a plain seqscan path for the parent and then a
> collection of similarly-parameterized indexscan paths for the live
> partition children. To make it possible to convert that case into a
> parameterized append path, I added parameterization support to seqscans
> and then wrote "reparameterize_path", which changes a Path to increase
> its parameterization level (and thereby assign it more pushed-down join
> clauses to check at runtime). That allows us to reconfigure the seqscan
> to match the other children. I've also added such support to
> SubqueryScan, on the grounds that the other common use of append paths is
> UNION ALL across subqueries. We might later want to add parameterization
> support to other path types, but this seemed like enough for the moment.

OK.

> BTW, after writing the code for it I decided to remove creation of
> parameterized MergeAppendPaths from allpaths.c, though there is still some
> support for them elsewhere. On reflection it seemed to me that the code
> was willing to create far too many of these, much more than their
> potential usefulness could justify (remember that parameterized paths must
> be on the inside of a nestloop, so their sort ordering is typically of
> marginal use). We can put that back if we can think of a more restrictive
> heuristic for when to create them.

I guess the case in which this would matter is if you wrote something
like A LJ (B IJ C) where B and/or C has child tables and the best
method of joining them to each other is a marge join. That doesn't
seem all that likely; normally a hash join or nested loop will be
better. On the flip side I can't see that generating a bunch of extra
paths is going to hurt you much there either; they will fall away
pretty quickly if they aren't useful for merging. Now, if you have
something like A IJ B or A LJ B, where B is partitioned, it's clearly
a waste of time to generate parameterized paths with pathkeys.

> The core of the patch is in the new functions get_baserel_parampathinfo
> and get_joinrel_parampathinfo, which look up or construct ParamPathInfos,
> and join_clause_is_parameterizable_for and
> join_clause_is_parameterizable_within, which control
> movement of parameterizable join clauses. (I'm not that thrilled with the
> names of the latter two functions, anybody got a better idea?) The rest
> of it is pretty much boilerplate changes and replacing ad-hoc logic with
> uses of this stuff.

Parameterizable is such a mouthful. I wish we had a better word.

> I have a couple of other ideas in mind in the way of mop-up, but they are
> not in this patch to keep it from bloating even more. First, I'm thinking
> we should get rid of RelOptInfo.baserestrictcost, thus forcing all scan
> cost estimators to invoke cost_qual_eval explicitly. That field has been
> vestigial from a planning-speed standpoint for a long time, ever since we
> started caching eval costs in RestrictInfos. The most commonly used cost
> estimators don't use it anymore as of this patch, and it'd likely be best
> to have a uniform coding pattern there. Second, I've gotten dissatisfied
> with the terminology "required_outer" that was used in the original param
> plans patch. I'm considering a global search and replace with
> "param_relids" or some variant of that.

Personally, I find required_outer more clear. YMMV.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Parameterized-path cost comparisons need some work at 2012-04-17 16:14:41 from Tom Lane

Responses

Re: Parameterized-path cost comparisons need some work at 2012-04-17 19:01:29 from Alvaro Herrera
Re: Parameterized-path cost comparisons need some work at 2012-04-17 19:05:52 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alvaro Herrera	2012-04-17 18:47:55	Re: libpq URI and regression testing
Previous Message	Josh Berkus	2012-04-17 18:41:04	Re: Gsoc2012 idea, tablesample