From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Parameterized-path cost comparisons need some work |
Date: | 2012-04-17 18:46:23 |
Message-ID: | CA+TgmoZfOxNrnMJp8NgL08Y78-fh7TMCxDtCxDpoF6VqLFAXpg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Apr 17, 2012 at 12:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I've been hacking away on a patch to do this, and attached is something
> that I think is pretty close to committable. It needs another going-over
> and some new regression test cases, but it seems to work, and it fixes a
> number of things besides the above-mentioned issue. In particular, this
> has a much more principled approach than HEAD does to the problem of where
> to place parameterizable join clauses in the plan tree; that can be seen
> in the one change in the existing regression tests, where we no longer
> generate a redundant upper-level copy of an OR join clause that the old
> code wasn't bright enough to get rid of.
>
> The patch is a bit large because I chose to revise the data representation.
> Instead of each Path having its own required_outer, rows, and
> param_clauses fields, now a parameterized Path has a pointer to a
> ParamPathInfo struct that it shares with other Paths for the same rel and
> the same parameterization. This guarantees that such paths will have the
> same estimated rowcount, because we only compute that once per
> parameterization, which should save some work as well as making the world
> safe for add_path_precheck.
Seems reasonable.
> The only place where this approach proved a bit tricky was in handling
> AppendPaths and MergeAppendPaths, which didn't surprise me because that
> was a rough spot for the old way too (and indeed they aren't handled
> completely correctly in HEAD). A parameterized path is now *required*
> to enforce all clauses that the join clause movement rules assign to it;
> but Append and MergeAppend don't do qual checking, and I didn't feel like
> changing that. The method that I have settled on is to require all child
> paths of a parameterized append to have the exact same parameterization,
> IOW we push the qual checks down below the append. Now the interesting
> point about that is that we want to support Appends wherein some children
> are seqscans and some are indexscans (consider a partitioned table where
> the parent is a dummy empty table with no indexes). The "raw" situation
> there is that we'll have a plain seqscan path for the parent and then a
> collection of similarly-parameterized indexscan paths for the live
> partition children. To make it possible to convert that case into a
> parameterized append path, I added parameterization support to seqscans
> and then wrote "reparameterize_path", which changes a Path to increase
> its parameterization level (and thereby assign it more pushed-down join
> clauses to check at runtime). That allows us to reconfigure the seqscan
> to match the other children. I've also added such support to
> SubqueryScan, on the grounds that the other common use of append paths is
> UNION ALL across subqueries. We might later want to add parameterization
> support to other path types, but this seemed like enough for the moment.
OK.
> BTW, after writing the code for it I decided to remove creation of
> parameterized MergeAppendPaths from allpaths.c, though there is still some
> support for them elsewhere. On reflection it seemed to me that the code
> was willing to create far too many of these, much more than their
> potential usefulness could justify (remember that parameterized paths must
> be on the inside of a nestloop, so their sort ordering is typically of
> marginal use). We can put that back if we can think of a more restrictive
> heuristic for when to create them.
I guess the case in which this would matter is if you wrote something
like A LJ (B IJ C) where B and/or C has child tables and the best
method of joining them to each other is a marge join. That doesn't
seem all that likely; normally a hash join or nested loop will be
better. On the flip side I can't see that generating a bunch of extra
paths is going to hurt you much there either; they will fall away
pretty quickly if they aren't useful for merging. Now, if you have
something like A IJ B or A LJ B, where B is partitioned, it's clearly
a waste of time to generate parameterized paths with pathkeys.
> The core of the patch is in the new functions get_baserel_parampathinfo
> and get_joinrel_parampathinfo, which look up or construct ParamPathInfos,
> and join_clause_is_parameterizable_for and
> join_clause_is_parameterizable_within, which control
> movement of parameterizable join clauses. (I'm not that thrilled with the
> names of the latter two functions, anybody got a better idea?) The rest
> of it is pretty much boilerplate changes and replacing ad-hoc logic with
> uses of this stuff.
Parameterizable is such a mouthful. I wish we had a better word.
> I have a couple of other ideas in mind in the way of mop-up, but they are
> not in this patch to keep it from bloating even more. First, I'm thinking
> we should get rid of RelOptInfo.baserestrictcost, thus forcing all scan
> cost estimators to invoke cost_qual_eval explicitly. That field has been
> vestigial from a planning-speed standpoint for a long time, ever since we
> started caching eval costs in RestrictInfos. The most commonly used cost
> estimators don't use it anymore as of this patch, and it'd likely be best
> to have a uniform coding pattern there. Second, I've gotten dissatisfied
> with the terminology "required_outer" that was used in the original param
> plans patch. I'm considering a global search and replace with
> "param_relids" or some variant of that.
Personally, I find required_outer more clear. YMMV.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2012-04-17 18:47:55 | Re: libpq URI and regression testing |
Previous Message | Peter Eisentraut | 2012-04-17 18:41:04 | Re: libpq URI and regression testing |