Re: OUTER JOIN performance regression remains in 8.3beta4

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: OUTER JOIN performance regression remains in 8.3beta4
Date: 2008-01-09 17:13:40
Message-ID: 7704.1199898820@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Alvaro Herrera" <alvherre(at)commandprompt(dot)com> writes:
>> Would it be a good idea to keep removing redundant clauses and rethink
>> the preference for clauseful joins, going forward?

> I don't understand what's going on here. The planner is choosing one join
> order over another because one join has more join clauses than the
> other?

Not more join clauses, but any join clause at all. We will not explore
join paths that don't have any join clause, unless forced to by lack of
any other way to form the result.

> Even
> though some of those joins are entirely redundant and have no selectivity?

You're confusing whether we explore a path (ie, cost it out) with
whether we choose it. It's a necessary precondition, of course,
but we won't pick the path unless it looks cheapest.

Not exploring clauseless join paths is a heuristic that's needed to
avoid exponential growth of the search space in large join problems.
AFAIK every System-R-derived planner has done this.

As an example, consider
t1 join t2 on (...) join t3 on (...) ... join t8 on (...)
and for simplicity suppose that each ON condition relates the new
table to the immediately preceding table, and that we can't derive
any additional join conditions through transitivity. In this situation
there are going to be only seven ways to form a two-base-relation
joinrel, as long as we allow only clauseful joins. But there are
8*7/2 = 28 distinct ways to form a join if we consider all possible
join pairs whether they have a join clause or not. At the
three-base-relation level there will be 12 joinrels if we only consider
clauseful pairs, or 56 if we don't. It gets worse as you go up, and
most if not all of those additional joinrels represent entirely useless
variations on the theme of "let's stupidly compute a cartesian product
and then winnow it sometime later".

This is not to say that there is never a case where an early cartesian
product couldn't be a useful part of a plan, but rejecting them is a
darn good heuristic.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chris Browne 2008-01-09 17:20:58 Re: Named vs Unnamed Partitions
Previous Message Simon Riggs 2008-01-09 17:06:15 Re: Named vs Unnamed Partitions

Browse pgsql-patches by date

  From Date Subject
Next Message Alvaro Herrera 2008-01-09 17:22:32 Re: BUG #3860: xpath crashes backend when is querying xmlagg result
Previous Message Gregory Stark 2008-01-09 16:47:51 Re: OUTER JOIN performance regression remains in 8.3beta4