Re: a few crazy ideas about hash joins

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-07 13:55:28
Message-ID: 200904071355.n37DtS617785@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Are there any TODOs here?

---------------------------------------------------------------------------

Robert Haas wrote:
> On Fri, Apr 3, 2009 at 5:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> Correct, but you've got the details all wrong. ?The real problem is that
> >>> the planner might discard a join path hash(A,B) at level 2 because it
> >>> loses compared to, say, merge(A,B). ?But when we get to level three,
> >>> perhaps hash(hash(A,B),C) would've been the best plan due to synergy
> >>> of the two hashes. ?We'll never find that out unless we keep the
> >>> "inferior" hash path around. ?We can certainly do that; the question
> >>> is what's it going to cost us to allow more paths to survive to the
> >>> next join level. ?(And I'm afraid the answer may be "plenty"; it's a
> >>> combinatorial explosion we're looking at here.)
> >
> >> That would be crazy. ?I think doing it the way I suggested is correct,
> >> just not guaranteed to catch every case. ?The reality is that even if
> >> we took Greg Stark's suggestion of detecting this situation only in
> >> the executor, we'd still get some benefit out of this. ?If we take my
> >> intermediate approach, we'll catch more cases where this is a win.
> >> What you're suggesting here would catch every conceivable case, but at
> >> the expense of what I'm sure would be an unacceptable increase in
> >> planning time for very limit benefit.
> >
> > Maybe, maybe not. ?I've seen plenty of plans that have several
> > mergejoins stacked up on top of each other with no intervening sorts.
> > There is 0 chance that the planner would have produced that if it
> > thought that it had to re-sort at each level; something else would have
> > looked cheaper. ?I think that your proposals will end up getting very
> > little of the possible benefit, because the planner will fail to choose
> > plan trees in which the optimization can be exploited.
>
> Well, I'm all ears if you have suggestions for improvement. For
> sorts, we use PathKeys to represent the ordering of each path and keep
> the paths for each set of pathkeys. By analogy, we could maintain a
> list of PathHash structures for each path representing the tables that
> had already been hashed. add_path() would then have to consider both
> the PathHash structures and the PathKey structures before concluding
> that a path was definitely worse than some path previously found. At
> each level of the join tree, we'd need to truncate PathHash structures
> that provably have no further use (e.g. on a base table that does not
> appear again above the level of the join already planned) to avoid
> keeping around paths that appeared to be better only because we didn't
> know that the paths they have hashed are worthless in practice. Maybe
> that wouldn't even be that expensive, actually, because there will be
> lots of cases where you know the relevant table doesn't appear
> elsewhere in the query and not save any extra paths. But I think we'd
> have to write the code and benchmark it to really know.
>
> I guess the reason I'm not too worked up about this is because my
> experience is that the planner nearly always prefers hash joins on
> small tables, even when an index is present - the queries I'm worried
> about optimizing don't need any additional encouragement to use hash
> joins; they're doing it already. But certainly it doesn't hurt to see
> how many cases we can pick up.
>
> ...Robert
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-04-07 13:59:13 Re: More message encoding woes
Previous Message Magnus Hagander 2009-04-07 13:44:58 Re: Path separator