Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bryce Cutt <pandasuit(at)gmail(dot)com>
Cc:	Joshua Tolley <eggyknap(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
Date:	2009-03-21 00:14:52
Message-ID:	12249.1237594492@sss.pgh.pa.us
Lists:	pgsql-hackers

Bryce Cutt <pandasuit(at)gmail(dot)com> writes:
> Here is the new patch.

Applied with revisions. I undid some of the "optimizations" that
cluttered the code in order to save a cycle or two per tuple --- as per
previous discussion, that's not what the performance questions were
about. Also, I did not like the terminology "in-memory"/"IM"; it seemed
confusing since the main hash table is in-memory too. I revised the
code to consistently refer to the additional hash table as a "skew"
hashtable and the optimization in general as skew optimization. Hope
that seems reasonable to you --- we could search-and-replace it to
something else if you'd prefer.

For the moment, I didn't really do anything about teaching the planner
to account for this optimization in its cost estimates. The initial
estimate of the number of MCVs that will be specially treated seems to
me to be too high (it's only accurate if the inner relation is unique),
but getting a more accurate estimate seems pretty hard, and it's not
clear it's worth the trouble. Without that, though, you can't tell
what fraction of outer tuples will get the short-circuit treatment.

regards, tom lane
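
Editorial illustration (not part of the original message or the applied patch): the toy model below sketches the idea the message discusses, under loudly stated assumptions. It assumes the skew hashtable holds the inner tuples whose join key is one of the most common values (MCVs) of the outer relation, and that outer tuples with those keys are joined immediately rather than being deferred to a later batch. The function name `skew_hash_join`, its parameters, and the use of in-memory lists instead of batch files are all hypothetical and are not taken from the PostgreSQL source. The sketch also reports the fraction of outer tuples that take the short-circuit path and the number of inner tuples that land in the skew table, which equals the number of MCVs only when the inner join key is unique.

```python
from collections import Counter, defaultdict


def skew_hash_join(inner, outer, num_mcvs, num_batches):
    """Toy multi-batch hash join with a separate "skew" table.

    inner, outer: lists of (key, payload) tuples.
    Returns (results, fraction_short_circuited, skew_inner_tuples).
    """
    # Assumption: the MCVs come from the OUTER relation's join key.
    # Here they are counted directly; a real planner would use statistics.
    outer_counts = Counter(key for key, _ in outer)
    mcvs = {key for key, _ in outer_counts.most_common(num_mcvs)}

    # Build phase: inner tuples matching an MCV go to the skew table,
    # everything else goes to its regular batch.
    skew_table = defaultdict(list)
    batches = [defaultdict(list) for _ in range(num_batches)]
    for key, payload in inner:
        if key in mcvs:
            skew_table[key].append(payload)
        else:
            batches[hash(key) % num_batches][key].append(payload)

    # Probe phase: MCV outer tuples are joined right away (short circuit);
    # the rest are deferred per batch, standing in for temp-file writes.
    results = []
    short_circuited = 0
    deferred = [[] for _ in range(num_batches)]
    for key, payload in outer:
        if key in mcvs:
            short_circuited += 1
            for inner_payload in skew_table.get(key, []):
                results.append((key, inner_payload, payload))
        else:
            deferred[hash(key) % num_batches].append((key, payload))

    # Process the deferred outer tuples one batch at a time.
    for batch_no in range(num_batches):
        for key, payload in deferred[batch_no]:
            for inner_payload in batches[batch_no].get(key, []):
                results.append((key, inner_payload, payload))

    fraction = short_circuited / len(outer) if outer else 0.0
    skew_inner_tuples = sum(len(rows) for rows in skew_table.values())
    return results, fraction, skew_inner_tuples
```

In this model, the short-circuit fraction is simply the combined outer frequency of the chosen MCVs, while the space the skew table needs depends on how many inner tuples share each MCV key. That is one way to read the remark that an estimate based on the MCV count alone is accurate only for a unique inner relation.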

In response to

Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets at 2009-03-02 23:47:34 from Bryce Cutt

Responses

Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets at 2009-03-21 00:35:53 from Robert Haas
