Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

From:	Joshua Tolley <eggyknap(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bryce Cutt <pandasuit(at)gmail(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date:	2008-12-23 18:28:19
Message-ID:	20081223182818.GA5867@uber
Lists:	pgsql-hackers

On Tue, Dec 23, 2008 at 10:14:29AM -0500, Robert Haas wrote:
> > It's equivalent to our assumption that distributions of values in
> > columns in the same table are independent. Making that assumption in
> > this case would probably result in occasional dramatic speed
> > improvements similar to the ones we've seen in less complex joins,
> > offset by just-as-occasional dramatic slowdowns of similar magnitude. In
> > other words, it will increase the variance of our results.
>
> Under what circumstances do you think that it would produce a dramatic
> slowdown? I'm confused. I thought the penalty for picking a bad set
> of values for the in-memory hash table was pretty small.
>
> ...Robert

I take that back :) I agree with what others have already said, that it
shouldn't cause dramatic slowdowns when we get it wrong.

- Josh

In response to

Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets at 2008-12-23 15:14:29 from Robert Haas
