From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Bryce Cutt <pandasuit(at)gmail(dot)com>, Joshua Tolley <eggyknap(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets |
Date: | 2009-03-21 00:35:53 |
Message-ID: | 603c8f070903201735x351271c4y4ce298b422367a25@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Mar 20, 2009 at 8:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Bryce Cutt <pandasuit(at)gmail(dot)com> writes:
>> Here is the new patch.
>
> Applied with revisions. I undid some of the "optimizations" that
> cluttered the code in order to save a cycle or two per tuple --- as per
> previous discussion, that's not what the performance questions were
> about. Also, I did not like the terminology "in-memory"/"IM"; it seemed
> confusing since the main hash table is in-memory too. I revised the
> code to consistently refer to the additional hash table as a "skew"
> hashtable and the optimization in general as skew optimization. Hope
> that seems reasonable to you --- we could search-and-replace it to
> something else if you'd prefer.
>
> For the moment, I didn't really do anything about teaching the planner
> to account for this optimization in its cost estimates. The initial
> estimate of the number of MCVs that will be specially treated seems to
> me to be too high (it's only accurate if the inner relation is unique),
> but getting a more accurate estimate seems pretty hard, and it's not
> clear it's worth the trouble. Without that, though, you can't tell
> what fraction of outer tuples will get the short-circuit treatment.
If the inner relation isn't fairly close to unique you shouldn't be
using this optimization in the first place.
...Robert
From | Date | Subject | |
---|---|---|---|
Next Message | Bryce Cutt | 2009-03-21 00:45:14 | Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets |
Previous Message | Robert Haas | 2009-03-21 00:34:17 | mbox-to-html script with stable identifiers |