Re: HashJoin w/option to unique-ify inner rel

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: HashJoin w/option to unique-ify inner rel
Date: 2009-04-25 02:37:25
Message-ID: 603c8f070904241937q3183f6ddx65d8e47f526ead1d@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 24, 2009 at 8:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> So it appears to me that instead of taking an average-case correction
> as is done in this patch and the old coding, we have to explicitly model
> the matched-tuple and unmatched-tuple cases separately.  For hashjoins,
> what I think we should do is estimate the bucket scanning cost for
> unmatched tuples assuming a perfectly uniform bucket distribution,
> rather than the worst-case bucket size that's used in the current code
> (and is a reasonable estimate for matched tuples).  This is to reflect
> the fact that an unmatched outer tuple might or might not hit a
> populated bucket, and which bucket it hits is probably uncorrelated with
> the distribution of the inner tuples.

I've been investigating this some more as well and have come to the
conclusion that both the following code and estimate_hash_bucketsize() are
pretty much wrong.

>        /*
>         * The number of tuple comparisons needed is the number of outer tuples
> !        * times the typical number of tuples in a hash bucket, which is the inner
> !        * relation size times its bucketsize fraction.  At each one, we need to
> !        * evaluate the hashjoin quals.  But actually, charging the full qual eval
> !        * cost at each tuple is pessimistic, since we don't evaluate the quals
> !        * unless the hash values match exactly.  For lack of a better idea, halve
> !        * the cost estimate to allow for that.
>         */
>        startup_cost += hash_qual_cost.startup;
>        run_cost += hash_qual_cost.per_tuple *
> !               outer_path_rows * clamp_row_est(inner_path_rows * innerbucketsize) * 0.5;

As far as I can tell, the focus on trying to estimate the number of
tuples per bucket is entirely misguided. Supposing the inner relation is
mostly unique, so that the values don't cluster too much, the right
answer is (of course) NTUP_PER_BUCKET. estimate_hash_bucketsize()
backs into this number, but it places far too much importance on the
value. If you increase NTUP_PER_BUCKET to, say, 50, the planner
concludes that hash joins are really, really expensive, and avoids
them like the plague, but in reality they're only a little slower.
Why? Because the extra tuples that get thrown into the bucket
generally don't have the same hash value (or if they did, they would
have been in the bucket either way...) and get rejected with a simple
integer comparison, which is much cheaper than
hash_qual_cost.per_tuple.
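
To make that concrete, here's a simplified sketch of the probe behavior
I'm describing (this is not the actual nodeHash.c code; BucketEntry,
probe_bucket, and eval_hash_quals are made-up names):

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Bucket entries whose stored hash value differs from the probe's hash are
 * skipped with a single integer comparison; only exact hash matches pay the
 * (comparatively expensive) hash-qual evaluation that the planner is costing.
 */
typedef struct BucketEntry
{
    uint32_t            hashvalue;   /* hash of the inner tuple's join key */
    int                 joinkey;     /* placeholder for the inner tuple */
    struct BucketEntry *next;
} BucketEntry;

/* stand-in for evaluating the hash quals against a hash-matched inner tuple */
static bool
eval_hash_quals(const BucketEntry *e, int outer_key)
{
    return e->joinkey == outer_key;
}

static bool
probe_bucket(const BucketEntry *bucket, uint32_t outer_hash, int outer_key)
{
    for (const BucketEntry *e = bucket; e != NULL; e = e->next)
    {
        if (e->hashvalue != outer_hash)
            continue;                   /* cheap rejection: integer compare */
        if (eval_hash_quals(e, outer_key))
            return true;                /* this is the cost worth modeling */
    }
    return false;
}

So a bucket stuffed with 50 entries that mostly carry other hash values
costs 50 integer compares, not 50 qual evaluations.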

It seems to me that what we should really be attempting to estimate
here is the likely number of times we're going to have to evaluate the
hash quals per tuple. Your idea of estimating the matched and unmatched
cases separately is a lot better than anything that I had come up
with. In the unmatched case, the cost of a hash probe is nearly zero,
regardless of whether the hash bucket is empty or not (because the
probability of a real hash collision is quite low, and nothing else
requires evaluating the hash quals). In the matched case, we expect
to probe about inner_rows/inner_ndistinct tuples - unless it's a SEMI
or ANTI join, in which case we expect to probe exactly one row.
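
In planner terms, that idea might look something like the following
sketch (all the names here - match_fraction, inner_ndistinct, and the
helper itself - are invented for illustration and don't correspond to
existing planner fields; it's pseudocode, not a drop-in change to
cost_hashjoin()):

typedef enum { JOIN_INNER, JOIN_SEMI, JOIN_ANTI } JoinKind;   /* simplified */

/*
 * Charge the hash-qual cost only for matched outer tuples, once per expected
 * inner match; semi/anti joins stop after the first match.  Unmatched outer
 * tuples are rejected by hash-value comparisons and contribute ~nothing here.
 */
static double
hash_qual_eval_cost(double outer_rows, double match_fraction,
                    double inner_rows, double inner_ndistinct,
                    double qual_cost_per_tuple, JoinKind jointype)
{
    double  matched_outer = outer_rows * match_fraction;
    double  evals_per_match;

    if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
        evals_per_match = 1.0;
    else
        evals_per_match = inner_rows / inner_ndistinct;

    return qual_cost_per_tuple * matched_outer * evals_per_match;
}

The unmatched outer tuples would then contribute essentially nothing to
the hash-qual charge.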

As a side note, the code in estimate_hash_bucketsize that scales up
the estimate of tuples per bucket by the ratio of the most common
value's frequency to the average frequency looks pretty sketchy as
well. Consider a relation
which has 10000 values all distinct. The frequency of each value is
0.0001. Now we add a duplicate copy of one value, which makes it an
MCV with almost twice the frequency of any other value in the table,
so estimate_hash_bucketsize() essentially *doubles* its estimate of
how many items are in a bucket. That doesn't make a lot of sense. At
the very least, we ought not to do this scaling when the outer rel is
more or less distinct on the hash key; even if the inner rel has a
fair number of duplicates, things will still be pretty fast.
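
Putting rough numbers on that example (writing the scaling as
mcv_freq / avg_freq, which is what I understand the code to do):

    avg frequency   = 1 / 10000         = 0.0001
    MCV frequency   = 2 / 10001        ~= 0.0002
    scaling applied = 0.0002 / 0.0001  ~= 2.0

so the bucket-size estimate roughly doubles even though the actual
bucket contents are virtually unchanged.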

...Robert
