From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash Join cost estimates
Date: 2013-03-31 19:45:46
Message-ID: 15119.1364759146@sss.pgh.pa.us
Lists: pgsql-hackers
Stephen Frost <sfrost(at)snowman(dot)net> writes:
> * Jeff Davis (pgsql(at)j-davis(dot)com) wrote:
>> In Stephen's case the table was only 41KB, so something still seems off.
>> Maybe we should model the likelihood of a collision based on the
>> cardinalities (assuming a reasonably good hash function)?
> It's not really 'hash collisions' that we're trying to be wary of, per
> se, it's the risk of duplicates.
I spent some time looking at this. I think the real issue is that the
code is trying to charge something close to cpu_operator_cost for each
time an outer tuple is compared to a hashtable entry. That was actually
reasonable back when the costing code was last touched --- the current
scheme of storing and comparing the hash codes before testing the
equality operator proper only dates to commit
849074f9ae422c64501bb1d53ef840de870bf65c. I punted on changing the cost
estimates at that time, and I think what this example is showing is that
that was a bad decision. Really, when we're traipsing down a bucket
list, skipping over bucket entries with the wrong hash code is just
about free, or at least it's a whole lot cheaper than applying ExecQual.
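To make the shape of that concrete, here is a simplified sketch of a bucket scan after the change described above, where the stored hash code is checked before the equality quals are run. This is only an illustration, not the actual nodeHash.c code; BucketEntry, eval_hash_quals, and scan_bucket are names invented for the example, with eval_hash_quals standing in for ExecQual on the hash clauses.

/*
 * Simplified sketch of a hash-join bucket scan: the stored hash code is
 * compared first, and the (possibly expensive) equality quals run only on
 * entries whose hash codes match.  Types and names are illustrative only.
 */
typedef struct BucketEntry
{
	struct BucketEntry *next;	/* next entry in this bucket's chain */
	unsigned int hashvalue;		/* hash code stored with the inner tuple */
	void	   *tuple;			/* the inner tuple itself */
} BucketEntry;

/* stand-in for ExecQual applied to the hash clauses */
extern int eval_hash_quals(void *outer_tuple, void *inner_tuple);

static void *
scan_bucket(BucketEntry *bucket, unsigned int outer_hash, void *outer_tuple)
{
	BucketEntry *entry;

	for (entry = bucket; entry != NULL; entry = entry->next)
	{
		/* skipping a mismatched hash code costs one integer compare ... */
		if (entry->hashvalue != outer_hash)
			continue;

		/* ... while this is the expensive ExecQual-equivalent step */
		if (eval_hash_quals(outer_tuple, entry->tuple))
			return entry->tuple;
	}
	return NULL;
}

The point for costing is that the loop body above is nearly free except when the hash codes happen to match.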
Perhaps what we should do is charge the hash_qual_cost only for some
small multiple of the number of tuples that we expect will *pass* the
hash quals, which is a number we have to compute anyway. The multiple
would represent the rate of hash-code collisions we expect.
I'd still be inclined to charge something per bucket entry, but it
should be really small, perhaps on the order of 0.01 times
cpu_operator_cost.
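A rough sketch of what that restructured run-cost term could look like follows. The variable names (hashjointuples, outer_path_rows, inner_path_rows, innerbucketsize, hash_qual_cost, cpu_operator_cost) follow costsize.c, but HASH_COLLISION_MULTIPLE and BUCKET_ENTRY_FACTOR are placeholders for the "small multiple" and "0.01 times cpu_operator_cost" numbers suggested above, not proposed values, and this is not patch-ready code.

/*
 * Illustrative sketch only: charge the real hash quals for a small
 * multiple of the tuples expected to pass them, plus a very small
 * per-bucket-entry charge for walking the chain.
 */
#define HASH_COLLISION_MULTIPLE	1.01	/* placeholder: hash-collision rate */
#define BUCKET_ENTRY_FACTOR		0.01	/* placeholder: per-entry skip cost */

static double
proposed_hashqual_run_cost(double hashjointuples,
						   double outer_path_rows,
						   double inner_path_rows,
						   double innerbucketsize,
						   double hash_qual_per_tuple,
						   double cpu_operator_cost)
{
	/* equality quals charged only for tuples expected to pass them */
	double		qual_cost = hash_qual_per_tuple *
	hashjointuples * HASH_COLLISION_MULTIPLE;

	/* tiny charge for skipping over bucket entries with wrong hash codes */
	double		walk_cost = BUCKET_ENTRY_FACTOR * cpu_operator_cost *
	outer_path_rows * (inner_path_rows * innerbucketsize) * 0.5;

	return qual_cost + walk_cost;
}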
Or we could just drop that term entirely. It strikes me that the reason
to be worried about skewed distribution in the inner relation is not
really that it changes the run time much, but rather that it risks
blowing out work_mem if specific virtual buckets have too many members
(or at best, we're forced to increase the number of batches more than we
thought to stay under work_mem, which carries runtime costs of its own).
Maybe what we should be doing with the bucketsize numbers is estimating
peak memory consumption to gate whether we'll accept the plan at all,
rather than adding terms to the cost estimate.
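A gating check along those lines might look roughly like the sketch below. This is only an illustration of the idea: inner_path_rows and innerbucketsize follow the costsize.c naming, but the avg_tuple_bytes input and the accept/reject rule itself are assumptions made for the example, not a worked-out design.

/*
 * Illustrative sketch only: use the bucketsize estimate to gate a hash
 * join on predicted peak memory instead of folding it into the cost.
 * A skewed virtual bucket cannot be split across batches, so its expected
 * size bounds memory use from below even when we add batches.
 */
#include <stdbool.h>

static bool
hash_join_fits_in_work_mem(double inner_path_rows,
						   double innerbucketsize,
						   double avg_tuple_bytes,
						   double work_mem_bytes,
						   int numbatches)
{
	/* expected rows in the fullest virtual bucket */
	double		biggest_bucket_rows = inner_path_rows * innerbucketsize;

	/* rows we expect to hold in memory at once for one batch */
	double		batch_rows = inner_path_rows / numbatches;

	/* peak is whichever of the two is larger */
	double		peak_rows = (biggest_bucket_rows > batch_rows) ?
	biggest_bucket_rows : batch_rows;

	return peak_rows * avg_tuple_bytes <= work_mem_bytes;
}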
regards, tom lane