From: | Tomas Vondra <tv(at)fuzzy(dot)cz> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: bad estimation together with large work_mem generates terrible slow hash joins |
Date: | 2014-09-11 21:57:33 |
Message-ID: | 54121ACD.2020105@fuzzy.cz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 11.9.2014 16:33, Tomas Vondra wrote:
> On 11 Září 2014, 15:31, Robert Haas wrote:
>> On Wed, Sep 10, 2014 at 5:09 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>>> OK. So here's v13 of the patch, reflecting this change.
>>
>> [...] It does three things:
>>
>> (1) It changes NTUP_PER_BUCKET to 1. Although this increases memory
>> utilization, it also improves hash table performance, and the
>> additional memory we use is less than what we saved as a result of
>> commit 45f6240a8fa9d35548eb2ef23dba2c11540aa02a.
>>
>> (2) It changes ExecChooseHashTableSize() to include the effect of the
>> memory consumed by the bucket array. If we care about properly
>> respecting work_mem, this is clearly right for any NTUP_PER_BUCKET
>> setting, but more important for a small setting like 1 than for the
>> current value of 10. I'm not sure the logic in this function is as
>> clean and simple as it can be just yet.
>> (3) It allows the number of batches to increase on the fly while the
>> hash join is in process. This case arises when we initially estimate
>> that we only need a small hash table, and then it turns out that there
>> are more tuples than we expect. Without this code, the hash table's
>> load factor gets too high and things start to suck.
>>
>> I recommend separating this patch in two patches, the first covering
>> items (1) and (2) and the second covering item (3), which really seems
>> like a separate improvement.
>
> That probably makes sense.
Attached is the patch split as suggested:
(a) hashjoin-nbuckets-v14a-size.patch
* NTUP_PER_BUCKET=1
* counting buckets towards work_mem
* changes in ExecChooseHashTableSize (due to the other changes)
(b) hashjoin-nbuckets-v14a-resize.patch
* rest of the patch, that is ...
* tracking optimal number of buckets
* dynamic resize of the hash table
* adding info the the EXPLAIN ANALYZE output
regards
Tomas
Attachment | Content-Type | Size |
---|---|---|
hashjoin-nbuckets-v14b-resize.patch | text/x-diff | 10.7 KB |
hashjoin-nbuckets-v14a-size.patch | text/x-diff | 4.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2014-09-11 22:03:11 | Re: Commitfest status |
Previous Message | Peter Geoghegan | 2014-09-11 21:46:43 | Re: B-Tree support function number 3 (strxfrm() optimization) |