From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bad estimation together with large work_mem generates terrible slow hash joins
Date: 2014-09-09 22:49:37
Message-ID: 540F8401.6040808@fuzzy.cz
Lists: pgsql-hackers
On 9.9.2014 16:09, Robert Haas wrote:
> On Mon, Sep 8, 2014 at 5:53 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>> So I only posted the separate patch for those who want to do a review,
>> and then a "complete patch" with both parts combined. But it sure may be
>> a bit confusing.
>
> Let's do this: post each new version of the patches only on this
> thread, as a series of patches that can be applied one after another.
OK, attached. Apply in this order:
1) dense-alloc-v5.patch
2) hashjoin-nbuckets-v12.patch
>>> In ExecChooseHashTableSize(), if buckets_bytes is independent of
>>> nbatch, could we just compute it before working out dbatch, and just
>>> deduct it from hash_table_bytes? That seems like it'd do the same
>>> thing for less work.
>>
>> Well, we can do that. But I really don't think that's going to make a
>> measurable difference. And I think the code is clearer this way.
>
> Really? It seems to me that you're doing, a second time, more or less
> the same calculation that's already been done. It took me 20
> minutes of staring to figure out what it was really doing. Now
> admittedly I was a little low on caffeine, but...
I reworked this part a bit to make it easier to understand, mostly
following your recommendations.
>
>>> Either way, what happens if buckets_bytes is already bigger than
>>> hash_table_bytes?
>>
>> Hm, I don't see how that could happen.
>
> How about an Assert() and a comment, then?
Done. According to the comment, we should never really get over 25%,
except maybe for strange work_mem values, because we keep nbuckets as a
power of two. But even then we should not reach 50%, so I added this
assert:

    Assert(buckets_bytes <= hash_table_bytes/2);

We then subtract buckets_bytes and continue with the loop.
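For illustration, a minimal sketch of that accounting (the variable
names mirror ExecChooseHashTableSize(), but this is a simplification,
not the actual patch):

    /*
     * Sketch only: reserve space for the bucket array up front, then
     * size the batches from whatever remains.
     */
    long    hash_table_bytes = work_mem * 1024L;
    long    buckets_bytes = nbuckets * sizeof(HashJoinTuple);

    /*
     * nbuckets is a power of two, so the bucket array should stay
     * around 25% of hash_table_bytes and can never reach 50%.
     */
    Assert(buckets_bytes <= hash_table_bytes / 2);

    /* deduct the bucket array, leaving the rest for the tuples */
    hash_table_bytes -= buckets_bytes;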
>
>>> A few more stylistic issues that I see:
>>>
>>> + if ((hashtable->nbatch == 1) &&
>>> + if (hashtable->nbuckets_optimal <= (INT_MAX/2))
>>> + if (size > (HASH_CHUNK_SIZE/8))
>>>
>>> While I'm all in favor of using parentheses generously where it's
>>> needed to add clarity, these and similar usages seem over the top to
>>> me.
>>
>> It seemed more readable to me at the time of coding, and it still
>> seems better this way, but ...
>>
>> https://www.youtube.com/watch?v=CxK_nA2iVXw
>
> Heh. Well, at any rate, I think PostgreSQL style is to not use
> parentheses in that situation.
FWIW, I added a HASH_CHUNK_THRESHOLD macro specifying the threshold. I
also simplified the conditions a bit and removed some of the parentheses.
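As a sketch (the surrounding code is assumed, and the macro value
matches the chunk sizing described further down):

    /* tuples above this size get their own, exactly-sized chunk */
    #define HASH_CHUNK_THRESHOLD    (HASH_CHUNK_SIZE / 4)

    /* before: if (size > (HASH_CHUNK_SIZE/8)) */
    if (size > HASH_CHUNK_THRESHOLD)
    {
        /* allocate a dedicated chunk for this oversized tuple */
    }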
>
>>> +char * chunk_alloc(HashJoinTable hashtable, int size) {
>>>
>>> Eh... no.
>>>
>>> + /* XXX maybe we should use MAXALIGN(size) here ... */
>>>
>>> +1.
>>
>> I'm not sure what the 'no' is pointing at. Maybe that the parenthesis
>> should be on the next line? And is the +1 about doing MAXALIGN?
>
> The "no" is about the fact that you have the return type, the function
> name, and the opening brace on one line instead of three lines as is
> project style. The +1 is for applying MAXALIGN to the size.
OK, fixed.
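The signature now looks roughly like this (a hedged sketch: the body is
trimmed to the parts discussed here, and the chunk bookkeeping fields
are assumptions):

    static char *
    dense_alloc(HashJoinTable hashtable, int size)
    {
        MemoryChunk chunk = hashtable->chunks;  /* assumed field name */
        char       *ptr;

        /* keep all allocations MAXALIGNed, per the review */
        size = MAXALIGN(size);

        /* (oversized-tuple and new-chunk handling elided) */

        ptr = chunk->data + chunk->used;
        chunk->used += size;
        return ptr;
    }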
I also did a few 'minor' changes to the dense allocation patch, most
notably:
* renamed HashChunk/HashChunkData to MemoryChunk/MemoryChunkData
The original naming seemed a bit awkward.
* renamed the function to 'dense_alloc' (instead of 'chunk_alloc')
Seems like a better fit to what the function does.
* the chunk size is 32kB (instead of 16kB), and we use a 1/4 threshold
for 'oversized' items (a sketch of the arithmetic follows this list)
We need the threshold to be >=8kB, to trigger the special case
within AllocSet. The 1/4 rule is consistent with ALLOC_CHUNK_FRACTION.
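To spell the arithmetic out (the AllocSet constants are from aset.c;
the hash join macros just restate the values above):

    /* hash join side: 32kB chunks, oversized threshold at 1/4 */
    #define HASH_CHUNK_SIZE         (32 * 1024L)
    #define HASH_CHUNK_THRESHOLD    (HASH_CHUNK_SIZE / 4)   /* = 8kB */

    /*
     * aset.c: requests exceeding allocChunkLimit get a dedicated
     * block, which is returned to malloc at context reset -- the
     * special case we want oversized tuples to hit.
     */
    #define ALLOC_CHUNK_LIMIT       8192    /* 8kB */
    #define ALLOC_CHUNK_FRACTION    4       /* limit <= maxBlockSize/4 */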
I also considered Heikki's suggestion that while rebatching, we can
reuse chunks containing a single large tuple instead of copying it
needlessly. My attempt however made the code much uglier (I almost used
a GOTO, but my hands refused to type it ...). I'll look into that.
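For reference, inside the rebatching loop the idea would look roughly
like this (purely hypothetical sketch; a per-chunk ntuples counter is
an assumption):

    /* walking the old chunk list in ExecHashIncreaseNumBatches() */
    if (oldchunk->ntuples == 1 && batchno == curbatch)
    {
        /*
         * The chunk holds a single tuple that stays in memory, so
         * relink the whole chunk instead of copying the tuple.
         */
        oldchunk->next = hashtable->chunks;
        hashtable->chunks = oldchunk;
        continue;
    }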
Tomas
Attachments:
  dense-alloc-v5.patch          text/x-diff     8.1 KB
  hashjoin-nbuckets-v12.patch   text/x-diff    14.9 KB