From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: bad estimation together with large work_mem generates terrible slow hash joins
Date: 2014-09-10 18:25:32
Message-ID: 5410979C.6000509@vmware.com
Lists: pgsql-hackers
On 09/10/2014 01:49 AM, Tomas Vondra wrote:
> On 9.9.2014 16:09, Robert Haas wrote:
>> On Mon, Sep 8, 2014 at 5:53 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>>> So I only posted the separate patch for those who want to do a review,
>>> and then a "complete patch" with both parts combined. But it sure may be
>>> a bit confusing.
>>
>> Let's do this: post each new version of the patches only on this
>> thread, as a series of patches that can be applied one after another.
>
> OK, attached. Apply them in this order:
>
> 1) dense-alloc-v5.patch
> 2) hashjoin-nbuckets-v12.patch
The dense-alloc-v5.patch looks good to me. I have committed it with some
minor cleanup (more comments below). I have not looked at the second patch.
> I also did a few 'minor' changes to the dense allocation patch, most
> notably:
>
> * renamed HashChunk/HashChunkData to MemoryChunk/MemoryChunkData
> The original naming seemed a bit awkward.
That's too easy to confuse with regular MemoryContext and AllocChunk
stuff. I renamed it to HashMemoryChunk.
> * the chunk size is 32kB (instead of 16kB), and we're using a 1/4
> threshold for 'oversized' items
>
> We need the threshold to be >=8kB, to trigger the special case
> within AllocSet. The 1/4 rule is consistent with ALLOC_CHUNK_FRACTION.
Should we care about the fact that if there are only a few tuples, we
will nevertheless waste 32kB of memory for the chunk? I guess not, but I
thought I'd mention it. The smallest allowed value for work_mem is 64kB.
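The committed code lives in the PostgreSQL tree; purely as an illustration (not the actual patch), here is a minimal C sketch of the dense-allocation idea with the 32kB chunk size and 1/4 oversized threshold discussed above. The struct name mirrors HashMemoryChunkData from the thread, but the function body, list handling, and use of malloc() instead of a memory context are simplified assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE (32 * 1024)               /* 32kB chunks, per the patch */
#define OVERSIZED_THRESHOLD (CHUNK_SIZE / 4) /* 1/4 rule: 8kB, enough to hit
                                              * AllocSet's large-chunk path */

/* Simplified stand-in for the patch's HashMemoryChunkData */
typedef struct HashMemoryChunkData
{
    struct HashMemoryChunkData *next;   /* chunks form a linked list */
    size_t      used;                   /* bytes consumed in data[] */
    size_t      maxlen;                 /* capacity of data[] */
    char        data[];                 /* tuples packed densely here */
} HashMemoryChunkData;

static HashMemoryChunkData *chunk_list = NULL;

/*
 * Hypothetical sketch of dense allocation: bump-allocate small tuples
 * inside shared chunks; give each oversized tuple a dedicated chunk.
 */
static void *
dense_alloc(size_t size)
{
    HashMemoryChunkData *chunk;

    if (size > OVERSIZED_THRESHOLD)
    {
        /* Oversized tuple: dedicated chunk, linked in behind the current
         * chunk so the current chunk's remaining free space is not lost. */
        chunk = malloc(offsetof(HashMemoryChunkData, data) + size);
        chunk->used = chunk->maxlen = size;
        if (chunk_list != NULL)
        {
            chunk->next = chunk_list->next;
            chunk_list->next = chunk;
        }
        else
        {
            chunk->next = NULL;
            chunk_list = chunk;
        }
        return chunk->data;
    }

    /* Start a new chunk if the current one lacks room. */
    if (chunk_list == NULL ||
        chunk_list->maxlen - chunk_list->used < size)
    {
        chunk = malloc(offsetof(HashMemoryChunkData, data) + CHUNK_SIZE);
        chunk->used = 0;
        chunk->maxlen = CHUNK_SIZE;
        chunk->next = chunk_list;
        chunk_list = chunk;
    }

    chunk = chunk_list;
    chunk->used += size;
    return chunk->data + chunk->used - size;
}
```

The point of the dense scheme is that tuples carry no per-allocation chunk header of their own, and freeing a whole batch is just walking the chunk list, which is also what makes the 32kB-per-chunk overhead for tiny hash tables (mentioned above) the relevant trade-off.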
> I also considered Heikki's suggestion that while rebatching, we can
> reuse chunks with a single large tuple, instead of copying it
> needlessly. My attempt, however, made the code much uglier (I almost
> used a GOTO, but my hands refused to type it ...). I'll look into that.
I tried constructing a test case where the rehashing time would be
significant enough for this to matter, but I couldn't find one. Even if
the original estimate on the number of batches is way too small, we ramp
up quickly to a reasonable size and the rehashing time is completely
insignificant compared to all the other work. So I think we can drop
that idea.
- Heikki