From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tweaking NTUP_PER_BUCKET
Date: 2014-07-13 19:32:28
Message-ID: 53C2DECC.2010408@fuzzy.cz
Lists: pgsql-hackers
On 11.7.2014 19:25, Tomas Vondra wrote:
> 2) walking through the tuples sequentially
> ------------------------------------------
>
> The other option is not to walk the tuples through buckets, but by
> walking through the chunks - we know the tuples are stored as
> HashJoinTuple/MinimalTuple, so it shouldn't be that difficult. And by
> walking the tuples in the order they're stored, we may just rearrange
> the tuples in already allocated memory. And maybe it could be even
> faster than the current code, because of the sequential access to the
> memory (as opposed to the random access through buckets) and CPU caches.
> So I like this approach - it's simple, clean and shouldn't add any
> overhead (memory or CPU).
So, attached is a patch that implements this. It's not very complicated
and the resulting code is surprisingly readable (IMHO comparable to the
previous code).
Basics of the patch:
(1) adds HashChunkData structure - a buffer used for dense allocation
of tuples, 16kB by default
(2) adds 'chunks_batch' into HashJoinTableData, which is a linked list
of HashChunkData
(3) the allocation itself is done in chunk_alloc - in short it either
allocates the tuple in the current chunk, or adds a new one if
needed
(4) ExecHashTableInsert calls chunk_alloc (instead of the regular
MemoryContextAlloc)
(5) the main change is in ExecHashIncreaseNumBatches, which now can't
do pfree (does not work with dense allocation), so it reads the
tuples from chunks directly and simply rebuilds the buckets
from scratch (thanks to this we only need one more chunk of memory,
not a complete copy, and we don't need to duplicate buckets at all)
From the tests I've done so far this seems to work as intended - it still
delivers the memory savings, and I've seen no difference in performance.
The current patch only implements this for tuples in the main hash
table, not for skew buckets. I plan to do that, but it will require
separate chunks for each skew bucket (so we can remove it without
messing with all of them). The chunks for skew buckets should be smaller
(I'm thinking about ~4kB), because there'll be more of them, but OTOH
those buckets are for frequent values so the chunks should not remain empty.
But let's see if the current patch misses something important first.
regards
Tomas
Attachment: hashjoin-alloc-v3.patch (text/x-diff, 9.3 KB)