From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: [[Parallel] Shared] Hash
Date: 2017-02-13 22:19:13
Message-ID: CAEepm=14b4vbjcpCqmzuEAfMxQJzsoH83DHEtzUHBVcGuXyQ=A@mail.gmail.com
Lists: pgsql-hackers
On Thu, Feb 9, 2017 at 2:03 AM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>
>> 0004-hj-refactor-batch-increases-v4.patch:
>>
>> Modify the existing hash join code to detect work_mem exhaustion at
>> the point where chunks are allocated, instead of checking after every
>> tuple insertion. This matches the logic used for estimating, and more
>> importantly allows for some parallelism in later patches.
>
> The patch has three changes:
> 1. Change dense_alloc() to accept a respect_work_mem argument and use
> it within the function.
> 2. Move the call to ExecHashIncreaseNumBatches() from
> ExecHashTableInsert() into dense_alloc(), to account for memory before
> inserting a new tuple.
> 3. Check growEnabled before calling ExecHashIncreaseNumBatches().
Thanks for the review!
> I think checking growEnabled within ExecHashIncreaseNumBatches() is
> easier to maintain than checking at every caller. If someone adds a
> caller tomorrow, s/he has to remember to add the check.
Hmm. Yeah. In the later 0010 patch, ExecHashIncreaseNumBatches is
used in a slightly different way -- not for making decisions or
performing the hash table shrink, but only for reallocating the batch
arrays. For the next version I'll see whether putting the growEnabled
check back into ExecHashIncreaseNumBatches in the 0004 patch, and then
refactoring it in a later patch, reads better for someone reviewing the
patches independently.
> It might be better to add some comments in
> ExecHashRemoveNextSkewBucket() explaining why dense_alloc() should be
> called with respect_work_mem = false. ExecHashSkewTableInsert() does
> call ExecHashIncreaseNumBatches() after calling
> ExecHashRemoveNextSkewBucket() multiple times, so it looks like we do
> expect an increase in space used and thus to go beyond work_mem for a
> short while. Is there a way we can handle this case in dense_alloc()?
Right, that needs some explanation, which I'll add for the next
version. The explanation is that while 'shrinking' the hash table, we
may need to go over the work_mem limit by one chunk for a short time.
That is already true in master, but by moving the work_mem checks into
dense_alloc I ran into the problem that dense_alloc might decide to
shrink the hash table, which itself needs to call dense_alloc.
Shrinking works by spinning through all the chunks, copying only the
tuples we want to keep into new chunks and freeing the old chunks as
we go. We temporarily go one chunk over work_mem when we allocate the
first new chunk but before we've freed the first old one. We don't
want shrink operations to trigger recursive shrink operations, so we
disable respect for work_mem when calling dense_alloc from
ExecHashIncreaseNumBatches. In the course of regular hash table
loading, we do want to respect work_mem.
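To make that concrete, here's a standalone sketch of the shrink loop
with simplified stand-in types (not the real
ExecHashIncreaseNumBatches): the first new chunk is allocated before
the first old chunk is freed, which is the momentary one-chunk
overshoot, and the allocation path passes respect_work_mem = false so
it can't recurse into another shrink.

#include <stdbool.h>
#include <stdlib.h>

#define SLOTS_PER_CHUNK 1024

typedef struct ShrinkChunk
{
    struct ShrinkChunk *next;
    int         ntuples;
    unsigned    hashvalues[SLOTS_PER_CHUNK];    /* stand-in for real tuples */
} ShrinkChunk;

typedef struct ShrinkTable
{
    ShrinkChunk *chunks;        /* head of the chunk list ("current" chunk) */
    unsigned     curbatch;      /* batch currently held in memory */
    unsigned     nbatch;        /* number of batches, a power of two */
} ShrinkTable;

/*
 * Allocate one tuple slot.  respect_work_mem is accepted but deliberately
 * unused: this models the "false" path, so allocating here can never
 * trigger a further shrink.
 */
static unsigned *
shrink_alloc_slot(ShrinkTable *t, bool respect_work_mem)
{
    (void) respect_work_mem;

    if (t->chunks == NULL || t->chunks->ntuples == SLOTS_PER_CHUNK)
    {
        ShrinkChunk *c = calloc(1, sizeof(ShrinkChunk));

        c->next = t->chunks;
        t->chunks = c;
    }
    return &t->chunks->hashvalues[t->chunks->ntuples++];
}

/* Does this tuple still belong to the batch we keep in memory? */
static bool
tuple_stays(ShrinkTable *t, unsigned hashvalue)
{
    return (hashvalue & (t->nbatch - 1)) == t->curbatch;
}

static void
shrink_hash_table(ShrinkTable *t)
{
    ShrinkChunk *old = t->chunks;

    t->chunks = NULL;           /* start a fresh chunk list */
    t->nbatch *= 2;             /* roughly half the tuples move out */

    while (old != NULL)
    {
        ShrinkChunk *next = old->next;
        int          i;

        for (i = 0; i < old->ntuples; i++)
        {
            if (tuple_stays(t, old->hashvalues[i]))
                *shrink_alloc_slot(t, false) = old->hashvalues[i];
            /* else: the real code writes the tuple out to its batch file */
        }
        free(old);              /* free each old chunk as soon as it's copied */
        old = next;
    }
}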
Looking at the v5 patch series I posted yesterday, I see that in fact
ExecHashIncreaseNumBatches calls dense_alloc with respect_work_mem =
true in the 0004 patch, and then I corrected that mistake in the 0008
patch; I'll move the correction back to the 0004 patch in the next
version.
To reach ExecHashIncreaseNumBatches, see the "ugly" query in
hj-test-queries.sql (posted with v5).
In ExecHashRemoveNextSkewBucket I preserved the existing behaviour of
not caring about work_mem when performing the rare operation of
copying a tuple from the skew bucket into a dense_alloc'd memory
chunk, so that it can be inserted into a regular (non-skew) bucket.
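Schematically, that call site is something like the following -- a
fragment for illustration only, assuming the flag is simply a trailing
argument to dense_alloc; the exact signature in the patch may differ:

    /*
     * In ExecHashRemoveNextSkewBucket: move the tuple into dense storage
     * so it can live in a regular bucket.  Deliberately does not respect
     * work_mem, preserving the pre-existing behaviour on this rare path.
     */
    copyTuple = (HashJoinTuple) dense_alloc(hashtable, tupleSize,
                                            false /* respect_work_mem */);
    memcpy(copyTuple, hashTuple, tupleSize);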
> Is it possible that increasing the number of batches changes the
> bucket number of the tuple being inserted? If so, should we
> recalculate the bucket and batch of the tuple being inserted?
No -- see the function documentation for ExecHashGetBucketAndBatch.
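In short: nbuckets and nbatch are both powers of two, and the bucket
and batch numbers are taken from disjoint groups of bits in the hash
value, roughly as in this sketch (see the comments above
ExecHashGetBucketAndBatch in nodeHash.c for the authoritative version):

#include <stdint.h>

/*
 * bucketno comes from the low log2_nbuckets bits of the hash value;
 * batchno comes from the bits immediately above them.  Doubling nbatch
 * only pulls one more, higher bit into batchno, so a tuple either keeps
 * its batch number or moves to a later batch -- its bucket number never
 * changes.
 */
static void
sketch_get_bucket_and_batch(uint32_t hashvalue,
                            uint32_t nbuckets,      /* power of two */
                            int log2_nbuckets,
                            uint32_t nbatch,        /* power of two */
                            int *bucketno,
                            int *batchno)
{
    *bucketno = (int) (hashvalue & (nbuckets - 1));
    *batchno = (int) ((hashvalue >> log2_nbuckets) & (nbatch - 1));
}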
--
Thomas Munro
http://www.enterprisedb.com