
Re: WIP: [[Parallel] Shared] Hash

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Peter Geoghegan <pg(at)heroku(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: [[Parallel] Shared] Hash
Date:	2017-01-31 13:10:58
Message-ID:	CAFjFpRd4j1Ah0c6VQQGmf4qgVDSP90KNypE0B83bof-sg+Y6VA@mail.gmail.com
Lists:	pgsql-hackers

>
> 0003-hj-refactor-memory-accounting-v4.patch:
>
> Modify the existing hash join code to work in terms of chunks when
> estimating and later tracking memory usage. This is probably more
> accurate than the current tuple-based approach, because it tries to
> take into account the space used by chunk headers and the wasted space
> in chunks. In practice the difference is probably small, but it's
> arguably more accurate; I did this because I need chunk-based
> accounting in the later patches. Also, make HASH_CHUNK_SIZE the actual
> size of allocated chunks (i.e. the header information is included in
> that size so we allocate exactly 32KB, not 32KB + a bit, for the
> benefit of the dsa allocator which otherwise finishes up allocating
> 36KB).
>
I looked at this patch. I agree that it accounts for memory usage more
accurately. Here are a few comments.

spaceUsed is defined with the comment:
Size spaceUsed; /* memory space currently used by tuples */

In ExecHashTableCreate(), space is allocated for the buckets but no tuples
have been inserted yet, so no space is used by tuples. Going strictly by the
comment, spaceUsed should be 0 in that function. But I think the patch
accounts for spaceUsed more accurately. Without this patch, the actual
allocation might exceed spaceAllowed without being noticed; with this patch
that's not possible. We should probably change the comment to say "memory
space currently allocated". However, ExecHashIncreaseNumBatches() may change
the number of buckets, and the patch does not seem to adjust spaceUsed for
that.
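
To make the bucket-array point concrete, here is a minimal sketch of the kind
of adjustment I have in mind. It is not taken from the patch: the struct and
the function names (SketchHashTable, sketch_create, sketch_resize_buckets) are
hypothetical stand-ins, and it assumes each bucket costs one pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the executor's hash table state. */
typedef struct SketchHashTable
{
    size_t nbuckets;      /* current number of buckets */
    size_t spaceUsed;     /* memory currently allocated, in bytes */
    size_t spacePeak;     /* highest spaceUsed seen so far */
    size_t spaceAllowed;  /* memory budget, in bytes */
} SketchHashTable;

/* Charge the bucket array at creation time, before any tuple is inserted. */
static void
sketch_create(SketchHashTable *ht, size_t nbuckets, size_t spaceAllowed)
{
    ht->nbuckets = nbuckets;
    ht->spaceUsed = nbuckets * sizeof(void *);
    ht->spacePeak = ht->spaceUsed;
    ht->spaceAllowed = spaceAllowed;
}

/*
 * When the number of buckets changes, swap the old bucket array's
 * contribution to spaceUsed for the new one and refresh the peak.
 */
static void
sketch_resize_buckets(SketchHashTable *ht, size_t new_nbuckets)
{
    size_t old_bytes = ht->nbuckets * sizeof(void *);
    size_t new_bytes = new_nbuckets * sizeof(void *);

    /* The old bucket array must already have been charged. */
    assert(ht->spaceUsed >= old_bytes);

    ht->spaceUsed = ht->spaceUsed - old_bytes + new_bytes;
    ht->nbuckets = new_nbuckets;

    if (ht->spaceUsed > ht->spacePeak)
        ht->spacePeak = ht->spaceUsed;
}
```

With something like this, a change in the number of buckets would show up in
spaceUsed in the same way the initial allocation does.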

Without this patch, ExecHashTableInsert() accounted for the space used by
each single tuple inserted. The patch moves this calculation into
dense_alloc() and accounts for the out-of-band allocation of larger tuples.
That's good.
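
For readers following along, a rough sketch of what chunk-level accounting in
a dense allocator can look like. This is my own simplification, not the
patch's code: it uses malloc() instead of memory contexts, and all the
SKETCH_*/Sketch* names and the threshold value are assumptions made for
illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Total bytes per regular chunk, header included (assumed value). */
#define SKETCH_CHUNK_SIZE       (32 * 1024)
/* Requests larger than this get a dedicated chunk (assumed value). */
#define SKETCH_CHUNK_THRESHOLD  (SKETCH_CHUNK_SIZE / 4)

typedef struct SketchChunk
{
    struct SketchChunk *next;
    size_t      maxlen;     /* usable bytes in data[] */
    size_t      used;       /* bytes of data[] already handed out */
    char        data[];     /* tuple storage follows the header */
} SketchChunk;

#define SKETCH_CHUNK_HEADER offsetof(SketchChunk, data)

typedef struct SketchArena
{
    SketchChunk *chunks;    /* head of the list is the current chunk */
    size_t      spaceUsed;  /* bytes allocated, headers and slack included */
} SketchArena;

static void *
sketch_dense_alloc(SketchArena *a, size_t size)
{
    SketchChunk *c;
    void       *p;

    size = (size + 7) & ~(size_t) 7;    /* round up to 8-byte alignment */

    if (size > SKETCH_CHUNK_THRESHOLD)
    {
        /* Oversized request: dedicated chunk, charged header plus payload. */
        c = malloc(SKETCH_CHUNK_HEADER + size);
        if (c == NULL)
            return NULL;
        c->maxlen = c->used = size;
        if (a->chunks != NULL)
        {
            /* Link behind the head so the current chunk stays current. */
            c->next = a->chunks->next;
            a->chunks->next = c;
        }
        else
        {
            c->next = NULL;
            a->chunks = c;
        }
        a->spaceUsed += SKETCH_CHUNK_HEADER + size;
        return c->data;
    }

    c = a->chunks;
    if (c == NULL || c->maxlen - c->used < size)
    {
        /* Start a new chunk; charge the whole chunk up front. */
        c = malloc(SKETCH_CHUNK_SIZE);
        if (c == NULL)
            return NULL;
        c->maxlen = SKETCH_CHUNK_SIZE - SKETCH_CHUNK_HEADER;
        c->used = 0;
        c->next = a->chunks;
        a->chunks = c;
        a->spaceUsed += SKETCH_CHUNK_SIZE;
    }

    p = c->data + c->used;
    c->used += size;
    return p;
}
```

Note that spaceUsed only moves when a chunk is allocated, so small tuples that
fit in the current chunk do not change it.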

The change in ExecChooseHashTableSize() looks fine too.

In ExecHashTableReset(), do we want to update spacePeak while setting
spaceUsed?
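
A small sketch of what I mean, under the assumption that a reset sets
spaceUsed to the size of the (possibly resized) bucket array. The
SketchResetState struct and sketch_reset() are hypothetical names, not the
executor's.

```c
#include <stddef.h>

/* Hypothetical subset of the hash table state relevant to a reset. */
typedef struct SketchResetState
{
    size_t nbuckets;    /* number of buckets for the next batch */
    size_t spaceUsed;   /* memory currently allocated, in bytes */
    size_t spacePeak;   /* highest spaceUsed seen so far */
} SketchResetState;

/*
 * Reset for a new batch: only the bucket array remains allocated.
 * Because spaceUsed is assigned here rather than incremented, the peak
 * is compared explicitly so a larger bucket array is not missed.
 */
static void
sketch_reset(SketchResetState *st)
{
    st->spaceUsed = st->nbuckets * sizeof(void *);

    if (st->spaceUsed > st->spacePeak)
        st->spacePeak = st->spaceUsed;
}
```

In most cases the earlier peak will already be higher, so the comparison is
cheap insurance rather than a behavior change.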

While this patch tracks space usage more accurately, I am afraid we might be
overdoing it; that may be the reason why we don't track space usage
accurately now. But I will leave that to be judged by someone who is more
familiar with the code and possibly has the historical perspective.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Re: WIP: [[Parallel] Shared] Hash at 2017-01-28 01:03:47 from Thomas Munro

Responses

Re: WIP: [[Parallel] Shared] Hash at 2017-01-31 23:25:18 from Thomas Munro
