From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP: [[Parallel] Shared] Hash |
Date: | 2017-03-22 10:17:19 |
Message-ID: | CAEepm=2ug5CZ7S9c6yqaDH0er9oLNH=eXa7WzkFHrBV_FDmTag@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
Here is a new version addressing feedback from Peter and Andres.
Please see below.
On Wed, Mar 22, 2017 at 3:18 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Tue, Mar 21, 2017 at 5:07 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>> buffile.c should stop pretending to care about anything other than
>>> temp files, IMV. 100% of all clients that want temporary files go
>>> through buffile.c. 100% of all clients that want non-temp files (files
>>> which are not marked FD_TEMPORARY) access fd.c directly, rather than
>>> going through buffile.c.
>>
>> I still need BufFile because I want buffering.
>>
>> There are 3 separate characteristics enabled by flags with 'temporary'
>> in their name. I think we should consider separating the concerns by
>> splitting and renaming them:
>>
>> 1. Segmented BufFile behaviour. I propose renaming BufFile's isTemp
>> member to isSegmented, because that is what it really does. I want
>> that feature independently without getting confused about lifetimes.
>> Tested with small MAX_PHYSICAL_FILESIZE as you suggested.
>
> I would have proposed to get rid of the isTemp field entirely. It is
> always true with current usage, any only #ifdef NOT_USED code presumes
> that it could be any other way. BufFile is all about temp files, which
> ISTM should be formalized. The whole point of BufFile is to segment
> fd.c temp file segments. Who would ever want to use BufFile without
> that capability anyway?
Yeah, it looks like you're probably right, but I guess others could
have uses for BufFile that we don't know about. It doesn't seem like
it hurts to leave the variable in existence.
>> 2. The temp_file_limit system. Currently this applies to fd.c files
>> opened with FD_TEMPORARY. You're right that we shouldn't be able to
>> escape that sanity check on disk space just because we want to manage
>> disk file ownership differently. I propose that we create a new flag
>> FD_TEMP_FILE_LIMIT that can be set independentlyisTemp of the flags
>> controlling disk file lifetime. When working with SharedBufFileSet,
>> the limit applies to each backend in respect of files it created,
>> while it has them open. This seems a lot simpler than any
>> shared-temp-file-limit type scheme and is vaguely similar to the way
>> work_mem applies in each backend for parallel query.
>
> I agree that that makes sense as a user-visible behavior of
> temp_file_limit. This user-visible behavior is what I actually
> implemented for parallel CREATE INDEX.
Ok, good.
>> 3. Delete-on-close/delete-at-end-of-xact. I don't want to use that
>> facility so I propose disconnecting it from the above. We c{ould
>> rename those fd.c-internal flags FD_TEMPORARY and FD_XACT_TEMPORARY to
>> FD_DELETE_AT_CLOSE and FD_DELETE_AT_EOXACT.
>
> This reliably unlink()s all files, albeit while relying on unlink()
> ENOENT as a condition that terminates deletion of one particular
> worker's BufFile's segments. However, because you effectively no
> longer use resowner.c, ISTM that there is still a resource leak in
> error paths. ResourceOwnerReleaseInternal() won't call FileClose() for
> temp-ish files (that are not quite temp files in the current sense) in
> the absence of no other place managing to do so, such as
> BufFileClose(). How can you be sure that you'll actually close() the
> FD itself (not vFD) within fd.c in the event of an error? Or Delete(),
> which does some LRU maintenance for backend's local VfdCache?
Yeah, I definitely need to use resowner.c. The only thing I want to
opt out of is automatic file deletion in that code path.
> If I follow the new code correctly, then it doesn't matter that you've
> unlink()'d to take care of the more obvious resource management chore.
> You can still have a reference leak like this, if I'm not mistaken,
> because you still have backend local state (local VfdCache) that is
> left totally decoupled with the new "shadow resource manager" for
> shared BufFiles.
You're right. The attached version fixes these problems. The
BufFiles created or opened in this new way now participate in both of
our leak-detection and clean-up schemes: the one in resowner.c
(because I'm now explicitly registering with it as I had failed to do
before) and the one in CleanupTempFiles (because FD_CLOSE_AT_EOXACT is
set, which I already had in the previous version for the creator, but
not the opener of such a file). I tested by commenting out my
explicit BufFileClose calls to check that resowner.c starts
complaining, and then by commenting out the resowner registration too
to check that CleanupTempFiles starts complaining.
>> As shown in 0008-hj-shared-buf-file-v8.patch. Thoughts?
>
> A less serious issue I've also noticed is that you add palloc() calls,
> implicitly using the current memory context, within buffile.c.
> BufFileOpenTagged() has some, for example. However, there is a note
> that we don't need to save the memory context when we open a BufFile
> because we always repalloc(). That is no longer the case here.
I don't see a problem here. BufFileOpenTagged() is similar to
BufFileCreateTemp() which calls makeBufFile() and thereore returns a
result that is allocated in the current memory context. This seems
like the usual deal.
Thanks for the review!
On Wed, Mar 22, 2017 at 1:07 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Thu, Feb 16, 2017 at 3:36 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> I think the synchronization protocol with the various phases needs to be
>> documented somewhere. Probably in nodeHashjoin.c's header.
>
> I will supply that shortly.
Added in the attached version.
--
Thomas Munro
http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
parallel-shared-hash-v9.tgz | application/x-gzip | 64.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Etsuro Fujita | 2017-03-22 10:20:52 | Re: Push down more UPDATEs/DELETEs in postgres_fdw |
Previous Message | Pavan Deolasee | 2017-03-22 10:14:02 | Re: Patch: Write Amplification Reduction Method (WARM) |