Quick Links

Re: WIP: [[Parallel] Shared] Hash

From:	Peter Geoghegan <pg(at)heroku(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: [[Parallel] Shared] Hash
Date:	2017-01-13 01:36:19
Message-ID:	CAM3SWZTVLWdYnERiPaSVN+UX3dxkGoFK+Q76RG0rQvX=HBpyyQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Jan 11, 2017 at 7:37 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Hmm. Yes, that is an entirely bogus use of isInterXact. I am
> thinking about how to fix that with refcounts.

Cool. As I said, the way I'd introduce refcounts would not be very
different from what I've already done -- there'd still be a strong
adherence to the use of resource managers to clean-up, with that
including exactly one particular backend doing the extra step of
deletion. The refcount only changes which backend does that extra step
in corner cases, which is conceptually a very minor change.

>> I don't think it's right for buffile.c to know anything about file
>> paths directly -- I'd say that that's a modularity violation.
>> PathNameOpenFile() is called by very few callers at the moment, all of
>> them very low level (e.g. md.c), but you're using it within buffile.c
>> to open a path to the file that you obtain from shared memory
>
> Hmm. I'm not seeing the modularity violation. buffile.c uses
> interfaces already exposed by fd.c to do this: OpenTemporaryFile,
> then FilePathName to find the path, then PathNameOpenFile to open from
> another process. I see that your approach instead has client code
> provide more meta data so that things can be discovered, which may
> well be a much better idea.

Indeed, my point was that the metadata thing would IMV be better.
buffile.c shouldn't need to know about file paths, etc. Instead,
caller should pass BufFileImport()/BufFileUnify() simple private state
sufficient for routine to discover all details itself, based on a
deterministic scheme. In my tuplesort patch, that piece of state is:

/*
+ * BufFileOp is an identifier for a particular parallel operation involving
+ * temporary files. Parallel temp file operations must be discoverable across
+ * processes based on these details.
+ *
+ * These fields should be set by BufFileGetIdent() within leader process.
+ * Identifier BufFileOp makes temp files from workers discoverable within
+ * leader.
+ */
+typedef struct BufFileOp
+{
+ /*
+ * leaderPid is leader process PID.
+ *
+ * tempFileIdent is an identifier for a particular temp file (or parallel
+ * temp file op) for the leader. Needed to distinguish multiple parallel
+ * temp file operations within a given leader process.
+ */
+ int leaderPid;
+ long tempFileIdent;
+} BufFileOp;
+

> Right, that is a problem. A refcount mode could fix that; virtual
> file descriptors would be closed in every backend using the current
> resource owner, and the files would be deleted when the last one turns
> out the lights.

Yeah. That's basically what the BufFile unification process can
provide you with (or will, once I get around to implementing the
refcount thing, which shouldn't be too hard). As already noted, I'll
also want to make it defer creation of a leader-owned segment, unless
and until that proves necessary, which it never will for hash join.

Perhaps I should make superficial changes to unification in my patch
to suit your work, like rename the field BufFileOp.leaderPid to
BufFileOp.ownerPid, without actually changing any behaviors, except as
noted in the last paragraph. Since you only require that backends be
able to open up some other backend's temp file themselves for a short
while, that gives you everything you need. You'll be doing unification
in backends, and not just within the leader as in the tuplesort patch,
I believe, but that's just fine. All that matters is that you present
all data at once to a consuming backend via unification (since you
treat temp file contents as immutable, this will be true for hash
join, just as it is for tuplesort).

There is a good argument against my making such a tweak, however,
which is that maybe it's clearer to DBAs what's going on if temp file
names have the leader PID in them for all operations. So, maybe
BufFileOp.leaderPid isn't renamed to BufFileOp.ownerPid by me;
instead, you always make it the leader pid, while at the same time
having the leader dole out BufFileOp.tempFileIdent identifiers to each
worker as needed (see how I generate BufFileOps for an idea of what I
mean if it's not immediately clear). That's also an easy change, or at
least will be once the refcount thing is added.

--
Peter Geoghegan

In response to

Re: WIP: [[Parallel] Shared] Hash at 2017-01-12 03:37:28 from Thomas Munro

Responses

Re: WIP: [[Parallel] Shared] Hash at 2017-01-28 01:03:47 from Thomas Munro

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jim Nasby	2017-01-13 01:46:39	Re: plpgsql - additional extra checks
Previous Message	Michael Paquier	2017-01-13 00:43:26	Re: many copies of atooid() and oid_cmp()