From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: POC: Sharing record typmods between backends |
Date: | 2017-09-04 06:14:39 |
Message-ID: | CAEepm=2Vs5iR4MO4LWe3Ap0jiQngoT3jrSAGPV3BiGd3i3n6ig@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks for the review and commits so far. Here's a rebased, debugged
and pgindented version of the remaining patches. I ran pgindent with
--list-of-typedefs="SharedRecordTableKey,SharedRecordTableEntry,SharedTypmodTableEntry,SharedRecordTypmodRegistry,Session"
to fix some weirdness around these new typenames.
While rebasing the 0002 patch (removal of tqueue.c's remapping logic),
I modified the interface of the newly added
ExecParallelCreateReaders() function from commit 51daa7bd because it
no longer has any reason to take a TupleDesc.
On Fri, Aug 25, 2017 at 1:46 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2017-08-21 11:02:52 +1200, Thomas Munro wrote:
>> 2. Andres didn't like what I did to DecrTupleDescRefCount, namely
>> allowing to run when there is no ResourceOwner. I now see that this
>> is probably an indication of a different problem; even if there were a
>> worker ResourceOwner as he suggested (or perhaps a session-scoped one,
>> which a worker would reset before being reused), it wouldn't be the
>> one that was active when the TupleDesc was created. I think I have
>> failed to understand the contracts here and will think/read about it
>> some more.
>
> Maybe I'm missing something, but isn't the issue here that using
> DecrTupleDescRefCount() simply is wrong, because we're not actually
> necessarily tracking the TupleDesc via the resowner mechanism?
Yeah. Thanks.
> If you look at the code, in the case it's a previously unknown tupledesc
> it's registered with:
>
> entDesc = CreateTupleDescCopy(tupDesc);
> ...
> /* mark it as a reference-counted tupdesc */
> entDesc->tdrefcount = 1;
> ...
> RecordCacheArray[newtypmod] = entDesc;
> ...
>
> Note that there's no PinTupleDesc(), IncrTupleDescRefCount() or
> ResourceOwnerRememberTupleDesc() managing the reference from the
> array. Nor was there one before.
>
> We have other code managing TupleDesc lifetimes similarly, and look at
> how they're freeing it:
> /* Delete tupdesc if we have it */
> if (typentry->tupDesc != NULL)
> {
> /*
> * Release our refcount, and free the tupdesc if none remain.
> * (Can't use DecrTupleDescRefCount because this reference is not
> * logged in current resource owner.)
> */
> Assert(typentry->tupDesc->tdrefcount > 0);
> if (--typentry->tupDesc->tdrefcount == 0)
> FreeTupleDesc(typentry->tupDesc);
> typentry->tupDesc = NULL;
> }
Right. I have changed shared_record_typmod_registry_worker_detach()
to be more like that, with an explanation.
> This also made me think about how we're managing the lookup from the
> shared array:
>
> /*
> * Our local array can now point directly to the TupleDesc
> * in shared memory.
> */
> RecordCacheArray[typmod] = tupdesc;
>
> Uhm. Isn't that highly highly problematic? E.g. tdrefcount manipulations
> which are done by all lookups (cf. lookup_rowtype_tupdesc()) would in
> that case manipulate shared memory in a concurrency unsafe manner.
No. See this change, in that and similar code paths:
- IncrTupleDescRefCount(tupDesc);
+ PinTupleDesc(tupDesc);
The difference between IncrTupleDescRefCount() and PinTupleDesc() is
that the latter recognises non-refcounted tuple descriptors
(tdrefcount == -1) and does nothing. Shared tuple descriptors are not
reference counted (see TupleDescCopy() which initialises
dst->tdrefcount to -1). It was for foolish symmetry that I was trying
to use ReleaseTupleDesc() in shared_record_typmod_registry_detach()
before, since it also knows about non-refcounted tuple descriptors,
but that's not appropriate: it calls DecrTupleDescRefCount() which
assumes that we're using resource owners. We're not.
To summarise the object lifetime management situation created by this
patch: shared TupleDesc objects accumulate in per-session DSM memory
until eventually the session ends and the DSM memory goes away. A bit
like CacheMemoryContext: there is no retail cleanup of shared
TupleDesc objects. BUT: the DSM detach callback is used to clear out
backend-local pointers to that stuff (and any non-shared reference
counted TupleDesc objects that might be found), in anticipation of
being able to reuse a worker process one day (which will involve
attaching to a new session, so we mustn't retain any traces of the
previous session in our local state). Maybe I'm trying to be a little
too clairvoyant there...
I improved the cleanup code: now it frees RecordCacheArray and
RecordCacheHash and reinstalls NULL pointers. Also it deals with
errors in GetSessionDsmHandle() better.
I renamed the members of Session to include a "shared_" prefix, which
seems a bit clearer.
I refactored it so that it never makes needless local copies of
TupleDesc objects (previously assign_record_type_typmod() would create
an extra local copy and cache that, which was wasteful). That
actually makes much of the discussion above moot: on detach, a worker
should now ONLY find shared non-refcounted TupleDesc objects in the
local caches, so the FreeTupleDesc() case is unreachable...
The leader on the other hand can finish up with a mixture of local and
shared TupleDesc objects in its cache, if it had some before it ran a
parallel query. Its detach hook doesn't try to free those so it
doesn't matter.
--
Thomas Munro
http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
shared-record-typmods-v10.patchset.tgz | application/x-gzip | 22.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2017-09-04 06:38:32 | Re: CLUSTER command progress monitor |
Previous Message | Michael Paquier | 2017-09-04 06:14:00 | Re: [PATCH] Assert that the correct locks are held when calling PageGetLSN() |