Re: Extending SMgrRelation lifetimes

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extending SMgrRelation lifetimes
Date: 2023-11-29 12:41:57
Message-ID: 621a52fd-3cd8-4f5d-a561-d510b853bbaf@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I spent some more time digging into this, experimenting with different
approaches. Came up with pretty significant changes; see below:

On 18/09/2023 18:19, Robert Haas wrote:
> I think that if you believe 0001 to be correct you should go ahead and
> commit it sooner rather than later. If you're wrong and something
> weird starts happening we'll then have a chance to notice that before
> too much other stuff gets changed on top of this and confuses the
> matter.

+1

> On Wed, Aug 23, 2023 at 12:55 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>> Right, let's one find one good place. I think smgropen() would be best.
>
> I think it would be a good idea to give this comment a bit more oomph.
> In lieu of this:
>
> + * This does not attempt to actually open the underlying files. The returned
> + * object remains valid at least until AtEOXact_SMgr() is called, or until
> + * smgrdestroy() is called in non-transactional backends.
>
> I would leave the existing "This does not attempt to actually open the
> underlying files." comment as a separate comment, and add something
> like this as a new paragraph:
>
> In versions of PostgreSQL prior to 17, this function returned an
> object with no defined lifetime. Now, however, the object remains
> valid for the lifetime of the transaction, up to the point where
> AtEOXact_SMgr() is called, making it much easier for callers to know
> for how long they can hold on to a pointer to the returned object. If
> this function is called outside of a transaction, the object remains
> valid until smgrdestroy() or smgrdestroyall() is called. Background
> processes that use smgr but not transactions typically do this once
> per checkpoint cycle.

+1

> Apart from that, the main thing that is bothering me is that the
> justification for redefining smgrclose() to do what smgrrelease() now
> does isn't really spelled out anywhere. You mentioned some reasons and
> Heikki mentioned the benefit to extensions, but I feel like somebody
> should be able to understand the reasoning clearly from reading the
> commit message or the comments in the patch, rather than having to
> visit the mailing list discussion, and I'm not sure we're there yet. I
> feel like I understood why we were doing this and was convinced it was
> a good idea at some point, but now the reasoning has gone out of my
> head and I can't recover it. If somebody does smgropen() .. stuff ...
> smgrclose() as in heapam_relation_copy_data() or index_copy_data(),
> this change has the effect of making the SmgrRelation remain valid
> until eoxact whereas before it would have been destroyed instantly. Is
> that what we want? Presumably yes, or this patch wouldn't be shaped
> like this, but I don't know why we want that...

Fair. I tried to address that by adding an overview comment at top of
smgr.c, explaining how this stuff work. I hope that helps.

> Another thing that seems a bit puzzling is how this is intended to, or
> does, interact with the ownership mechanism. Intuitively, I feel like
> a Relation owns an SMgrRelation *because* the SMgrRelation has no
> defined lifetime. But I think that's not quite right. I guess the
> ownership mechanism doesn't guarantee anything about the lifetime of
> the object, just that the pointer in the Relation won't hang around
> any longer than the object to which it's pointing. So does that mean
> that we're free to redefine the object lifetime to be pretty much
> anything we want and that mechanism doesn't really care? Huh,
> interesting.

Yeah that owner mechanism is weird. It guarantees that the pointer to
the SMgrRelation is cleared when the SMgrRelation is destroyed. But it
also prevents the SMgrRelation from being destroyed at end of
transaction. That's how it is in 'master' too.

But with this patch, we don't normally call smgrdestroy() on an
SMgrRelation that came from the relation cache. We do call
smgrdestroyall() in the aux processes, but they don't have a relcache.
So the real effect of setting the owner now is to prevent the
SMgrRelation from being destroyed at end of transaction; the mechanism
of clearing the pointer is unused.

I found two exceptions to that, though, by adding some extra assertions
and running the regression tests:

1. The smgrdestroyall() in a single-user backend in RequestCheckpoint().
It destroys SMgrRelations belonging to relcache entries, and the owner
mechanism clears the pointers from the relcache. I think smgrcloseall(),
or doing nothing, would actually be more appropriate there.

2. A funny case with foreign tables: ANALYZE on a foreign table calls
visibilitymap_count(). A foreign table has no visibility map so it
returns 0, but before doing so it calls RelationGetSmgr on the foreign
table, which has 0/0/0 rellocator. That creates an SMgrRelation for
0/0/0, and sets the foreign table's relcache entry as its owner. If you
then call ANALYZE on another foreign table, it also calls
RelationGetSmgr with 0/0/0 rellocator, returning the same SMgrRelation
entry, and changes its owner to the new relcache entry. That doesn't
make much sense and it's pretty accidental that it works at all, so
attached is a patch to avoid calling visibilitymap_count() on foreign
tables.

I propose that we replace the single owner with a "pin counter". One
SMgrRelation can have more than one pin on it, and the guarantee is that
as long as the pin counter is non-zero, the SMgrRelation cannot be
destroyed and the pointer remains valid. We don't really need the
capability for more than one pin at the moment (the regression tests
pass with an extra assertion that pincount <= 1 after fixing the foreign
table issue), but it seems more straightforward than tracking an owner.

Here's another reason to do that: I noticed this at the end of
swap_relation_files():

> /*
> * Close both relcache entries' smgr links. We need this kluge because
> * both links will be invalidated during upcoming CommandCounterIncrement.
> * Whichever of the rels is the second to be cleared will have a dangling
> * reference to the other's smgr entry. Rather than trying to avoid this
> * by ordering operations just so, it's easiest to close the links first.
> * (Fortunately, since one of the entries is local in our transaction,
> * it's sufficient to clear out our own relcache this way; the problem
> * cannot arise for other backends when they see our update on the
> * non-transient relation.)
> *
> * Caution: the placement of this step interacts with the decision to
> * handle toast rels by recursion. When we are trying to rebuild pg_class
> * itself, the smgr close on pg_class must happen after all accesses in
> * this function.
> */
> RelationCloseSmgrByOid(r1);
> RelationCloseSmgrByOid(r2);

If RelationCloseSmgrByOid() merely closes the underlying file descriptor
but doesn't destroy the SMgrRelation object - as it does with these
patches - I think we reintroduce the dangling reference problem that the
comment mentions. But if we allow the same SMgrRelation to be pinned by
two different relcache entries, the problem goes away and we can remove
that kluge.

I think we're missing test coverage for that though. I commented out
those calls in 'master' and ran the regression tests, but got no
failures. I don't fully understand the problem anyway. Or does it not
exist anymore? Is there a moment where we have two relcache entries
pointing to the same SMgrRelation? I don't see it. In any case, with a
pin counter mechanism, I believe it would be fine.

Summary of the changes to the attached main patch:

* Added an overview comment at top of smgr.c

* Added the comment Robert suggested to smgropen()

* Replaced the single owner with a pin count and smgrpin() / smgrunpin()
functions. smgrdestroyall() now only destroys unpinned entries

* Removed that kluge from swap_relation_files(). It should not be needed
anymore with the pin counter.

* Changed a few places in bufmgr.c where we called RelationGetSmgr() on
every smgr call to keep the SMgrRelation in a local variable. That was
not safe before, but is now. I don't think we should go on a spree to
change all callers - RelationGetSmgr() is still cheap - but in these few
places it seems worthwhile.

* I kept the separate smgrclose() and smgrrelease() functions. I know I
suggested to just change smgrclose() to do what smgrrelease() did, but
on second thoughts keeping them separate seems nicer. However,
sgmgrclose() just calls smgrrelease() now, so the distinction is just
pro forma. The idea is that if you call smgrclose(), you hint that you
don't need that SMgrRelation pointer anymore, although there might be
other pointers to the same object and they stay valid.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachment Content-Type Size
0001-Remove-some-obsolete-smgrcloseall-calls.patch text/x-patch 2.8 KB
0002-Don-t-try-to-open-visibilitymap-when-analyzing-a-for.patch text/x-patch 1.4 KB
0003-Give-SMgrRelation-pointers-a-well-defined-lifetime.patch text/x-patch 29.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2023-11-29 12:45:21 Re: logical decoding and replication of sequences, take 2
Previous Message Andrew Dunstan 2023-11-29 12:37:53 Re: remaining sql/json patches