Re: ResourceOwner refactoring

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: ResourceOwner refactoring
Date: 2024-06-03 22:49:52
Message-ID: 7d9fa5cf-07e3-4691-b40a-802048a0b998@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the testing!

On 01/06/2024 11:00, Alexander Lakhin wrote:
> Hello Heikki,
>
> I've stumbled upon yet another anomaly introduced with b8bff07da.
>
> Please try the following script:
> SELECT 'init' FROM pg_create_logical_replication_slot('slot',
>   'test_decoding');
> CREATE TABLE tbl(t text);
> BEGIN;
> TRUNCATE table tbl;
>
> SELECT data FROM pg_logical_slot_get_changes('slot', NULL, NULL,
>  'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
>
> On b8bff07da (and the current HEAD), it ends up with:
> ERROR:  ResourceOwnerEnlarge called after release started
>
> But on the previous commit, I get no error:
>  data
> ------
> (0 rows)

This is another case of the issue I tried to fix in commit b8bff07da
with this:

> --- b/src/backend/access/transam/xact.c
> +++ a/src/backend/access/transam/xact.c
> @@ -5279,7 +5279,20 @@
>
> AtEOSubXact_RelationCshould be ache(false, s->subTransactionId,
> s->parent->subTransactionId);
> +
> +
> + /*
> + * AtEOSubXact_Inval sometimes needs to temporarily bump the refcount
> + * on the relcache entries that it processes. We cannot use the
> + * subtransaction's resource owner anymore, because we've already
> + * started releasing it. But we can use the parent resource owner.
> + */
> + CurrentResourceOwner = s->parent->curTransactionOwner;
> +
> AtEOSubXact_Inval(false);
> +
> + CurrentResourceOwner = s->curTransactionOwner;
> +
> ResourceOwnerRelease(s->curTransactionOwner,
> RESOURCE_RELEASE_LOCKS,
> false, false);

AtEOSubXact_Inval calls LocalExecuteInvalidationMessage(), which
ultimately calls RelationFlushRelation(). Your test case reaches
ReorderBufferExecuteInvalidations(), which also calls
LocalExecuteInvalidationMessage(), in an already aborted subtransaction.

Looking closer at relcache.c, I think we need to make
RelationFlushRelation() safe to call without a valid resource owner.
There are already IsTransactionState() checks in
RelationClearRelation(), and a comment noting that it does not touch
CurrentResourceOwner. So we should make sure RelationFlushRelation()
doesn't touch it either, and revert the above change to
AbortTransaction(). I didn't feel very good about it in the first place,
and now I see what the proper fix is.

A straightforward fix is to modify RelationFlushRelation() so that if
!IsTransactionState(), it just marks the entry as invalid instead of
calling RelationClearRelation(). That's what RelationClearRelation()
would do anyway, if it didn't hit the assertion first.

RelationClearRelation() is complicated. Depending on the circumstances,
it removes the relcache entry, rebuilds it, marks it as invalid, or
performs a partial reload of a nailed relation. So before going for the
straightforward fix, I'm going to see if I can refactor
RelationClearRelation() to be less complicated.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-06-03 22:58:27 Re: Test to dump and restore objects left behind by regression
Previous Message Alexander Korotkov 2024-06-03 22:47:43 Re: POC: GROUP BY optimization