Re: Bug in wait time when waiting on nested subtransaction

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug in wait time when waiting on nested subtransaction
Date: 2022-11-28 21:10:01
Message-ID: CA+TgmobgqhCZNRdTk1jDHDvdaVBqOY4Tkp3LqYnrNXxfnDiZOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 28, 2022 at 3:01 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> One thing we need to be pretty careful of here is to not break the
> promise of atomic commit. At topmost commit, all subxacts must
> appear committed simultaneously. It's not quite clear to me whether
> we need a similar guarantee in the rollback case. It seems like
> we shouldn't, but maybe I'm missing something, in which case maybe
> the current behavior is correct?

AFAICS, the behavior that Simon is complaining about here is an
exception to the way things work in general, and therefore his
complaint is well-founded. AbortSubTransaction() arranges to release
resources, whereas CommitSubTransaction() arranges to have them
transferred to the parent transaction. This isn't necessarily obvious
from looking at those functions, but if you drill down into the
AtEO(Sub)Xact functions that they call, that's what happens in nearly
all cases. Given that subtransaction abort releases resources
immediately, it seems pretty fair to wonder what the value is in
waiting for its parent or the topmost transaction. I don't see how
that can be necessary for correctness.

The commit message to which Simon (3c27944fb2141) points seems to have
inadvertently changed the behavior while trying to fix a bug and
improve performance. I remember being a bit skeptical about that fix
at the time. Back in the day, you couldn't XactLockTableWait() unless
you knew that the transaction had already started. That commit tried
to make it so that you could XactLockTableWait() earlier, because
that's apparently something that logical replication needs to do. But
that is a major redefinition of the charter of that function, and I am
wondering whether it was a mistake to fold together the thing that we
need in normal cases (which is to wait for a transaction we know has
started and may not have finished) from the thing we need in the
logical decoding case (which apparently has different requirements).
Maybe we should have solved that problem by finding a way to wait for
the transaction to start, and then afterwards wait for it to end. Or
maybe we should have altogether different entrypoints for the two
requirements. Or maybe using one function is fine but we just need it
to be more correct. I'm not really sure.

In short, I think Simon is right that there's a problem and right
about which commit caused it, but I'm not sure what I think we ought
to do about it.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2022-11-28 21:19:35 Re: fixing CREATEROLE
Previous Message Peter Geoghegan 2022-11-28 21:09:28 Re: Add 64-bit XIDs into PostgreSQL 15