Re: repeated decoding of prepared transactions

From: Markus Wanner <markus(dot)wanner(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, simon(dot)riggs(at)enterprisedb(dot)com, Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>
Subject: Re: repeated decoding of prepared transactions
Date: 2021-02-22 08:25:17
Message-ID: 14b5f81f-43a8-0926-a346-3533509ec9fe@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 20.02.21 13:15, Amit Kapila wrote:
> I think after the patch Ajin proposed decoders won't need any special
> checks after receiving the prepared xacts. What additional simplicity
> this approach will bring?

The API becomes clearer in that all PREPAREs are always decoded in WAL
stream order and are not ever deferred (possibly until after the commits
of many other transactions). No output plugin will need to check
against this peculiarity, but can rely on WAL ordering of events.

(And if an output plugin does not want prepares to be individual events,
it should simply not enable two-phase support. That seems like
something the output plugin could even do on a per-transaction basis.)

> Do you mean to say that after creating the slot we take an additional
> pass over WAL (till the LSN where we found a consistent snapshot) to
> collect all prepared transactions and wait for them to get
> committed/rollbacked?

No. A single pass is enough, the decoder won't need any further change
beyond the code removal in my patch.

I'm proposing for the synchronization logic (in e.g. pgoutput) to defer
the snapshot taking. So that there's some time in between creating the
logical slot (at step 1.) and taking a snapshot (at step 4.). Another
CATCHUP phase, if you want.

So that all two-phase commit transactions are delivered via either:

* the transferred snapshot (because their COMMIT PREPARED took place
before the snapshot was taken in (4)), or

* the decoder stream (because their PREPARE took place after the slot
was fully created and snapbuilder reached a consistent snapshot)

No transaction can have PREPAREd before (1) but not committed until
after (4), because we waited for all prepared transactions to commit in
step (3).

> I think the scheme proposed by you is still not fully clear to me but
> can you please explain how in the existing proposed patch there is a
> danger of showing transactions as committed without the effects of the
> PREPAREs being "visible"?

Please see the `twophase_snapshot` isolation test. The expected output
there shows the insert from s1 being committed prior to the prepare of
the transaction in s2.

On a replica applying the stream in that order, a transaction in between
these two events would see the results from s1 while still being allowed
to lock the row that s2 is about to update. Something I'd expect the
PREPARE to prevent.

That is (IMO) wrong in `master` and Ajin's patch doesn't correct it.
(While my patch does, so don't look at my patch for this example.)

>> * Second, it becomes possible to avoid inconsistencies during the
>> reconciliation window in between steps 5 and 6 by disallowing
>> concurrent (user) transactions to run until after completion of
>> step 6.
>
> This second point sounds like a restriction that users might not like.

"It becomes possible" cannot be a restriction. If a user (or
replication solution) wants to allow for these inconsistencies, it still
can. I want to make sure that solutions which *want* to prevent
inconsistencies can be implemented.

Your concern applies to step (3), though. The current approach is
clearly quicker to restore the backup and start to apply transactions.
Until you start to think about reordering the "early" commits until
after the deferred PREPAREs in the output plugin or on the replica side,
so as to lock rows by prepared transactions before making other commits
visible so as to prevent inconsistencies...

> But we need something in existing logic in WALSender or somewhere to
> allow supporting 2PC for subscriptions and from your above
> description, it is not clear to me how we can achieve that?

I agree that some more code is required somewhere, outside of the walsender.

Regards

Markus

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Markus Wanner 2021-02-22 08:25:52 Re: repeated decoding of prepared transactions
Previous Message iwata.aya@fujitsu.com 2021-02-22 08:22:02 RE: libpq debug log