RE: Slow catchup of 2PC (twophase) transactions on replica in LR

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, 'Peter Smith' <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Date: 2024-07-18 03:22:18
Message-ID: OS0PR01MB5716C21933462A27E42F714A94AC2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Thursday, July 18, 2024 10:11 AM Kuroda, Hayato/黒田 隼人 <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Peter,
>
> Thanks for giving comments! PSA new version.

I did a few more tests and some analysis and didn't find any issues. Sharing
the cases I tested:

1. After manually rolling back the xacts for two_pc and switching the two_pc
option from true to false, does the prepared transaction get replicated again
when COMMIT PREPARED happens?

It works as expected in this case, i.e. the transaction is sent to the
subscriber after disabling two_pc.

And I think there wouldn't be race conditions between the ALTER command
and the apply worker, because the user needs to disable the subscription
(which stops the apply worker) before altering the two_phase option.

And the WAL for the prepared transaction is retained until COMMIT
PREPARED, because we don't advance the slot's restart_lsn past ongoing
transactions (e.g. the prepared transaction in this case):

SnapBuildProcessRunningXacts
    ...
    txn = ReorderBufferGetOldestTXN(builder->reorder);
    ...
    /*
     * oldest ongoing txn might have started when we didn't yet serialize
     * anything because we hadn't reached a consistent state yet.
     */
    if (txn != NULL && txn->restart_decoding_lsn != InvalidXLogRecPtr)
        LogicalIncreaseRestartDecodingForSlot(lsn, txn->restart_decoding_lsn);

So, the data of the prepared transaction is safe.
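
To make the retention argument concrete, here is a small standalone toy model
of that check. It is only a sketch, not PostgreSQL code: ToyTxn, ToyLsn,
toy_oldest_ongoing and toy_restart_candidate are hypothetical names, the
transactions are assumed to be stored oldest first, and the "no ongoing
transaction" branch is simplified to return the current record's LSN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ToyLsn;                 /* hypothetical stand-in for an LSN */
#define TOY_INVALID_LSN ((ToyLsn) 0)

/* Hypothetical stand-in for a transaction tracked during decoding. */
typedef struct ToyTxn
{
    ToyLsn restart_decoding_lsn;  /* where decoding must restart to see it fully */
    int    ongoing;               /* 1 until committed or rolled back */
} ToyTxn;

/* Return the oldest still-ongoing transaction (array is oldest first), or NULL. */
static const ToyTxn *
toy_oldest_ongoing(const ToyTxn *txns, size_t n)
{
    assert(txns != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (txns[i].ongoing)
            return &txns[i];
    return NULL;
}

/*
 * Return the LSN the slot's restart point may move to, or TOY_INVALID_LSN
 * for "do not advance". Same shape as the check quoted above: while a
 * transaction (e.g. a prepared one) is ongoing, the candidate never goes
 * past that transaction's restart_decoding_lsn.
 */
static ToyLsn
toy_restart_candidate(const ToyTxn *txns, size_t n, ToyLsn record_lsn)
{
    const ToyTxn *oldest = toy_oldest_ongoing(txns, n);

    if (oldest != NULL && oldest->restart_decoding_lsn != TOY_INVALID_LSN)
        return oldest->restart_decoding_lsn;
    if (oldest == NULL)
        return record_lsn;        /* simplification made for this sketch */
    return TOY_INVALID_LSN;
}
```

In this model, as long as the prepared transaction stays in the array with
ongoing = 1, the candidate stays at its restart_decoding_lsn no matter how far
record_lsn has moved, which is the property the retention argument relies on.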

2. Test when the prepare has already been processed and we then alter the
option from false to true.

This case works as expected as well, i.e. the whole transaction is sent to the
subscriber on COMMIT PREPARED using the two_pc flow:

"begin prepare" -> "txn data" -> "prepare" -> "commit prepare"

For the same reason as in case 1, there is no concurrency issue, and the
data of the transaction is retained until COMMIT PREPARED.

Best Regards,
Hou zj
