Found issues related with logical replication and 2PC

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Found issues related with logical replication and 2PC
Date: 2024-07-24 06:55:24
Message-ID: TYAPR01MB5692FA4926754B91E9D7B5F0F5AA2@TYAPR01MB5692.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hi hackers,

While creating a patch that allows ALTER SUBSCRIPTION SET (two_phase) [1],
we found some issues related to logical replication and two_phase. I think these
can happen not only on HEAD but also on PG14+; for now I am sharing patches for HEAD only.

Issue #1

When handling a PREPARE message, the subscriber mistakenly uses the end position
of the last commit as the end position of the current prepare. This can be fixed
by adding a new global variable that records the end position of the last
prepare. The 0001 patch fixes the issue.
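
To illustrate the idea, here is a minimal toy model in C. It is not the code in
the 0001 patch: the Lsn type, the helper functions, and the variable names are
all hypothetical stand-ins (only the idea of a separate "last prepare end"
variable comes from the patch title). It only shows why reading the
commit-side variable after a PREPARE yields a stale position.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* toy stand-in for a WAL position */

/* Hypothetical globals; names are illustrative only. */
static Lsn wal_insert_pos = 0;    /* current end of the toy WAL */
static Lsn last_commit_end = 0;   /* end of the most recent COMMIT record */
static Lsn last_prepare_end = 0;  /* end of the most recent PREPARE record (new) */

/* Append a record of the given length and return its end position. */
static Lsn write_record(Lsn len)
{
    assert(len > 0);
    wal_insert_pos += len;
    return wal_insert_pos;
}

/* A local COMMIT updates only the commit-side variable. */
void local_commit(Lsn len)
{
    last_commit_end = write_record(len);
}

/* A local PREPARE updates only the prepare-side variable. */
void local_prepare(Lsn len)
{
    last_prepare_end = write_record(len);
}

/* Buggy lookup: reports the end of the last COMMIT for a PREPARE. */
Lsn prepare_end_buggy(void)
{
    return last_commit_end;
}

/* Fixed lookup: reports the end of the last PREPARE. */
Lsn prepare_end_fixed(void)
{
    return last_prepare_end;
}
```

In this model, a commit followed by a prepare leaves the buggy lookup pointing
at the older commit record, while the fixed lookup points at the prepare record
that was just written.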

Issue #2

When the subscriber enables two-phase commit but does not set
max_prepared_transactions > 0, and a transaction is prepared on the publisher,
the apply worker reports an ERROR on the subscriber. After that, the prepared
transaction is never replayed, which means it is lost forever. The attached
script reproduces the situation.

--
ERROR: prepared transactions are disabled
HINT: Set "max_prepared_transactions" to a nonzero value.
--

The reason is that the origin progress is advanced when a transaction aborts as
well (RecordTransactionAbort->replorigin_session_advance). So, if any ERROR
happens while preparing the transaction after replorigin_session_origin_lsn has
been set, the transaction aborts and the abort incorrectly advances the origin LSN.

The easiest fix is to reset the session replication origin before calling
RecordTransactionAbort(). I think this can happen when 1) LogicalRepApplyLoop()
raises an ERROR or 2) the apply worker exits. The 0002 patch fixes the issue.
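
The failure and the proposed fix can be sketched with a small toy model in C.
This is not the 0002 patch and does not use PostgreSQL's real data structures:
ApplyState, record_abort(), and apply_prepare() are hypothetical names, and the
rule "abort advances the progress whenever the session origin LSN is set" is a
simplifying assumption taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;               /* toy stand-in for a WAL position */
#define INVALID_LSN ((Lsn) 0)

/* Hypothetical per-worker state; field names are illustrative only. */
typedef struct
{
    Lsn session_origin_lsn;  /* models replorigin_session_origin_lsn */
    Lsn origin_progress;     /* remote position already considered applied */
} ApplyState;

/* Models the abort path: progress advances if the session origin LSN is set. */
static void record_abort(ApplyState *st)
{
    if (st->session_origin_lsn != INVALID_LSN &&
        st->session_origin_lsn > st->origin_progress)
        st->origin_progress = st->session_origin_lsn;
}

/*
 * Apply a remote PREPARE ending at remote_end_lsn. Returns false when the
 * prepare fails because prepared transactions are disabled.
 * reset_before_abort selects the proposed fix.
 */
bool apply_prepare(ApplyState *st, Lsn remote_end_lsn,
                   int max_prepared_xacts, bool reset_before_abort)
{
    assert(remote_end_lsn != INVALID_LSN);

    st->session_origin_lsn = remote_end_lsn;

    if (max_prepared_xacts == 0)
    {
        /* "ERROR: prepared transactions are disabled" would be raised here. */
        if (reset_before_abort)
            st->session_origin_lsn = INVALID_LSN;   /* the fix */
        record_abort(st);
        st->session_origin_lsn = INVALID_LSN;
        return false;
    }

    /* Success: the prepare is applied and progress legitimately advances. */
    st->origin_progress = remote_end_lsn;
    st->session_origin_lsn = INVALID_LSN;
    return true;
}
```

Without the reset, a failed prepare still moves origin_progress past the
transaction, so the model never sees it again. With the reset, the progress
stays where it was and the transaction can be retried.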

What do you think?

[1]: https://www.postgresql.org/message-id/flat/8fab8-65d74c80-1-2f28e880(at)39088166

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment                                                       Content-Type              Size
test_2pc.sh                                                      application/octet-stream  1.6 KB
0001-Add-XactLastPrepareEnd-to-indicate-the-last-PREPARE-.patch  application/octet-stream  4.8 KB
0002-Prevent-origin-progress-advancement-if-failed-to-app.patch  application/octet-stream  4.7 KB
