[bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "shveta(dot)malik(at)gmail(dot)com" <shveta(dot)malik(at)gmail(dot)com>
Subject: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber
Date: 2024-08-08 05:07:18
Message-ID: TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92@TYAPR01MB5692.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear hackers,

This thread forks from [1] so that the second item can be discussed here.
The text below repeats what I wrote in [1]; I have copied it here for convenience.
The attached patch is almost the same, but slightly modified based on the comment
from Amit [2]: an unrelated change has been removed.

Found issue
=====
When the subscriber enables two-phase commit but does not set max_prepared_transactions > 0,
and a transaction is prepared on the publisher, the apply worker reports an ERROR
on the subscriber. After that, the prepared transaction is not replayed, which
means it is lost forever. The attached script can reproduce the situation.

--
ERROR: prepared transactions are disabled
HINT: Set "max_prepared_transactions" to a nonzero value.
--

The reason is that the origin progress is advanced when a transaction is
aborted as well (RecordTransactionAbort->replorigin_session_advance). So, if
any ERROR happens while preparing the transaction after
replorigin_session_origin_lsn has been set, the transaction aborts, which
incorrectly advances the origin LSN.
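
The ordering described above can be illustrated with a toy model. This is not PostgreSQL code: `ToyApplyWorker` and all of its fields and methods are hypothetical stand-ins, and the sketch only mirrors the sequence "origin LSN is set, PREPARE fails, abort advances progress".

```python
# Toy model of the ordering problem; NOT PostgreSQL code.
# All names here are hypothetical stand-ins for the real C internals.

INVALID_LSN = 0


class ToyApplyWorker:
    def __init__(self, max_prepared_transactions):
        self.max_prepared_transactions = max_prepared_transactions
        self.origin_progress = INVALID_LSN     # progress the publisher resumes from
        self.session_origin_lsn = INVALID_LSN  # stand-in for replorigin_session_origin_lsn

    def record_abort(self):
        # Stand-in for RecordTransactionAbort -> replorigin_session_advance:
        # progress is advanced whenever a session origin LSN is set.
        if self.session_origin_lsn != INVALID_LSN:
            self.origin_progress = max(self.origin_progress, self.session_origin_lsn)

    def apply_prepare(self, end_lsn):
        # The origin LSN is set before the PREPARE is attempted.
        self.session_origin_lsn = end_lsn
        try:
            if self.max_prepared_transactions == 0:
                raise RuntimeError("prepared transactions are disabled")
            self.origin_progress = end_lsn  # a successful PREPARE records progress
            return True
        except RuntimeError:
            self.record_abort()  # the abort path still sees the origin LSN
            return False


worker = ToyApplyWorker(max_prepared_transactions=0)
prepared = worker.apply_prepare(end_lsn=100)
print(prepared, worker.origin_progress)
```

In this model the PREPARE fails, yet the recorded progress moves to the end of the failed transaction, so a restarted worker would resume after it and never see the transaction again.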

The easiest fix is to reset the session replication origin before
RecordTransactionAbort() is called. I think this can happen when 1) LogicalRepApplyLoop()
raises an ERROR or 2) the apply worker exits. The attached patch fixes the issue on HEAD.
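
The effect of resetting the session origin before the abort is recorded can be sketched in a toy model. This is not the attached patch: `ToySession`, `reset_session_origin`, and `fail_prepare` are hypothetical names, and the real change lives in the apply worker's C code.

```python
# Toy model of the proposed ordering; NOT the attached patch.
# Every name below is a hypothetical stand-in.

INVALID_LSN = 0


class ToySession:
    def __init__(self):
        self.origin_progress = INVALID_LSN
        self.session_origin_lsn = INVALID_LSN

    def reset_session_origin(self):
        # Forget the remote LSN so the abort path has nothing to advance to.
        self.session_origin_lsn = INVALID_LSN

    def record_abort(self):
        # Stand-in for RecordTransactionAbort -> replorigin_session_advance.
        if self.session_origin_lsn != INVALID_LSN:
            self.origin_progress = max(self.origin_progress, self.session_origin_lsn)


def fail_prepare(session, end_lsn, reset_before_abort):
    """Simulate a PREPARE that errors out after the origin LSN was set."""
    session.session_origin_lsn = end_lsn
    if reset_before_abort:
        session.reset_session_origin()
    session.record_abort()
    return session.origin_progress


unpatched = fail_prepare(ToySession(), end_lsn=100, reset_before_abort=False)
patched = fail_prepare(ToySession(), end_lsn=100, reset_before_abort=True)
print(unpatched, patched)
```

With the reset in place, the failed PREPARE leaves the progress untouched in this model, so the transaction would be sent again once the subscriber is reconfigured.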

[1]: https://www.postgresql.org/message-id/TYAPR01MB5692FA4926754B91E9D7B5F0F5AA2%40TYAPR01MB5692.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/CAA4eK1L-r8OKGdBwC6AeXSibrjr9xKsg8LjGpX_PDR5Go-A9TA%40mail.gmail.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:
  v2-0001-Prevent-origin-progress-advancement-if-failed-to-.patch (application/octet-stream, 4.1 KB)
  test_2pc.sh (application/octet-stream, 1.6 KB)
