Re: Fix slot synchronization with two_phase decoding enabled

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix slot synchronization with two_phase decoding enabled
Date: 2025-03-25 06:44:53
Message-ID: CAA4eK1+Row5XWDbOCTgd4_s=eaqXAL7iXDFQkAinuJFqOTt46A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 25, 2025 at 11:05 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Hi,
>
> When testing slot synchronization with logical replication slots that have
> two_phase decoding enabled, I found that transactions prepared before
> two-phase decoding was enabled may fail to replicate to the subscriber after
> being committed on a promoted standby following a failover.
>
> To reproduce this issue, please follow these steps (also detailed in the
> attached TAP test, v1-0001):
>
> 1. sub: create a subscription with (two_phase = false)
> 2. primary (pub): prepare a txn A.
> 3. sub: alter subscription set (two_phase = true) and wait for the logical slot to
> be synced to standby.
> 4. primary (pub): stop primary, promote the standby and let the subscriber use
> the promoted standby as publisher.
> 5. promoted standby (pub): COMMIT PREPARED A;
> 6. sub: the apply worker will report the following ERROR because it didn't
> receive the PREPARE.
> ERROR: prepared transaction with identifier "pg_gid_16387_752" does not exist
>
> I think the root cause of this issue is that the two_phase_at field of the
> slot, which indicates the LSN from which two-phase decoding is enabled (used to
> prevent duplicate data transmission for prepared transactions), is not
> synchronized to the standby server.
>
> In step 3, transaction A is not immediately replicated because it occurred
> before enabling two-phase decoding. Thus, the prepared transaction should only
> be replicated after decoding the final COMMIT PREPARED, as referenced in
> ReorderBufferFinishPrepared(). However, because two_phase_at is invalid on
> the standby, the prepared transaction is not sent at that time.
>
> This problem has existed since support for altering the two-phase option was
> added (1462aad).
>

Thanks for the report and patch. I'll look into it.
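
For readers following along, the condition described in the report can be
modeled roughly as below. This is only a toy sketch in Python, not PostgreSQL
source: the function name, the PreparedTxn type, and the LSN values 100 and 200
are made up for illustration. It assumes the rule stated above, namely that a
PREPARE located before two_phase_at was skipped at prepare time and therefore
has to be sent when COMMIT PREPARED is decoded.

```python
from dataclasses import dataclass

INVALID_LSN = 0  # illustrative stand-in for an unset/invalid LSN


@dataclass
class PreparedTxn:
    gid: str
    prepare_lsn: int  # hypothetical LSN at which the PREPARE was logged


def actions_at_commit_prepared(txn: PreparedTxn, two_phase_at: int) -> list[str]:
    """Toy model of what gets sent when COMMIT PREPARED is decoded.

    Assumption taken from the report: if the PREPARE happened before
    two_phase_at, it was not sent earlier, so it must be sent now.
    """
    actions = []
    if txn.prepare_lsn < two_phase_at:
        actions.append(f"PREPARE {txn.gid}")
    actions.append(f"COMMIT PREPARED {txn.gid}")
    return actions


txn_a = PreparedTxn(gid="pg_gid_16387_752", prepare_lsn=100)

# Primary: two_phase was enabled at LSN 200, after the PREPARE at LSN 100.
on_primary = actions_at_commit_prepared(txn_a, two_phase_at=200)

# Promoted standby: two_phase_at was never synced, so it is still invalid.
on_standby = actions_at_commit_prepared(txn_a, two_phase_at=INVALID_LSN)

print(on_primary)
print(on_standby)
```

With a valid two_phase_at the PREPARE is sent ahead of the COMMIT PREPARED. With
the unsynced (invalid) value the comparison is never true, so only the COMMIT
PREPARED goes out, which matches the subscriber-side ERROR in step 6.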

--
With Regards,
Amit Kapila.
