From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Fix slot synchronization with two_phase decoding enabled |
Date: | 2025-04-02 13:32:50 |
Message-ID: | OS0PR01MB57160DC1A4BB222DAD9DDEC294AF2@OS0PR01MB5716.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Apr 2, 2025 at 3:45 PM Masahiko Sawada wrote:
Hi,
>
> On Mon, Mar 31, 2025 at 4:3 AM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Thu, Mar 27, 2025 at 2:29 PM Amit Kapila wrote:
> >
> > >
> > > On Tue, Mar 25, 2025 at 12:1 PM Amit Kapila
> > > <amit(dot)kapila16(at)gmail(dot)com>
> > > wrote:
> > > >
> > > > On Tue, Mar 25, 2025 at 11:05 AM Zhijie Hou (Fujitsu)
> > > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > When testing the slot synchronization with logical replication
> > > > > slots that enabled two_phase decoding, I found that transactions
> > > > > prepared before
> > > two-phase
> > > > > decoding is enabled may fail to replicate to the subscriber
> > > > > after being committed on a promoted standby following a failover.
> > > > >
> > > > > To reproduce this issue, please follow these steps (also
> > > > > detailed in the attached TAP test, v1-0001):
> > > > >
> > > > > 1. sub: create a subscription with (two_phase = false) 2.
> > > > > primary (pub): prepare a txn A.
> > > > > 3. sub: alter subscription set (two_phase = true) and wait for
> > > > > the logical
> > > slot to
> > > > > be synced to standby.
> > > > > 4. primary (pub): stop primary, promote the standby and let the
> > > > > subscriber
> > > use
> > > > > the promoted standby as publisher.
> > > > > 5. promoted standby (pub): COMMIT PREPARED A; 6. sub: the apply
> > > > > worker will report the following ERROR because it didn't
> > > > > receive the PREPARE.
> > > > > ERROR: prepared transaction with identifier "pg_gid_16387_752"
> > > does not exist
> > > > >
> > > > > I think the root cause of this issue is that the two_phase_at
> > > > > field of the slot, which indicates the LSN from which two-phase
> > > > > decoding is enabled
> > > (used to
> > > > > prevent duplicate data transmission for prepared transactions),
> > > > > is not synchronized to the standby server.
> > > > >
> > > > > In step 3, transaction A is not immediately replicated because
> > > > > it occurred before enabling two-phase decoding. Thus, the
> > > > > prepared transaction
> > > should only
> > > > > be replicated after decoding the final COMMIT PREPARED, as
> > > > > referenced
> > > in
> > > > > ReorderBufferFinishPrepared(). However, due to the invalid
> > > > > two_phase_at
> > > on the
> > > > > standby, the prepared transaction fails to send at that time.
> > > > >
> > > > > This problem arises after the support for altering the two-phase
> > > > > option (1462aad).
> > > > >
> > >
> > > I suspect that this can happen in PG17 as well, but I need to think
> > > more about it to make a reproducible test case.
> >
> > After further analysis, I was able to reproduce the same issue [1] in
> > PG 17.
> >
> > [1]
> > - pub: created a slot 'sub' with two_phase=false, then prepared a
> > transaction
> > - pub: after some activity, advanced the confirmed_flush_lsn of 'sub', so it is
> > greater than prepared txn lsn.
> > - sub: create subscription with (slot_name='sub', create_slot=false,
> > failover = true, two_phase=true, copy_data=false); two_phase_at will
> > be set to the same as confirmed_flush_lsn which is greater than the
> prepared transaction.
> > - stop the primary and promote the standby.
> > - commit the prepared transaction on standby, the following error will be
> > reported on subscriber:
>
> It seems to require elaborate steps to reproduce this issue in v17. I wonder if we
> could somehow narrow down the cases that we want to prohibit. The patch for
> v17 disallows CREATE SUBSCRIPTION to enable both two_phase and failover,
> but I guess that it's still safe if it also creates the replication slot (e.g.,
> create_slot is true). If my understanding is right, we can allow users to specify
> both fields if CRETE SUBSCRIPTION creates the slot, and we don't need to
> disallow that in ReplicationSlotCreate().
Thanks for reviewing the steps.
The current reproducer aims for simplicity; however, I think it's possible to reproduce
the issue even with create_slot = true, although it requires the help of a
debugger and additional steps. But as long as there are transactions prepared
before the two_phase_at position, and they are skipped due to checks in
ReorderBufferFinishPrepared() (refer to comments[1] for why we skip sending
prepared transaction), the issue can be reproduced.
For instance, when creating a subscription with (copy_data=true, failover=true,
two_phase=true), the slot's two_phase setting starts as false and shifts to
true after table sync (refer to [2] for related comments). During this period,
if a user prepares a transaction where the prepare LSN is less than the
two_phase_at, the same problem could happen.
Similarly, when setting up a subscription with (copy_data=false, failover=true,
two_phase=true), although two_phase is initially set to true and we wait for
running transactions to finish when building consistent snapshot, a race
condition may still exist: If the snapshot state reaches FULL_SNAPSHOT, it
won't check running transactions further (see SnapBuildFindSnapshot() for
specifics), if a user prepares a transaction at this point, it's possible for
the prepare LSN to be less than the LSN marking the snapshot's consistent
state, causing the same issue.
[1]
/*
* It is possible that this transaction is not decoded at prepare time
* either because by that time we didn't have a consistent snapshot, or
* two_phase was not enabled, or it was decoded earlier but we have
* restarted. We only need to send the prepare if it was not decoded
* earlier. We don't need to decode the xact for aborts if it is not done
* already.
*/
[2]
/*
* Even if two_phase is set, don't create the slot with
* two-phase enabled. Will enable it once all the tables are
* synced and ready. This avoids race-conditions like prepared
* transactions being skipped due to changes not being applied
* due to checks in should_apply_changes_for_rel() when
* tablesync for the corresponding tables are in progress. See
* comments atop worker.c.
*
* Note that if tables were specified but copy_data is false
* then it is safe to enable two_phase up-front because those
* tables are already initially in READY state. When the
* subscription has no tables, we leave the twophase state as
* PENDING, to allow ALTER SUBSCRIPTION ... REFRESH
* PUBLICATION to work.
*/
Best Regards,
Hou zj
From | Date | Subject | |
---|---|---|---|
Next Message | George MacKerron | 2025-04-02 13:39:06 | Re: Making sslrootcert=system work on Windows psql |
Previous Message | Richard Guo | 2025-04-02 13:26:44 | Some problems regarding the self-join elimination code |