RE: Fix slot synchronization with two_phase decoding enabled

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Fix slot synchronization with two_phase decoding enabled
Date: 2025-04-03 07:15:39
Message-ID: OS3PR01MB571846125A89761A47389CA594AE2@OS3PR01MB5718.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 3, 2025 at 1:38 PM Masahiko Sawada wrote:

>
> On Wed, Apr 2, 2025 at 7:58 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> > On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote:
> > >
> > > >
> > > > On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu)
> > > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > >
> > > > Thank you for the explanation! I agree that the issue happens in these
> cases.

Just to share the steps to reproduce the issue when creating the subscription
With (create_slot=true, failover = true, two_phase=true, copy_data=false) for reference.

- pub: begin a txn A. This is to stop the slot from building a consistent
snapshot immediately.
- sub: create subscription with (slot_name='sub', create_slot=true, failover =
true, two_phase=true, copy_data=false);
- pub: Attach to the walsender that will create the slot and add a checkpoint
in SnapBuildFindSnapshot() -> (builder->state = SNAPBUILD_FULL_SNAPSHOT;).
For now, the state should be BUILDING_SNAPSHOT.
- pub: begin another txn B and commit txn A. The snapshot should reach
FULL_SNAPSHOT now.
- pub: prepared a txn C and then commit txn B.
- pub: release the checkpoint in SnapBuildFindSnapshot(), then it should reach
the SNAPBUILD_CONSISTENT. Now we have a prepared txn whose prepare_lsn is
less than the two_phase_at.
- stop the primary and promote the standby.
- sub: alter the subscription to use the new primary as the publisher.
- commit the prepared transaction on new primary, the following error will be
reported on subscriber:

LOG: logical replication apply worker for subscription "sub" has started
ERROR: prepared transaction with identifier "pg_gid_16398_764" does not exist.

> > > >
> > > > As another idea, I wonder if we could somehow defer to make the
> > > > synced slot as 'sync-ready' until we can ensure that the slot
> > > > doesn't have any transactions that are prepared before the point
> > > > of enabling two_phase. For example, when the slotsync worker
> > > > fetches the remote slot, it remembers the confirmed_flush_lsn (say
> > > > LSN-1) if the local slot's two_phase becomes true or the local
> > > > slot is newly created with enabling two_phase, and then it makes
> > > > the slot 'sync-ready' once it confirmed that the slot's restart_lsn passed
> LSN-1. Does it work?
> > >
> > > Thanks for the idea!
> > >
> > > We considered a similar approach in [1] to confirm there is no
> > > prepared transactions before two_phase_at, but the issue is that
> > > when the two_phase flag is switched from 'false' to 'true' (as in
> > > the case with (copy_data=true, failover=true, two_phase=true)). In
> > > this case, the slot may have already been marked as sync-ready
> > > before the two_phase flag is enabled, as slotsync is unaware of potential
> future changes to the two_phase flag.
> >
> > This can happen because when copy_data is true, tablesync can take a
> > long time to complete the sync and in the meantime, slot without a
> > two_phase flag would have been synced to standby. Such a slot would be
> > marked as sync-ready even if we follow the calculation proposed by
> > Sawada-san. Note that we enable two_phase once all the tables are in
> > ready state (See run_apply_worker() and comments atop worker.c
> > (TWO_PHASE TRANSACTIONS)).
>
> Right. It doesn't make sense to make the slot not-sync-ready and then back to
> sync-ready.
>
> While I agree with the approach for HEAD and it seems difficult to find a
> solution, I'm concerned that disallowing to use both failover and two_phase in
> a minor release would affect users much. Users who are already using that
> combination might end up needing to re-think their system architecture. So I'm
> trying to narrow down use cases where we're going to prohibit or to find
> workarounds.
>
> If we agree with the fix for HEAD, we can push the fix for HEAD first, which
> would be better to be done sooner as it needs to bump the catversion. We can
> discuss the ideas and workarounds for v17 later.

Agreed. I will think more on it. One workaround could be skipping the prepared
transaction when such an issue arises, followed by manually replicating the
skipped changes.

Best Regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2025-04-03 07:20:34 Re: Extensible user mapping handler for FDW
Previous Message Jakub Wartak 2025-04-03 07:01:43 Re: Draft for basic NUMA observability