From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Fix slot synchronization with two_phase decoding enabled |
Date: | 2025-04-03 07:15:39 |
Message-ID: | OS3PR01MB571846125A89761A47389CA594AE2@OS3PR01MB5718.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Apr 3, 2025 at 1:38 PM Masahiko Sawada wrote:
>
> On Wed, Apr 2, 2025 at 7:58 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> > On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote:
> > >
> > > >
> > > > On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu)
> > > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > >
> > > > Thank you for the explanation! I agree that the issue happens in these
> cases.
Just to share the steps to reproduce the issue when creating the subscription
With (create_slot=true, failover = true, two_phase=true, copy_data=false) for reference.
- pub: begin a txn A. This is to stop the slot from building a consistent
snapshot immediately.
- sub: create subscription with (slot_name='sub', create_slot=true, failover =
true, two_phase=true, copy_data=false);
- pub: Attach to the walsender that will create the slot and add a checkpoint
in SnapBuildFindSnapshot() -> (builder->state = SNAPBUILD_FULL_SNAPSHOT;).
For now, the state should be BUILDING_SNAPSHOT.
- pub: begin another txn B and commit txn A. The snapshot should reach
FULL_SNAPSHOT now.
- pub: prepared a txn C and then commit txn B.
- pub: release the checkpoint in SnapBuildFindSnapshot(), then it should reach
the SNAPBUILD_CONSISTENT. Now we have a prepared txn whose prepare_lsn is
less than the two_phase_at.
- stop the primary and promote the standby.
- sub: alter the subscription to use the new primary as the publisher.
- commit the prepared transaction on new primary, the following error will be
reported on subscriber:
LOG: logical replication apply worker for subscription "sub" has started
ERROR: prepared transaction with identifier "pg_gid_16398_764" does not exist.
> > > >
> > > > As another idea, I wonder if we could somehow defer to make the
> > > > synced slot as 'sync-ready' until we can ensure that the slot
> > > > doesn't have any transactions that are prepared before the point
> > > > of enabling two_phase. For example, when the slotsync worker
> > > > fetches the remote slot, it remembers the confirmed_flush_lsn (say
> > > > LSN-1) if the local slot's two_phase becomes true or the local
> > > > slot is newly created with enabling two_phase, and then it makes
> > > > the slot 'sync-ready' once it confirmed that the slot's restart_lsn passed
> LSN-1. Does it work?
> > >
> > > Thanks for the idea!
> > >
> > > We considered a similar approach in [1] to confirm there is no
> > > prepared transactions before two_phase_at, but the issue is that
> > > when the two_phase flag is switched from 'false' to 'true' (as in
> > > the case with (copy_data=true, failover=true, two_phase=true)). In
> > > this case, the slot may have already been marked as sync-ready
> > > before the two_phase flag is enabled, as slotsync is unaware of potential
> future changes to the two_phase flag.
> >
> > This can happen because when copy_data is true, tablesync can take a
> > long time to complete the sync and in the meantime, slot without a
> > two_phase flag would have been synced to standby. Such a slot would be
> > marked as sync-ready even if we follow the calculation proposed by
> > Sawada-san. Note that we enable two_phase once all the tables are in
> > ready state (See run_apply_worker() and comments atop worker.c
> > (TWO_PHASE TRANSACTIONS)).
>
> Right. It doesn't make sense to make the slot not-sync-ready and then back to
> sync-ready.
>
> While I agree with the approach for HEAD and it seems difficult to find a
> solution, I'm concerned that disallowing to use both failover and two_phase in
> a minor release would affect users much. Users who are already using that
> combination might end up needing to re-think their system architecture. So I'm
> trying to narrow down use cases where we're going to prohibit or to find
> workarounds.
>
> If we agree with the fix for HEAD, we can push the fix for HEAD first, which
> would be better to be done sooner as it needs to bump the catversion. We can
> discuss the ideas and workarounds for v17 later.
Agreed. I will think more on it. One workaround could be skipping the prepared
transaction when such an issue arises, followed by manually replicating the
skipped changes.
Best Regards,
Hou zj
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2025-04-03 07:20:34 | Re: Extensible user mapping handler for FDW |
Previous Message | Jakub Wartak | 2025-04-03 07:01:43 | Re: Draft for basic NUMA observability |