Re: Fix slot synchronization with two_phase decoding enabled

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix slot synchronization with two_phase decoding enabled
Date: 2025-04-03 02:57:56
Message-ID: CAA4eK1LbXoC0SwwBhhg-OAF3uW2bOyYooVXGW9kxBARnfBRzUA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote:
>
> >
> > On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Thank you for the explanation! I agree that the issue happens in these cases.
> >
> > As another idea, I wonder if we could somehow defer marking the synced
> > slot as 'sync-ready' until we can ensure that the slot doesn't have
> > any transactions that were prepared before the point of enabling
> > two_phase. For example, when the slotsync worker fetches the remote
> > slot, it remembers the confirmed_flush_lsn (say LSN-1) if the local
> > slot's two_phase becomes true or the local slot is newly created with
> > two_phase enabled, and then it makes the slot 'sync-ready' once it has
> > confirmed that the slot's restart_lsn has passed LSN-1. Would that work?
>
> Thanks for the idea!
>
> We considered a similar approach in [1] to confirm that there are no prepared
> transactions before two_phase_at, but it breaks down when the two_phase flag
> is switched from 'false' to 'true' (as in the case with (copy_data=true,
> failover=true, two_phase=true)). In this case, the slot may have already been
> marked as sync-ready before the two_phase flag is enabled, as slotsync is
> unaware of potential future changes to the two_phase flag.
>

This can happen because, when copy_data is true, tablesync can take a
long time to complete, and in the meantime a slot without the two_phase
flag set would already have been synced to the standby. Such a slot would
be marked as sync-ready even if we follow the calculation proposed by
Sawada-san. Note that we enable two_phase once all the tables are in
the ready state (see run_apply_worker() and the comments atop worker.c
(TWO_PHASE TRANSACTIONS)).
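
To make the sequence concrete, here is a minimal sketch of the proposed
rule as a toy model. It is not the slotsync code: SyncedSlotModel,
sync_cycle() and wait_until_lsn are hypothetical names, Lsn is a stand-in
for XLogRecPtr, and treating "passed LSN-1" as ">=" is an assumption. The
model only shows that a slot synced while two_phase is still false has
nothing to wait for, so it reaches sync-ready before the flag ever flips.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr (assumption) */

/* Hypothetical model of a synced slot on the standby; not the real struct. */
typedef struct SyncedSlotModel
{
    bool two_phase;             /* two_phase flag as last fetched from the primary */
    bool sync_ready;            /* result of the readiness rule in the last cycle */
    Lsn  restart_lsn;           /* restart_lsn as last fetched from the primary */
    Lsn  wait_until_lsn;        /* "LSN-1" in the proposal; 0 means nothing to wait for */
} SyncedSlotModel;

/* One sync cycle applying the proposed rule to the remote slot's values. */
static void
sync_cycle(SyncedSlotModel *slot, bool remote_two_phase,
           Lsn remote_confirmed_flush, Lsn remote_restart_lsn)
{
    /* The model assumes restart_lsn never moves backwards. */
    assert(remote_restart_lsn >= slot->restart_lsn);

    /* two_phase just became true: remember confirmed_flush_lsn as LSN-1. */
    if (remote_two_phase && !slot->two_phase)
        slot->wait_until_lsn = remote_confirmed_flush;

    slot->two_phase = remote_two_phase;
    slot->restart_lsn = remote_restart_lsn;

    /* Ready when there is nothing to wait for or restart_lsn reached LSN-1. */
    slot->sync_ready = (slot->wait_until_lsn == 0 ||
                        slot->restart_lsn >= slot->wait_until_lsn);
}
```

Running a first cycle with remote_two_phase = false (tablesync still in
progress) leaves sync_ready true and two_phase false, which is the state
described above. The rule can only react in a later cycle, after the
apply worker has enabled two_phase.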

--
With Regards,
Amit Kapila.
