Re: Fix slot synchronization with two_phase decoding enabled

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix slot synchronization with two_phase decoding enabled
Date: 2025-04-26 12:07:38
Message-ID: CAA4eK1LhrjqfFFQY4zc=UVd3Hazv4OMPg9F10NaWpko5hFanPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 25, 2025 at 9:57 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Apr 25, 2025 at 3:43 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Apr 25, 2025 at 6:02 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > I realized that users who create a logical slot using
> > > pg_create_logical_replication_slot() would not be able to enable both
> > > options at slot creation, and there is no easy way to enable the
> > > failover after two_phase-enabled-slot creation. Users would need to
> > > use ALTER_REPLICATION_SLOT replication command, which seems
> > > unrealistics for users to use. On the other hand, if we allow creating
> > > a logical slot with enabling failover and two_phase using SQL API,
> > > there is still a chance for this bug to occur. Would it be worth
> > > considering that if a logical slot is created with enabling failover
> > > and two_phase using SQL API, we create the slot with only
> > > two_phase=true, then advance the slot until the slot satisfies
> > > restart_lsn >= two_phase_at, and then enable the failover?
> > >
> >
> > This means we either need to maintain somewhere that user has provided
> > failover flag till restart_lsn >= two_phase_at or and then set
> > failover flag in the slot
>
> I was thinking of this idea.
>
> > or initially mark it but enable the
> > functionality of failover when we reach the condition restart_lsn >=
> > two_phase_at.
>
> IIUC the slot could be synchronized to the standby as soon as we
> complete DecodingContextFindStartpoint() for a failover-enabled slot.
> So we would need some mechanisms to make sure that the slot is not
> synchronized while we're waiting to reach the condition restart_lsn >=
> two_phase_at even if the failover is enabled.
>

So, then we need any state or persistent flag for this.

> > Both seem to have different kinds of problems. The first
> > idea seems to have an issue with persistence, which means we can lose
> > track of the flag after the restart.
>
> I think we can do this series of operations while the slot is not
> persistent, that is the slot is still RS_EPHEMERAL.
>

But we still need a persistent flag to indicate such slots shouldn't
be synced to standby till we reach the condition restart_lsn >=
two_phase_at.

> > The second can mislead the user
> > for a long period in cases where prepare and commit have a large time
> > gap. I feel this will introduce complexity either in the form of code
> > or in giving the information to the user.
>
> Agreed. Both ways introduce complexity so we need to consider the
> user-unfriendliness (by not having a proper way to enable failover for
> the two_phase-enabled-slot using SQL API) vs. risk (of introducing
> complexity).
>

Right, to me it sounds risky to provide such functionality for SQL API
in the back branch.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-04-26 18:55:51 Re: Avoid circular header file dependency
Previous Message Amit Kapila 2025-04-26 12:01:29 Re: Fix premature xmin advancement during fast forward decoding