Re: Fix slot synchronization with two_phase decoding enabled

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix slot synchronization with two_phase decoding enabled
Date: 2025-04-25 10:43:12
Message-ID: CAA4eK1Kfo9YuEZfM5pM=ANbW1wfTfiU4a-Fuf15zV27pCWhyHA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 25, 2025 at 6:02 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I realized that users who create a logical slot using
> pg_create_logical_replication_slot() would not be able to enable both
> options at slot creation, and there is no easy way to enable the
> failover after two_phase-enabled-slot creation. Users would need to
> use the ALTER_REPLICATION_SLOT replication command, which seems
> unrealistic for users to use. On the other hand, if we allow creating
> a logical slot with enabling failover and two_phase using SQL API,
> there is still a chance for this bug to occur. Would it be worth
> considering that if a logical slot is created with enabling failover
> and two_phase using SQL API, we create the slot with only
> two_phase=true, then advance the slot until the slot satisfies
> restart_lsn >= two_phase_at, and then enable the failover?
>

This means we either need to remember somewhere that the user has
provided the failover flag until restart_lsn >= two_phase_at and then
set the failover flag in the slot, or mark the slot initially but
enable the failover functionality only once we reach the condition
restart_lsn >= two_phase_at. Both seem to have different kinds of
problems. The first idea has an issue with persistence: we can lose
track of the flag after a restart. The second can mislead the user for
a long period in cases where prepare and commit have a large time gap.
I feel this will introduce complexity either in the code or in
conveying the information to the user.

--
With Regards,
Amit Kapila.
