Quick Links

Re: Conflict detection for update_deleted in logical replication

From:	vignesh C <vignesh21(at)gmail(dot)com>
To:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc:	"Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Conflict detection for update_deleted in logical replication
Date:	2025-01-02 10:34:25
Message-ID:	CALDaNm3whh00AN58Azsps6+NLsYgqL6w2hz6wmcqSw5uiYqAbA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, 25 Dec 2024 at 08:13, Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Monday, December 23, 2024 2:15 PM Kuroda, Hayato/黒田隼人 <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > Dear Hou,
> >
> > Thanks for updating the patch. Few comments:
>
> Thanks for the comments!
>
> > 02. ErrorOnReservedSlotName()
> >
> > Currently the function is callsed from three points -
> > create_physical_replication_slot(),
> > create_logical_replication_slot() and CreateReplicationSlot().
> > Can we move them to the ReplicationSlotCreate(), or combine into
> > ReplicationSlotValidateName()?
>
> I am not sure because moving the check into these functions because that would
> prevent the launcher from creating the slot as well unless we add a new
> parameter for these functions, but I am not sure if it's worth it at this
> stage.
>
> >
> > 03. advance_conflict_slot_xmin()
> >
> > ```
> > Assert(TransactionIdIsValid(MyReplicationSlot->data.xmin));
> > ```
> >
> > Assuming the case that the launcher crashed just after
> > ReplicationSlotCreate(CONFLICT_DETECTION_SLOT).
> > After the restart, the slot can be acquired since
> > SearchNamedReplicationSlot(CONFLICT_DETECTION_SLOT)
> > is true, but the process would fail the assert because data.xmin is still invalid.
> >
> > I think we should re-create the slot when the xmin is invalid. Thought?
>
> After thinking more, the standard approach to me would be to mark the slot as
> EPHEMERAL during creation and persist it after initializing, so changed like
> that.
>
> > 05. check_remote_recovery()
> >
> > Can we add a test case related with this?
>
> I think the code path is already tested, and I am a bit unsure if we want to setup
> a standby to test the ERROR case, so didn't add this.
>
> ---
>
> Attach the new version patch set which addressed all other comments.

Few suggestions:
1) If we have a subscription with detect_update_deleted option and we
try to upgrade it with default settings(in case dba forgot to set
track_commit_timestamp), the upgrade will fail after doing a lot of
steps like that mentioned in ok below:
Setting locale and encoding for new cluster ok
Analyzing all rows in the new cluster ok
Freezing all rows in the new cluster ok
Deleting files from new pg_xact ok
Copying old pg_xact to new server ok
Setting oldest XID for new cluster ok
Setting next transaction ID and epoch for new cluster ok
Deleting files from new pg_multixact/offsets ok
Copying old pg_multixact/offsets to new server ok
Deleting files from new pg_multixact/members ok
Copying old pg_multixact/members to new server ok
Setting next multixact ID and offset for new cluster ok
Resetting WAL archives ok
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluster
postgres
*failure*

We should detect this at an earlier point somewhere like in
check_new_cluster_subscription_configuration and throw an error from
there.

2) Also should we include an additional slot for the
pg_conflict_detection slot while checking max_replication_slots.
Though this error will occur after the upgrade is completed, it may be
better to include the slot during upgrade itself so that the DBA need
not handle this error separately after the upgrade is completed.

3) We have reserved the pg_conflict_detection name in this version, so
if there was a replication slot with the name pg_conflict_detection in
the older version, the upgrade will fail at a very later stage like an
earlier upgrade shown. I feel we should check if the old cluster has
any slot with the name pg_conflict_detection and throw an error
earlier itself:
+void
+ErrorOnReservedSlotName(const char *name)
+{
+ if (strcmp(name, CONFLICT_DETECTION_SLOT) == 0)
+ ereport(ERROR,
+ errcode(ERRCODE_RESERVED_NAME),
+ errmsg("replication slot name \"%s\"
is reserved",
+ name));
+}

4) We should also mention something like below in the documentation so
the user can be aware of it:
The slot name cannot be created with pg_conflict_detection, as this is
reserved for logical replication conflict detection.

Regards,
Vignesh

In response to

RE: Conflict detection for update_deleted in logical replication at 2024-12-25 02:43:02 from Zhijie Hou (Fujitsu)

Responses

RE: Conflict detection for update_deleted in logical replication at 2025-01-07 12:34:23 from Zhijie Hou (Fujitsu)

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andrea Gelmini	2025-01-02 10:41:56	Re: FileFallocate misbehaving on XFS
Previous Message	Nisha Moond	2025-01-02 10:27:10	Re: Introduce XID age and inactive timeout based replication slot invalidation