Re: Conflict detection for update_deleted in logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2025-01-02 06:30:13
Message-ID: CAA4eK1+XfGqYRJPFk0wUHAh3mkJ59Tj2q+gXctyk7SKiASHgFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 25, 2024 at 8:13 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Monday, December 23, 2024 2:15 PM Kuroda, Hayato/黒田 隼人 <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > Dear Hou,
> >
> > Thanks for updating the patch. Few comments:
>
> Thanks for the comments!
>
> > 02. ErrorOnReservedSlotName()
> >
> > Currently the function is callsed from three points -
> > create_physical_replication_slot(),
> > create_logical_replication_slot() and CreateReplicationSlot().
> > Can we move them to the ReplicationSlotCreate(), or combine into
> > ReplicationSlotValidateName()?
>
> I am not sure because moving the check into these functions because that would
> prevent the launcher from creating the slot as well unless we add a new
> parameter for these functions, but I am not sure if it's worth it at this
> stage.
>

But why would it prevent the launcher from creating the slot? I think
we should add this check in the function
ReplicationSlotValidateName(). Another related point:

+ErrorOnReservedSlotName(const char *name)
+{
+ if (strcmp(name, CONFLICT_DETECTION_SLOT) == 0)
+ ereport(ERROR,
+ errcode(ERRCODE_RESERVED_NAME),
+ errmsg("replication slot name \"%s\" is reserved",
+ name));

Won't it be sufficient to check using an existing IsReservedName()?
Even, if not, then also we should keep that as part of the check
similar to what we are doing in pg_replication_origin_create().

> >
> > 03. advance_conflict_slot_xmin()
> >
> > ```
> > Assert(TransactionIdIsValid(MyReplicationSlot->data.xmin));
> > ```
> >
> > Assuming the case that the launcher crashed just after
> > ReplicationSlotCreate(CONFLICT_DETECTION_SLOT).
> > After the restart, the slot can be acquired since
> > SearchNamedReplicationSlot(CONFLICT_DETECTION_SLOT)
> > is true, but the process would fail the assert because data.xmin is still invalid.
> >
> > I think we should re-create the slot when the xmin is invalid. Thought?
>
> After thinking more, the standard approach to me would be to mark the slot as
> EPHEMERAL during creation and persist it after initializing, so changed like
> that.
>

Sounds reasonable but OTOH, all other places that create physical
slots (which we are doing here) don't use this trick. So, don't they
need similar reliability? Also, add some comments as to why we are
initially creating the RS_EPHEMERAL slot as we have at other places.

Few other comments on 0003
=======================
1.
+ if (sublist)
+ {
+ bool updated;
+
+ if (!can_advance_xmin)
+ xmin = InvalidFullTransactionId;
+
+ updated = advance_conflict_slot_xmin(xmin);

How will it help to try advancing slot_xmin when xmin is invalid?

2.
@@ -1167,14 +1181,43 @@ ApplyLauncherMain(Datum main_arg)
long elapsed;

if (!sub->enabled)
+ {
+ can_advance_xmin = false;

In ApplyLauncherMain(), if one of the subscriptions is disabled (say
the last one in sublist), then can_advance_xmin will become false in
the above code. Now, later, as quoted in comment-1, the patch
overrides xmin to InvalidFullTransactionId if can_advance_xmin is
false. Won't that lead to the wrong computation of xmin?

3.
+ slot_maybe_exist = true;
+ }
+
+ /*
+ * Drop the slot if we're no longer retaining dead tuples.
+ */
+ else if (slot_maybe_exist)
+ {
+ drop_conflict_slot_if_exists();
+ slot_maybe_exist = false;

Can't we use MyReplicationSlot instead of introducing a new boolean
slot_maybe_exist?

In any case, how does the above code deal with the case where the
launcher is restarted for some reason and there is no subscription
after that? Will it be possible to drop the slot in that case?

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2025-01-02 06:46:06 Re: Re: proposal: schema variables
Previous Message Gurjeet Singh 2025-01-02 06:16:21 Re: Document How Commit Handles Aborted Transactions