RE: Conflict detection for update_deleted in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2024-12-25 02:43:02
Message-ID: OS0PR01MB5716E0EC8DC4F06C390E7418940C2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Monday, December 23, 2024 2:15 PM Kuroda, Hayato/黒田 隼人 <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Hou,
>
> Thanks for updating the patch. Few comments:

Thanks for the comments!

> 02. ErrorOnReservedSlotName()
>
> Currently the function is callsed from three points -
> create_physical_replication_slot(),
> create_logical_replication_slot() and CreateReplicationSlot().
> Can we move them to the ReplicationSlotCreate(), or combine into
> ReplicationSlotValidateName()?

I am not sure because moving the check into these functions because that would
prevent the launcher from creating the slot as well unless we add a new
parameter for these functions, but I am not sure if it's worth it at this
stage.

>
> 03. advance_conflict_slot_xmin()
>
> ```
> Assert(TransactionIdIsValid(MyReplicationSlot->data.xmin));
> ```
>
> Assuming the case that the launcher crashed just after
> ReplicationSlotCreate(CONFLICT_DETECTION_SLOT).
> After the restart, the slot can be acquired since
> SearchNamedReplicationSlot(CONFLICT_DETECTION_SLOT)
> is true, but the process would fail the assert because data.xmin is still invalid.
>
> I think we should re-create the slot when the xmin is invalid. Thought?

After thinking more, the standard approach to me would be to mark the slot as
EPHEMERAL during creation and persist it after initializing, so changed like
that.

> 05. check_remote_recovery()
>
> Can we add a test case related with this?

I think the code path is already tested, and I am a bit unsure if we want to setup
a standby to test the ERROR case, so didn't add this.

---

Attach the new version patch set which addressed all other comments.

Based on some off-list discussions with Sawada-san and Amit, it would be better
if the apply worker can avoid reporting an ERROR if the publisher's clock's
lags behind that of the subscriber, so I implemented a new 0007 patch to allow
the apply worker to wait for the clock skew to pass and then send a new request
to the publisher for the latest status. The implementation is as follows:

Since we have the time (reply_time) on the walsender when it confirms that all
the committing transactions have finished, it means any subsequent transactions
on the publisher should be assigned a commit timestamp later then reply_time.
And the (candidate_xid_time) when it determines the oldest active xid. Any old
transactions on the publisher that have finished should have a commit timestamp
earlier than the candidate_xid_time.

The apply worker can compare the candidate_xid_time with reply_time. If
candidate_xid_time is less than the reply_time, then it's OK to advance the xid
immdidately. If candidate_xid_time is greater than reply_time, it means the
clock of publisher is behind that of the subscriber, so the apply worker can
wait for the skew to pass before advancing the xid.

Since this is considered as an improvement, we can focus on this after
pushing the main patches.

Best Regards,
Hou zj

Attachment Content-Type Size
v18-0007-Wait-for-publisher-s-clock-to-catch-up-before-pr.patch application/octet-stream 8.5 KB
v18-0001-Maintain-the-oldest-non-removeable-tranasction-I.patch application/octet-stream 35.0 KB
v18-0002-Dynamically-adjust-xid-advancement-interval.patch application/octet-stream 4.8 KB
v18-0003-Maintain-the-replication-slot-in-logical-launche.patch application/octet-stream 11.9 KB
v18-0004-Support-the-conflict-detection-for-update_delete.patch application/octet-stream 21.6 KB
v18-0005-Add-a-detect_update_deleted-option-to-subscripti.patch application/octet-stream 73.8 KB
v18-0006-Add-a-tap-test-to-verify-the-new-slot-xmin-mecha.patch application/octet-stream 7.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-12-25 03:00:59 Re: Infinite loop in XLogPageRead() on standby
Previous Message Andrei Lepikhov 2024-12-25 02:33:57 Re: ERROR: corrupt MVNDistinct entry