RE: Conflict detection for update_deleted in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2024-11-14 02:54:29
Message-ID: OS3PR01MB5718888648059FEDB3BF18F3945B2@OS3PR01MB5718.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, November 13, 2024 8:35 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
>
> I've started to review these patch series. I've reviewed only 0001 patch for now
> but let me share some comments:

Thanks for the comment !

>
> ---
> + if (*phase == DTR_WAIT_FOR_WALSENDER_WAL_POS)
> + {
> + Assert(xid_advance_attempt_time);
>
> What is this assertion for? If we want to check here that we have sent a request
> message for the publisher, I think it's clearer if we have
> "Assert(xid_advance_attempt_time > 0)". I'm not sure we really need this
> assertion though since it's never false once we set xid_advance_attempt_time.

I agree that the assert is not useful, so removed in this version.

>
> ---
> + /*
> + * Do not return here because the apply worker might
> have already
> + * applied all changes up to remote_lsn. Instead,
> proceed to the
> + * next phase to check if we can immediately advance
> the transaction
> + * ID.
> + */
> + *phase = DTR_WAIT_FOR_LOCAL_FLUSH;
> + }
>
> If we always proceed to the next phase, is this phase really necessary? IIUC
> even if we jump to DTR_WAIT_FOR_LOCAL_FLUSH phase after
> DTR_REQUEST_WALSENDER_WAL_POS and have a check if we received the
> remote WAL position in DTR_WAIT_FOR_LOCAL_FLUSH phase, it would work
> fine.

Agreed. I removed this separate phase.

Although the DeadTupleRetainPhase enum only has 2 items now, I didn't remove
the enum in this version, as it could make the code easier to read and we may extent
the phases later if we accept the approach in 0001_2.

>
> ---
> + /*
> + * Reaching here means the remote WAL position has
> been received, and
> + * all transactions up to that position on the
> publisher have been
> + * applied and flushed locally. So, now we can advance the
> + * non-removable transaction ID.
> + */
> + SpinLockAcquire(&MyLogicalRepWorker->relmutex);
> + MyLogicalRepWorker->oldest_nonremovable_xid =
> candidate_xid;
> + SpinLockRelease(&MyLogicalRepWorker->relmutex);
>
> How about adding a debug log message showing new
> oldest_nonremovable_xid and related LSN for making the debug/investigation
> easier? For example,
>
> elog(LOG, "confirmed remote flush up to %X/%X: new
> oldest_nonremovable_xid %u",
> LSN_FORMAT_ARGS(*remote_lsn),
> XidFromFullTransactionId(candidate_xid));

Added.

>
> ---
> + /*
> + * Exit early if the user has disabled sending messages to the
> + * publisher.
> + */
> + if (wal_receiver_status_interval <= 0)
> + return;
>
> In send_feedback(), we send a feedback message if the publisher requests,
> even if wal_receiver_status_interval is 0. On the other hand, the above codes
> mean that we don't send a WAL position request and therefore never update
> oldest_nonremovable_xid if wal_receiver_status_interval is 0. I'm concerned it
> could be a pitfall for users.

It was intended to mirror the style of hot standby feedback (as implemented in
XLogWalRcvSendHSFeedback()) where the message is sent at most once per
wal_receiver_status_interval. I think the usage of the new message in the patch
is more similar to the hot standby feedback which are used to advance xmin of a
replication slot.

I also documented this risk in the detect_update_deleted option patch(0004).
Would it be sufficient to you ?

>
> ---
> % git show | grep update_delete
> This set of patches aims to support the detection of update_deleted
> conflicts,
> transactions with earlier timestamps than the DELETE, detecting
> update_delete
> We assume that the appropriate resolution for update_deleted conflicts, to
> that when detecting the update_deleted conflict, and the remote update
> has a
> + * to detect update_deleted conflicts.
> + * update_deleted is necessary, as the UPDATEs in remote transactions
> should be
> + * to allow for the detection of update_delete conflicts when
> + applying
>
> There are mixed 'update_delete' and 'update_deleted' in the commit message
> and the codes. I think it's better to use 'update_deleted'.

Thanks for searching. I have replaced them to update_deleted.

Attach the V9 patch set which addressed above comments.

Best Regards,
Hou zj

Attachment Content-Type Size
v9-0004-Add-a-detect_update_deleted-option-to-subscriptio.patch application/octet-stream 65.2 KB
v9-0001-Maintain-the-oldest-non-removeable-tranasction-ID.patch application/octet-stream 19.9 KB
v9-0001_2-Maintain-the-oldest-non-removeable-tranasction-I.patch.txt text/plain 24.7 KB
v9-0005-Add-a-tap-test-to-verify-the-new-slot-xmin-mechan.patch application/octet-stream 6.9 KB
v9-0002-Maintain-the-replication-slot-in-logical-launcher.patch application/octet-stream 11.4 KB
v9-0003-Support-the-conflict-detection-for-update_deleted.patch application/octet-stream 21.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2024-11-14 02:56:45 RE: Fix for pageinspect bug in PG 17
Previous Message jian he 2024-11-14 02:21:16 Re: altering a column's collation leaves an invalid foreign key