Quick Links

RE: Conflict detection for update_deleted in logical replication

From:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	"Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	RE: Conflict detection for update_deleted in logical replication
Date:	2024-12-19 11:04:46
Message-ID:	OS0PR01MB5716910556C1647838D8267B94062@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sunday, December 15, 2024 9:39 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Dec 11, 2024 at 2:32 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Attach the V16 patch set which addressed above comments.
> >
>
> Review Comments
> ================
> 1.
> +/*
> + * Workhorse for the RCI_WAIT_FOR_LOCAL_FLUSH phase.
> + */
> +static void
> +wait_for_local_flush(RetainConflictInfoData *data) {
> ...
> + /*
> + * Do not attempt to advance the non-removable transaction ID when
> + table
> + * sync is in progress. During this time, changes from a single
> + * transaction may be applied by multiple table sync workers
> + corresponding
> + * to the target tables. In this case, confirming the apply and flush
> + * progress across all table sync workers is complex and not worth the
> + * effort.
> + */
> + if (!AllTablesyncsReady())
> + return;
>
> How is it ensured that new tables are not added to subscription via refresh
> immediately after this check that are not yet in ready state?
> I mean if it happens immediately after this check then the problem described
> by comment can happen and we may end up advancing the non-removable-xid
> incorrectly. If this is safe, then please update comments to reflect the same.

I think it's safe because WAL positions of changes from these new tables, which
will be applied, should be greater than remote_lsn and are included in
transactions with later commit timestamps. So, we do not need to wait for these
changes to be applied in this round of advancement.

And I added above in the comments.

>
> 2.
> + /*
> + * Reset all data fields except those used to determine the timing for
> + the
> + * next round of transaction ID advancement.
> + */
> + memset(data, 0, offsetof(RetainConflictInfoData, candidate_xid_time));
>
> Wouldn't it be better to initialize each field separately? With the current code,
> adding new fields at the end of the structure RetainConflictInfoData will be
> difficult.

Changed as suggested.

>
> 3.
> + * TODO: The remote flush location (last_flushpos) is currently not
> +updated
> + * during change application, making it impossible to satisfy the
> +condition of
> + * the final phase (RCI_WAIT_FOR_LOCAL_FLUSH) for advancing the
> transaction ID.
> + * Consider updating the remote flush position in the final phase to
> +enable
> + * advancement during change application.
> + */
> +static inline bool
> +can_advance_nonremovable_xid(RetainConflictInfoData *data)
>
> I think we don't need this TODO here as there is XXX comment in
> wait_for_local_flush() which has the same information.

Removed.

>
> 4.
> @@ -2314,6 +2316,10 @@ ProcessStandbyMessage(void)
> ProcessStandbyHSFeedbackMessage();
> break;
>
> + case 'S':
> + ProcessStandbyPSRequestMessage();
> + break;
>
> Why do we use the capital message name 'S' when other messages in
> ProcessStandbyMessage() are all small cases? I see that walsender already
> handles 'S' message in HandleUploadManifestPacket() though it won't conflict
> with this case. But still, shouldn't we use a different message here?

OK, I chose a new name 'p' in the latest version.

>
> 5. The apply worker needs to at least twice get the publisher status message to
> advance oldest_nonremovable_xid once. It then uses the remote_lsn of the last
> such message to ensure that it has been applied locally. Such a remote_lsn
> could be a much later value than required leading to delay in advancing
> oldest_nonremovable_xid. How about if while first time processing the
> publisher_status message on walsender, we get the
> latest_transaction_in_commit by having a function
> GetLatestTransactionIdInCommit() instead of
> GetOldestTransactionIdInCommit() and then simply wait till that proc has
> written commit WAL (aka wait till it clears DELAY_CHKPT_IN_COMMIT)?
> Then get the latest LSN wrote and send that to apply worker waiting for the
> publisher_status message. If this is feasible then we should be able to
> advance oldest_nonremovable_xid with just one publisher_status message.
> Won't that be an improvement over current? If so, we can even further try to
> improve it by just using commit_LSN of the transaction returned by
> GetLatestTransactionIdInCommit(). One idea is that we can try to use
> MyProc->waitLSN which we are using in synchronous replication for our
> purpose. See SyncRepWaitForLSN.

I will do more performance tests on this and address if it improves
the performance.

>
> 6. Attached, a few minor comment updates.

Thanks. I have merged them.

Attach the V17 patch set which addressed above comments.

Best Regards,
Hou zj

Attachment	Content-Type	Size
v17-0006-Add-a-tap-test-to-verify-the-new-slot-xmin-mecha.patch	application/octet-stream	7.5 KB
v17-0002-Dynamically-adjust-xid-advancement-interval.patch	application/octet-stream	4.8 KB
v17-0003-Maintain-the-replication-slot-in-logical-launche.patch	application/octet-stream	11.9 KB
v17-0004-Support-the-conflict-detection-for-update_delete.patch	application/octet-stream	21.6 KB
v17-0005-Add-a-detect_update_deleted-option-to-subscripti.patch	application/octet-stream	73.3 KB
v17-0001-Maintain-the-oldest-non-removeable-tranasction-I.patch	application/octet-stream	35.0 KB

In response to

Re: Conflict detection for update_deleted in logical replication at 2024-12-15 01:38:44 from Amit Kapila

Responses

Re: Conflict detection for update_deleted in logical replication at 2024-12-20 07:11:45 from Nisha Moond
RE: Conflict detection for update_deleted in logical replication at 2024-12-23 06:14:49 from Hayato Kuroda (Fujitsu)
Re: Conflict detection for update_deleted in logical replication at 2025-01-01 05:36:15 from Amit Kapila

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	jian he	2024-12-19 12:02:20	speedup COPY TO for partitioned table.
Previous Message	Amit Kapila	2024-12-19 10:56:34	Re: Skip collecting decoded changes of already-aborted transactions