RE: Conflict detection for update_deleted in logical replication

From:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	RE: Conflict detection for update_deleted in logical replication
Date:	2025-01-22 11:53:35
Message-ID:	OS0PR01MB571665B3D4BFB1065DE6B33194E12@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists:	pgsql-hackers

On Saturday, January 18, 2025 11:45 AM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Thursday, January 16, 2025 6:02 PM Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 15, 2025 at 2:20 PM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > In the latest version, we implemented a simpler approach that allows
> > > the apply worker to directly advance the oldest_nonremovable_xid if
> > > the waiting time exceeds the newly introduced option's limit. I've
> > > named this option "max_conflict_retention_duration," as it aligns
> > > > better with the conflict detection concept and the "retain_conflict_info"
> > > > option.
> > >
> > > During the last phase (RCI_WAIT_FOR_LOCAL_FLUSH), the apply worker
> > > evaluates how much time it has spent waiting. If this duration
> > > exceeds the max_conflict_retention_duration, the worker directly
> > > advances the oldest_nonremovable_xid and logs a message indicating
> > > the forced advancement of the non-removable transaction ID.
> > >
> > > > This approach is a bit like the time-based option that was discussed before.
> > > Compared to the slot invalidation approach, this approach is simpler
> > > because we can avoid adding 1) new slot invalidation type due to
> > > apply lag, 2) new field lag_behind in shared memory
> > > (MyLogicalRepWorker) to indicate when the lag exceeds the limit, and
> > > 3) additional logic in the launcher to handle each worker's lag status.
> > >
> > > > In the slot invalidation approach, the user would be able to confirm
> > > > this by checking whether the slot in pg_replication_slots is
> > > > invalidated, while in the simpler approach mentioned, the user
> > > > could only confirm that by checking the LOGs.
> > >
> >
> > The user needs to check the LOGs corresponding to all subscriptions on
> > the node. I see the simplicity of the approach you used but still the
> > slot_invalidation idea sounds better to me on the grounds that it will
> > be convenient for users/DBA to know when to rely on the update_missing
> > type conflict if there is a valid and active slot with the name
> > > 'pg_conflict_detection'
> > > (or whatever name we decide to give) then users can rely on the
> > > detected conflict. Sawada-San, and others, do you have any preference on
> > > this matter?
>
> I think invalidating the slot is OK and we could also let the apply worker
> recover automatically as suggested in [1].
>
> Here is the V24 patch set. I modified the 0004 patch to implement the slot
> invalidation part. Since the automatic recovery could be an optimization and the
> discussion is in progress, I didn't implement that part.

The implementation of the automatic recovery is in progress and I will include
it in the next version.

Here is the V25 patch set, which includes the following changes:

0001

* Per off-list discussion with Amit, I added a few comments to explain why
advancing the xid is skipped while table sync is in progress, and to mention
that the advancement will not be delayed if changes are filtered out on the
publisher via a row/table filter.

0004

* Fixed a bug where the launcher would advance the slot.xmin when some apply
workers had not yet started.

* Fixed a bug where the launcher did not advance the slot.xmin even if one of the
apply workers had stopped conflict retention due to the lag.

* Added a retain_conflict_info column to the pg_stat_subscription view to
indicate whether the apply worker is effectively retaining conflict
information. The value is set to true only if retain_conflict_info is enabled
for the associated subscription, and the retention duration for conflict
detection by the apply worker has not exceeded
max_conflict_retention_duration. Thanks to Kuroda-san for contributing code
off-list.
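
To make the 0004 items above easier to follow, here is a minimal sketch of the
intended behavior. It is not code from the patch: the SubState struct, its
fields, and the function names are hypothetical, XIDs are modeled as plain
integers without wraparound handling, and treating a limit of 0 as "no limit"
is an assumption. It only illustrates the two rules described above: the
launcher should not advance the slot xmin while a retaining subscription has
no started worker, and a worker that has stopped retention because of lag
should not hold the xmin back.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types: not the structures used in the patch. */
typedef uint32_t Xid;            /* plain integer; ignores XID wraparound */
#define INVALID_XID ((Xid) 0)

typedef struct SubState
{
    bool    retain_conflict_info;    /* subscription option is enabled */
    bool    worker_started;          /* apply worker is running */
    int64_t retained_ms;             /* time spent retaining so far */
    int64_t max_retention_ms;        /* models max_conflict_retention_duration;
                                      * 0 is assumed to mean "no limit" */
    Xid     oldest_nonremovable_xid; /* value reported by the apply worker */
} SubState;

/* True if the worker has exceeded its retention limit. */
bool
retention_stopped(const SubState *s)
{
    return s->max_retention_ms > 0 && s->retained_ms > s->max_retention_ms;
}

/* Models the value shown in the new pg_stat_subscription column. */
bool
effective_retain_conflict_info(const SubState *s)
{
    return s->retain_conflict_info && !retention_stopped(s);
}

/*
 * Returns true and sets *new_xmin if the slot xmin may be advanced.
 * Returns false if a retaining subscription has no started worker yet,
 * or if no worker is currently retaining conflict information.
 */
bool
compute_slot_xmin(const SubState *subs, size_t n, Xid *new_xmin)
{
    Xid min = INVALID_XID;

    for (size_t i = 0; i < n; i++)
    {
        const SubState *s = &subs[i];

        if (!s->retain_conflict_info)
            continue;           /* subscription does not participate */
        if (!s->worker_started)
            return false;       /* first fix: wait for every worker */
        if (retention_stopped(s))
            continue;           /* second fix: lagging worker is ignored */
        if (min == INVALID_XID || s->oldest_nonremovable_xid < min)
            min = s->oldest_nonremovable_xid;
    }

    if (min == INVALID_XID)
        return false;
    *new_xmin = min;
    return true;
}
```

In this model, a subscription whose worker exceeded the limit reports false
from effective_retain_conflict_info() and no longer affects the computed xmin,
while the remaining workers still determine how far the xmin can move.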

Best Regards,
Hou zj

Attachment	Content-Type	Size
v25-0001-Maintain-the-oldest-non-removeable-tranasction-I.patch	application/octet-stream	41.0 KB
v25-0002-Maintain-the-replication-slot-in-logical-launche.patch	application/octet-stream	21.9 KB
v25-0003-Add-a-retain_conflict_info-option-to-subscriptio.patch	application/octet-stream	80.1 KB
v25-0004-add-a-max_conflict_retention_duration-subscripti.patch	application/octet-stream	89.6 KB
v25-0005-Add-a-tap-test-to-verify-the-management-of-the-n.patch	application/octet-stream	7.0 KB
v25-0006-Support-the-conflict-detection-for-update_delete.patch	application/octet-stream	25.7 KB

In response to

RE: Conflict detection for update_deleted in logical replication at 2025-01-18 03:45:13 from Zhijie Hou (Fujitsu)

Responses

RE: Conflict detection for update_deleted in logical replication at 2025-01-23 11:47:05 from Zhijie Hou (Fujitsu)
