RE: Conflict detection for update_deleted in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2025-04-24 12:41:15
Message-ID: TYAPR01MB572428A7A4A875C069567B8E94852@TYAPR01MB5724.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 17, 2025 at 12:19 PM shveta malik wrote:

>
> On Wed, Apr 16, 2025 at 10:30 AM shveta malik <shveta(dot)malik(at)gmail(dot)com>
> wrote:
> >
> > On Wed, Mar 26, 2025 at 4:17 PM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > Here's a rebased version of the patch series.
> > >
> >

>
> Thanks Hou-San for the patches. I am going through this long thread and
> patches. One doubt I have is whenever there is a chance of conflict-slot update
> (either xmin or possibility of its invalidation), apply-worker gives a wake-up call
> to the launcher (ApplyLauncherWakeup). Shouldn't that suffice to wake-up
> launcher irrespective of its nap-time? Do we actually need to introduce
> MIN/MAX_NAPTIME_PER_SLOT_UPDATE in the launcher and the logic
> around it?

Thanks for reviewing. After rethinking, I agree that the wakeup is
sufficient, so I removed the nap-time logic in this version.

> Few comments for patch004:
> Config.sgml:
> 1)
> + <para>
> + Maximum duration (in milliseconds) for which conflict
> + information can be retained for conflict detection by the apply worker.
> + The default value is <literal>0</literal>, indicating that conflict
> + information is retained until it is no longer needed for detection
> + purposes.
> + </para>
>
> IIUC, the above is not entirely accurate. Suppose the subscriber manages to
> catch up and sets oldest_nonremovable_xid to 100, which is then updated in
> slot. After this, the apply worker takes a nap and begins a new xid update cycle.
> Now, let’s say the next candidate_xid is 200, but this time the subscriber fails
> to keep up and exceeds max_conflict_retention_duration. As a result, it sets
> oldest_nonremovable_xid to InvalidTransactionId, and the launcher skips
> updating the slot’s xmin.

If the time exceeds the max_conflict_retention_duration, the launcher would
Invalidate the slot, instead of skipping updating it. So the conflict info(e.g.,
dead tuples) would not be retained anymore.

> However, the previous xmin value (100) is still there
> in the slot, causing its data to be retained beyond the
> max_conflict_retention_duration. The xid 200 which actually honors
> max_conflict_retention_duration was never marked for retention. If my
> understanding is correct, then the documentation doesn’t fully capture this
> scenario.

As mentioned above, the strategy here is to invalidate the slot.

>
> 2)
> + The replication slot
> + <quote><literal>pg_conflict_detection</literal></quote> that
> used to
> + retain conflict information will be invalidated if all apply workers
> + associated with the subscription, where
>
> Subscription --> subscriptions
>
> 3)
> Name stop_conflict_retention in MyLogicalRepWorker is confusing. Shall it be
> stop_conflict_info_retention?

Changed.

Here is V30 patch set includes the following changes:

* Addressed above comments.
* Added the retention timeout check in wait_for_local_flush(), as suggested by Nisha[1].
* Improved upgrade codes and added a test for upgrade of retain_conflict_info option,
as suggested by Kuroda-san[2].

[1] https://www.postgresql.org/message-id/CABdArM4Ft8q3dZv4Bqw%3DrbS5_LFMXDJMRr3vC8a_KMCX1qatpg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/OSCPR01MB14966269726272F2F2B2BD3B0F5B22%40OSCPR01MB14966.jpnprd01.prod.outlook.com

Best Regards,
Hou zj

Attachment Content-Type Size
v30-0001-Maintain-the-oldest-non-removeable-tranasction-I.patch application/octet-stream 40.9 KB
v30-0002-Maintain-the-replication-slot-in-logical-launche.patch application/octet-stream 20.2 KB
v30-0003-Add-a-retain_conflict_info-option-to-subscriptio.patch application/octet-stream 83.4 KB
v30-0004-Introduce-a-new-GUC-max_conflict_retention_durat.patch application/octet-stream 29.0 KB
v30-0005-Re-create-the-replication-slot-if-the-conflict-r.patch application/octet-stream 10.7 KB
v30-0006-Add-a-tap-test-to-verify-the-management-of-the-n.patch application/octet-stream 6.7 KB
v30-0007-Support-the-conflict-detection-for-update_delete.patch application/octet-stream 23.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message torikoshia 2025-04-24 12:48:57 Re: RFC: Logging plan of the running query
Previous Message Bruce Momjian 2025-04-24 12:37:56 Re: pg_upgrade-breaking release