From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Subject: | RE: Conflict detection for update_deleted in logical replication |
Date: | 2024-10-14 03:39:49 |
Message-ID: | OS0PR01MB5716573AA71F1F8E547431AA94442@OS0PR01MB5716.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
On Friday, October 11, 2024 4:35 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Attach the V4 patch set which addressed above comments.
>
While reviewing the patch, I noticed that the current design cannot work in a
non-bidirectional cluster (publisher -> subscriber) when the publisher is also
a physical standby. (Logical decoding on a physical standby was added recently,
so it's possible to use a physical standby as a logical publisher.)
The cluster looks like:
physical primary -> physical standby (also publisher) -> logical subscriber (detect_update_deleted)
The issue arises because the physical standby (acting as the publisher) might
lag behind its primary. As a result, the logical walsender on the standby might
not be able to get the latest WAL position when requested by the logical
subscriber. It can only get the WAL replay position, but there may be more WAL
records still being replicated from the primary, and those records could carry
older commit timestamps. (Note that transactions on the primary and the standby
have the same commit timestamps.)
So, the logical walsender might send an outdated WAL position as feedback.
This, in turn, can cause the replication slot on the subscriber to advance
prematurely, leading to the unwanted removal of dead tuples. As a result, the
apply worker may fail to correctly detect update_deleted conflicts.
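To make the failure mode concrete, here is a minimal sketch (all names are
hypothetical, not from the patch) of the invariant the subscriber effectively
relies on: slot.xmin may only advance once the publisher's feedback position
covers every commit the standby will eventually replay. A lagging replay
position used as feedback silently violates this.

```python
def safe_to_advance_xmin(feedback_lsn, oldest_unreplayed_commit_lsn):
    """Hypothetical check: advancing slot.xmin (and hence removing dead
    tuples) is only safe once the feedback position reported by the
    publisher covers every commit the standby will eventually replay."""
    return feedback_lsn >= oldest_unreplayed_commit_lsn

# The standby has replayed up to LSN 100, but the primary already committed
# a transaction at LSN 120 that has not reached the standby yet.  The
# subscriber, however, only sees the feedback value and cannot tell the
# two cases apart:
safe_to_advance_xmin(100, 120)   # replay position as feedback: unsafe
safe_to_advance_xmin(130, 120)   # feedback past the in-flight commit: safe
```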
We thought of a few options to fix this:
1) Add a Time-Based Subscription Option:
We could add a new time-based option for subscriptions, such as
retain_dead_tuples = '5s'. In the logical launcher, after obtaining the
candidate XID, the launcher will wait for the specified time before advancing
the slot.xmin. This ensures that deleted tuples are retained for at least the
duration defined by this new option.
This approach is designed to simulate the functionality of the GUC
(vacuum_committs_age), but with a simpler implementation that does not impact
vacuum performance. We can maintain both this time-based method and the current
automatic method. If a user does not specify the time-based option, we will
continue using the existing approach to retain dead tuples until all concurrent
transactions from the remote node have been applied.
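A rough sketch of the launcher-side wait described above (function and
parameter names are hypothetical, for illustration only):

```python
import time

def maybe_advance_slot_xmin(candidate_xid, candidate_obtained_at,
                            retain_dead_tuples_secs, now=None):
    """Hypothetical sketch of option 1: after the launcher obtains a
    candidate XID, it waits until the retain_dead_tuples interval has
    elapsed before advancing slot.xmin, so deleted tuples are retained
    for at least that long."""
    if now is None:
        now = time.monotonic()
    if now - candidate_obtained_at >= retain_dead_tuples_secs:
        return candidate_xid   # interval elapsed: safe to advance slot.xmin
    return None                # keep retaining dead tuples for now
```

With retain_dead_tuples = '5s', a candidate obtained at t=0 would not be
used to advance slot.xmin until t>=5, regardless of how quickly the
remote transactions are applied.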
2) Modification to the Logical Walsender
The logical walsender, which runs on a physical standby, could open an
additional connection to the physical primary to obtain the latest WAL
position. This position would then be sent as feedback to the logical
subscriber.
A potential concern is that this requires the walsender to use the walreceiver
API, which may seem a bit unnatural. Additionally, it starts an extra walsender
process on the primary, since the logical walsender on the physical standby
would need to communicate with that walsender to fetch the WAL position.
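The idea in option 2 can be sketched as follows (a hypothetical helper, not
actual walsender code): when a connection to the primary is available, the
standby's walsender reports the primary's latest WAL position rather than its
own replay position.

```python
def choose_feedback_lsn(local_replay_lsn, primary_latest_lsn=None):
    """Hypothetical sketch of option 2: a logical walsender on a physical
    standby prefers the primary's latest WAL position (fetched over an
    extra walreceiver-style connection) over the local replay position,
    so the feedback sent to the subscriber is never behind commits that
    are still in flight from the primary."""
    if primary_latest_lsn is not None:
        # The primary's position should be >= the standby's replay position,
        # but take max() defensively in case the probe raced with replay.
        return max(local_replay_lsn, primary_latest_lsn)
    return local_replay_lsn
```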
3) Documentation of Restrictions
As an alternative, we could simply document the restriction that detecting
update_deleted conflicts is not supported if the publisher is also acting as a
physical standby.
Best Regards,
Hou zj