Re: Conflict detection for update_deleted in logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2024-10-18 09:44:54
Message-ID: CAA4eK1LKgkjyNKeW5jEhy1=uE8z0p7Pdae0rohoJP51eJGd=gg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 15, 2024 at 5:03 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Oct 14, 2024 at 9:09 AM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > We thought of a few options to fix this:
> >
> > 1) Add a Time-Based Subscription Option:
> >
> > We could add a new time-based option for subscriptions, such as
> > retain_dead_tuples = '5s'. In the logical launcher, after obtaining the
> > candidate XID, the launcher will wait for the specified time before advancing
> > the slot.xmin. This ensures that deleted tuples are retained for at least the
> > duration defined by this new option.
> >
> > This approach is designed to simulate the functionality of the GUC
> > (vacuum_committs_age), but with a simpler implementation that does not impact
> > vacuum performance. We can maintain both this time-based method and the current
> > automatic method. If a user does not specify the time-based option, we will
> > continue using the existing approach to retain dead tuples until all concurrent
> > transactions from the remote node have been applied.
> >
> > 2) Modification to the Logical Walsender
> >
> > On the logical walsender, which runs on a physical standby, we can build an
> > additional connection to the physical primary to obtain the latest WAL
> > position. This position will then be sent as feedback to the logical
> > subscriber.
> >
> > A potential concern is that this requires the walsender to use the walreceiver
> > API, which may seem a bit unnatural. Also, it starts an additional walsender
> > process on the primary, as the logical walsender on the physical standby will
> > need to communicate with this walsender to fetch the WAL position.
> >
>
> This idea is worth considering, but I think it may not be a good
> approach if the physical standby is cascading. We need to restrict
> update_deleted conflict detection if the standby is cascading, right?
>
> The other approach is that we send current_timestamp from the
> subscriber and somehow check if the physical standby has applied
> commit_lsn up to that commit_ts; if so, it can send that WAL position
> to the subscriber; otherwise, wait for it to be applied. If we do
> this, then we don't need to add a restriction for a cascaded physical
> standby. I think the subscriber anyway needs to wait for such an LSN
> to be applied on the standby before advancing the xmin, even if we
> get it from the primary. This is because the subscriber can only
> apply and flush the WAL once it has been applied on the standby. Am I
> missing something?
>
> This approach has the disadvantage that we are relying on clocks
> being synced on both nodes, which we anyway need for conflict
> resolution as discussed in the thread [1]. We also need to consider
> the Commit Timestamp and LSN inversion issue as discussed in another
> thread [2] if we want to pursue this approach, because we may miss an
> LSN that has a prior timestamp.
>
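
Regarding the handshake approach quoted above (the subscriber sends
its current_timestamp, and the standby's walsender waits until it has
replayed commits up to that timestamp before reporting its replay
position), a rough sketch in C of the standby-side wait could look
like the following. GetLatestReplayedCommitTs() is an invented helper
for illustration, and real code would wait on a latch rather than
sleep:

/* Assumes access/xlogdefs.h, access/xlogrecovery.h, utils/timestamp.h */
static XLogRecPtr
wait_for_subscriber_ts(TimestampTz subscriber_ts)
{
	/*
	 * Invented helper: the commit timestamp of the latest commit
	 * record replayed on this standby.
	 */
	while (GetLatestReplayedCommitTs() < subscriber_ts)
		pg_usleep(1000L);	/* 1ms; real code would use a latch */

	/* Current replay position, reported back to the subscriber. */
	return GetXLogReplayRecPtr(NULL);
}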

The problem due to Commit Timestamp and LSN inversion is that the
standby may not consider the WAL LSN of a commit with an earlier
timestamp, which could lead to the removal of required dead rows on
the subscriber.

The other problem, pointed out by Hou-San offlist, due to Commit
Timestamp and LSN inversion is that we could miss sending the WAL LSN
that the subscriber requires to retain dead rows for update_deleted
conflict detection. For example, consider the following two-node,
bidirectional setup:

Node A:
T1: INSERT INTO t (id, value) VALUES (1,1); ts=10:00 AM
T2: DELETE FROM t WHERE id = 1; ts=10:02 AM

Node B:
T3: UPDATE t SET value = 2 WHERE id = 1; ts=10:01 AM

Say the subscription is created with retain_dead_tuples = true.

After executing T2, the apply worker on Node A will check the latest
WAL flush location on Node B. By that time, T3 should have finished,
so the xmin will be advanced only after applying the WAL that is later
than T3. So, the dead tuple will not be removed before T3 is applied,
which means the update_deleted conflict can be detected.
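
To make that ordering concrete, here is a minimal sketch in C of the
check the apply worker could perform before advancing slot.xmin (the
function and variable names are invented for illustration; this is
not code from the patch):

/* Assumes access/xlogdefs.h for XLogRecPtr */
/*
 * Sketch only: refuse to advance the non-removable xmin until the
 * apply worker has applied all remote WAL up to the flush position
 * fetched from Node B when the candidate xmin was chosen. In the
 * example above, T3's WAL lies before that flush position, so its
 * dead tuple is retained until T3 has been applied.
 */
static bool
can_advance_nonremovable_xmin(XLogRecPtr remote_flush_at_candidate,
							  XLogRecPtr last_applied_remote_lsn)
{
	return last_applied_remote_lsn >= remote_flush_at_candidate;
}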

However, as there is a gap between when we acquire the
commit_timestamp and when we reserve the commit LSN, it is possible
that T3 has not yet flushed its WAL even though it committed earlier
than T2. If this happens, then we won't be able to detect the
update_deleted conflict reliably.

Now, one simpler idea is to acquire the commit timestamp and reserve
the WAL insert location (LSN) under the same spinlock in
ReserveXLogInsertLocation(), but that could be costly as discussed in
the thread [1]. The other, more localized solution is to acquire the
timestamp after reserving the commit WAL LSN, outside the lock, which
will solve this particular problem.
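
To illustrate the localized fix (simplified pseudo-code, not the
actual xlog.c code; reserve_commit_record_lsn() is an invented wrapper
here, while the real ReserveXLogInsertLocation() is static and takes
the record size plus output position pointers), the change amounts to
reordering the clock read against the LSN reservation:

/* Assumes access/xlogdefs.h and utils/timestamp.h */
/*
 * Today (simplified): the commit timestamp is taken well before the
 * commit record's LSN is reserved, so a transaction with an earlier
 * timestamp can end up with a later LSN.
 *
 * Localized fix: read the clock immediately after reserving the LSN,
 * outside the insert-position spinlock, so the timestamp is taken
 * only once the LSN is fixed.
 */
XLogRecPtr	commit_lsn;
TimestampTz	commit_ts;

commit_lsn = reserve_commit_record_lsn();	/* takes the spinlock internally */
commit_ts = GetCurrentTimestamp();		/* clock read outside the lock */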

[1] - https://www.postgresql.org/message-id/CAJpy0uBxEJnabEp3JS%3Dn9X19Vx2ZK3k5AR7N0h-cSMtOwYV3fA%40mail.gmail.com

--
With Regards,
Amit Kapila.
