Re: Conflict detection for update_deleted in logical replication

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2024-09-13 07:55:52
Message-ID: CAJpy0uAzzOzhXGH-zBc7Zt8ndXRf6r4OnLzgRrHyf8cvd+fpwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 13, 2024 at 11:38 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > >
> > > So in brief, this solution is only for bidrectional setup? For non-bidirectional,
> > > feedback_slots is non-configurable and thus irrelevant.
> >
> > Right.
> >
>
> One possible idea to address the non-bidirectional case raised by
> Shveta is to use a time-based cut-off to remove dead tuples. As
> mentioned earlier in my email [1], we can define a new GUC parameter
> say vacuum_committs_age which would indicate that we will allow rows
> to be removed only if the modified time of the tuple as indicated by
> committs module is greater than the vacuum_committs_age. We could keep
> this parameter a table-level option without introducing a GUC as this
> may not apply to all tables. I checked and found that some other
> replication solutions like GoldenGate also allowed similar parameters
> (tombstone_deletes) to be specified at table level [2]. The other
> advantage of allowing it at table level is that it won't hamper the
> performance of hot-pruning or vacuum in general. Note, I am careful
> here because to decide whether to remove a dead tuple or not we need
> to compare its committs_time both during hot-pruning and vacuum.

+1 on the idea, but IIUC this value doesn’t need to be significant; it
can be limited to just a few minutes. The one which is sufficient to
handle replication delays caused by network lag or other factors,
assuming clock skew has already been addressed.

This new parameter is necessary only for cases where an UPDATE and
DELETE on the same row occur concurrently, but the replication order
to a third node is not preserved, which could result in data
divergence. Consider the following example:

Node A:
T1: INSERT INTO t (id, value) VALUES (1,1); (10.01 AM)
T2: DELETE FROM t WHERE id = 1; (10.03 AM)

Node B:
T3: UPDATE t SET value = 2 WHERE id = 1; (10.02 AM)

Assume a third node (Node C) subscribes to both Node A and Node B. The
"correct" order of messages received by Node C would be T1-T3-T2, but
it could also receive them in the order T1-T2-T3, wherein sayT3 is
received with a lag of say 2 mins. In such a scenario, T3 should be
able to recognize that the row was deleted by T2 on Node C, thereby
detecting the update-deleted conflict and skipping the apply.

The 'vacuum_committs_age' parameter should account for this lag, which
could lead to the order reversal of UPDATE and DELETE operations.

Any subsequent attempt to update the same row after conflict detection
and resolution should not pose an issue. For example, if Node A
triggers the following at 10:20 AM:
UPDATE t SET value = 3 WHERE id = 1;

Since the row has already been deleted, the UPDATE will not proceed
and therefore will not generate a replication operation on the other
nodes, indicating that vacuum need not to preserve the dead row to
this far.

thanks
Shveta

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mats Kindahl 2024-09-13 08:01:23 Re: Use streaming read API in ANALYZE
Previous Message Tender Wang 2024-09-13 07:48:20 Re: Eager aggregation, take 3