Re: Conflict detection for update_deleted in logical replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2024-09-24 17:24:56
Message-ID: CAD21AoD=m-YHceYMpsdu0HnGCaezeyVhaCPFxDLHU7aN0wgzqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 24, 2024 at 12:14 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Tuesday, September 24, 2024 2:42 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Sep 23, 2024 at 8:32 PM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > On Tuesday, September 24, 2024 5:05 AM Masahiko Sawada
> > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > I'm still studying this idea but let me confirm the following scenario.
> > > >
> > > > Suppose both Node-A and Node-B have the same row (1,1) in table t,
> > > > and XIDs and commit LSNs of T2 and T3 are the following:
> > > >
> > > > Node A
> > > > T2: DELETE FROM t WHERE id = 1 (10:02 AM) XID:100,
> > commit-LSN:1000
> > > >
> > > > Node B
> > > > T3: UPDATE t SET value = 2 WHERE id 1 (10:01 AM) XID:500,
> > > > commit-LSN:5000
> > > >
> > > > Further suppose that it's now 10:05 AM, and the latest XID and the
> > > > latest flush WAL position of Node-A and Node-B are following:
> > > >
> > > > Node A
> > > > current XID: 300
> > > > latest flush LSN; 3000
> > > >
> > > > Node B
> > > > current XID: 700
> > > > latest flush LSN: 7000
> > > >
> > > > Both T2 and T3 are NOT sent to Node B and Node A yet, respectively
> > > > (i.e., the logical replication is delaying for 5 min).
> > > >
> > > > Consider the following scenario:
> > > >
> > > > 1. The apply worker on Node-A calls GetRunningTransactionData() and
> > > > gets 301 (set as candidate_xmin).
> > > > 2. The apply worker on Node-A requests the latest WAL flush position
> > > > from Node-B, and gets 7000 (set as candidate_remote_wal_lsn).
> > > > 3. T2 is applied on Node-B, and the latest flush position of Node-B is now
> > 8000.
> > > > 4. The apply worker on Node-A continues applying changes, and
> > > > applies the transactions up to remote (commit) LSN 7100.
> > > > 5. Now that the apply worker on Node-A applied all changes smaller
> > > > than candidate_remote_wal_lsn (7000), it increases the slot.xmin to
> > > > 301 (candidate_xmin).
> > > > 6. On Node-A, vacuum runs and physically removes the tuple that was
> > > > deleted by T2.
> > > >
> > > > Here, on Node-B, there might be a transition between LSN 7100 and
> > > > 8000 that might require the tuple that is deleted by T2.
> > > >
> > > > For example, "UPDATE t SET value = 3 WHERE id = 1" (say T4) is
> > > > executed on Node-B at LSN 7200, and it's sent to Node-A after step 6.
> > > > On Node-A, whether we detect "update_deleted" or "update_missing"
> > > > still depends on when vacuum removes the tuple deleted by T2.
> > >
> > > I think in this case, no matter we detect "update_delete" or
> > > "update_missing", the final data is the same. Because T4's commit
> > > timestamp should be later than
> > > T2 on node A, so in the case of "update_deleted", it will compare the
> > > commit timestamp of the deleted tuple's xmax with T4's timestamp, and
> > > T4 should win, which means we will convert the update into insert and
> > > apply. Even if the deleted tuple is deleted and "update_missing" is
> > > detected, the update will still be converted into insert and applied. So, the
> > result is the same.
> >
> > The "latest_timestamp_wins" is the default resolution method for
> > "update_deleted"? When I checked the wiki page[1], the "skip" was the default
> > solution method for that.
>
> Right, I think the wiki needs some update.
>
> I think using 'skip' as default for update_delete could easily cause data
> divergence when the dead tuple is deleted by an old transaction while the
> UPDATE has a newer timestamp like the case you mentioned. It's necessary to
> follow the last update win strategy when the incoming update has later
> timestamp, which is to convert update to insert.

Right. If "latest_timestamp_wins" is the default resolution for
"update_deleted", I think your idea works fine unless I'm missing
corner cases.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-09-24 17:57:28 Re: pgsql: Improve default and empty privilege outputs in psql.
Previous Message Fujii Masao 2024-09-24 17:22:52 Re: Add new COPY option REJECT_LIMIT