Re: Conflict detection for update_deleted in logical replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2024-09-19 18:49:18
Message-ID: CAD21AoA9TTLqFbTAbHsDvt1G2+R4iNixweVRkCHRBnHAOvoFkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 17, 2024 at 9:29 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Sep 17, 2024 at 11:24 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Sep 16, 2024 at 11:53 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Sep 17, 2024 at 6:08 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > I haven't thought about the implementation details yet but I think
> > > during pruning (for example in heap_prune_satisfies_vacuum()), apart
> > > from checking if the tuple satisfies
> > > HeapTupleSatisfiesVacuumHorizon(), we should also check if the tuple's
> > > committs is greater than configured vacuum_committs_age (for the
> > > table) to decide whether tuple can be removed.
> >
> > Sounds very costly. I think we need to do performance tests. Even if
> > the vacuum gets slower only on the particular table having the
> > vacuum_committs_age setting, it would affect overall autovacuum
> > performance. Also, it would affect HOT pruning performance.
> >
>
> Agreed that we should do some performance testing and additionally
> think of any better way to implement. I think the cost won't be much
> if the tuples to be removed are from a single transaction because the
> required commit_ts information would be cached but when the tuples are
> from different transactions, we could see a noticeable impact. We need
> to test to say anything concrete on this.

Agreed.

>
> > >
> > > > > but IIUC this value doesn’t need to be significant; it
> > > > > can be limited to just a few minutes. The one which is sufficient to
> > > > > handle replication delays caused by network lag or other factors,
> > > > > assuming clock skew has already been addressed.
> > > >
> > > > I think that in a non-bidirectional case the value could need to be a
> > > > large number. Is that right?
> > > >
> > >
> > > As per my understanding, even for non-bidirectional cases, the value
> > > should be small. For example, in the case, pointed out by Shveta [1],
> > > where the updates from 2 nodes are received by a third node, this
> > > setting is expected to be small. This setting primarily deals with
> > > concurrent transactions on multiple nodes, so it should be small but I
> > > could be missing something.
> > >
> >
> > I might be missing something but the scenario I was thinking of is
> > something below.
> >
> > Suppose that we setup uni-directional logical replication between Node
> > A and Node B (e.g., Node A -> Node B) and both nodes have the same row
> > with key = 1:
> >
> > Node A:
> > T1: UPDATE t SET val = 2 WHERE key = 1; (10:00 AM)
> > -> This change is applied on Node B at 10:01 AM.
> >
> > Node B:
> > T2: DELETE FROM t WHERE key = 1; (05:00 AM)
> >
> > If a vacuum runs on Node B at 06:00 AM, the change of T1 coming from
> > Node A would raise an "update_missing" conflict. On the other hand, if
> > a vacuum runs on Node B at 11:00 AM, the change would raise an
> > "update_deleted" conflict. It looks whether we detect an
> > "update_deleted" or an "updated_missing" depends on the timing of
> > vacuum, and to avoid such a situation, we would need to set
> > vacuum_committs_age to more than 5 hours.
> >
>
> Yeah, in this case, it would detect a different conflict (if we don't
> set vacuum_committs_age to greater than 5 hours) but as per my
> understanding, the primary purpose of conflict detection and
> resolution is to avoid data inconsistency in a bi-directional setup.
> Assume, in the above case it is a bi-directional setup, then we want
> to have the same data in both nodes. Now, if there are other cases
> like the one you mentioned that require to detect the conflict
> reliably than I agree this value could be large and probably not the
> best way to achieve it. I think we can mention in the docs that the
> primary purpose of this is to achieve data consistency among
> bi-directional kind of setups.
>
> Having said that even in the above case, the result should be the same
> whether the vacuum has removed the row or not. Say, if the vacuum has
> not yet removed the row (due to vacuum_committs_age or otherwise) then
> also because the incoming update has a later timestamp, we will
> convert the update to insert as per last_update_wins resolution
> method, so the conflict will be considered as update_missing. And,
> say, the vacuum has removed the row and the conflict detected is
> update_missing, then also we will convert the update to insert. In
> short, if UPDATE has lower commit-ts, DELETE should win and if UPDATE
> has higher commit-ts, UPDATE should win.
>
> So, we can expect data consistency in bidirectional cases and expect a
> deterministic behavior in other cases (e.g. the final data in a table
> does not depend on the order of applying the transactions from other
> nodes).

Agreed.

I think that such a time-based configuration parameter would be a
reasonable solution. The current concerns are that it might affect
vacuum performance and lead to a similar bug we had with
vacuum_defer_cleanup_age.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2024-09-19 19:22:46 Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Previous Message Nathan Bossart 2024-09-19 18:36:36 Re: Large expressions in indexes can't be stored (non-TOASTable)