Re: Conflict Detection and Resolution

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
Subject: Re: Conflict Detection and Resolution
Date: 2024-06-07 12:48:55
Message-ID: 0cb26238-204a-4375-9647-e8b49dc821d6@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 6/3/24 09:30, Amit Kapila wrote:
> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> On 5/23/24 08:36, shveta malik wrote:
>>>
>>> Conflict Resolution
>>> ----------------
>>> a) latest_timestamp_wins: The change with later commit timestamp wins.
>>> b) earliest_timestamp_wins: The change with earlier commit timestamp wins.
>>> c) apply: Always apply the remote change.
>>> d) skip: Remote change is skipped.
>>> e) error: Error out on conflict. Replication is stopped, manual
>>> action is needed.
>>>
>>
>> Why not to have some support for user-defined conflict resolution
>> methods, allowing to do more complex stuff (e.g. merging the rows in
>> some way, perhaps even with datatype-specific behavior)?
>>
>>> The change will be converted to 'UPDATE' and applied if the decision
>>> is in favor of applying remote change.
>>>
>>> It is important to have commit timestamp info available on subscriber
>>> when latest_timestamp_wins or earliest_timestamp_wins method is chosen
>>> as resolution method. Thus ‘track_commit_timestamp’ must be enabled
>>> on subscriber, in absence of which, configuring the said
>>> timestamp-based resolution methods will result in error.
>>>
>>> Note: If the user has chosen the latest or earliest_timestamp_wins,
>>> and the remote and local timestamps are the same, then it will go by
>>> system identifier. The change with a higher system identifier will
>>> win. This will ensure that the same change is picked on all the nodes.
>>
>> How is this going to deal with the fact that commit LSN and timestamps
>> may not correlate perfectly? That is, commits may happen with LSN1 <
>> LSN2 but with T1 > T2.
>>
>
> One of the possible scenarios discussed at pgconf.dev with Tomas for
> this was as follows:
>
> Say there are two publisher nodes PN1, PN2, and subscriber node SN3.
> The logical replication is configured such that a subscription on SN3
> has publications from both PN1 and PN2. For example, SN3 (sub) -> PN1,
> PN2 (p1, p2)
>
> Now, on PN1, we have the following operations that update the same row:
>
> T1
> Update-1 on table t1 at LSN1 (1000) on time (200)
>
> T2
> Update-2 on table t1 at LSN2 (2000) on time (100)
>
> Then in parallel, we have the following operation on node PN2 that
> updates the same row as Update-1, and Update-2 on node PN1.
>
> T3
> Update-3 on table t1 at LSN(1500) on time (150)
>
> By theory, we can have a different state on subscribers depending on
> the order of updates arriving at SN3 which shouldn't happen. Say, the
> order in which they reach SN3 is: Update-1, Update-2, Update-3 then
> the final row we have is by Update-3 considering we have configured
> last_update_wins as a conflict resolution method. Now, consider the
> other order: Update-1, Update-3, Update-2, in this case, the final
> row will be by Update-2 because when we try to apply Update-3, it will
> generate a conflict and as per the resolution method
> (last_update_wins) we need to retain Update-1.
>
> On further thinking, the operations on node-1 PN-1 as defined above
> seem impossible because one of the Updates needs to wait for the other
> to write a commit record. So the commits may happen with LSN1 < LSN2
> but with T1 > T2 but they can't be on the same row due to locks. So,
> the order of apply should still be consistent. Am, I missing
> something?
>

Sorry, I should have read your message before responding a couple
minutes ago. I think you're right this exact example can't happen, due
to the dependency between transactions.

But as I wrote, I'm not quite convinced this means there are not other
issues with this way of resolving conflicts. It's more likely a more
complex scenario is required.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-06-07 12:57:13 Re: Compress ReorderBuffer spill files using LZ4
Previous Message Robert Haas 2024-06-07 12:46:51 Re: Assert in heapgettup_pagemode() fails due to underlying buffer change