Re: Conflict Detection and Resolution

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Conflict Detection and Resolution
Date: 2024-06-17 00:08:01
Message-ID: 9ae1d594-7d87-4eda-b987-e0593547f468@enterprisedb.com
Lists: pgsql-hackers

On 6/13/24 06:52, Dilip Kumar wrote:
> On Wed, Jun 12, 2024 at 5:26 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>>>> I agree having a separate update_deleted conflict would be beneficial,
>>>> I'm not arguing against that - my point is actually that I think this
>>>> conflict type is required, and that it needs to be detected reliably.
>>>>
>>>
>>> When working with a distributed system, we must accept some form of
>>> eventual consistency model.
>>
>> I'm not sure this is necessarily true. There are distributed databases
>> implementing (or aiming to) regular consistency models, without eventual
>> consistency. I'm not saying it's easy, but it shows eventual consistency
>> is not the only option.
>
> Right, that statement might not be completely accurate. Based on the
> CAP theorem, when a network partition is unavoidable and availability
> is expected, we often choose an eventual consistency model. However,
> claiming that a fully consistent model is impossible in any
> distributed system is incorrect, as it can be achieved using
> mechanisms like Two-Phase Commit.
>
> We must also accept that our PostgreSQL replication mechanism does not
> guarantee a fully consistent model. Even with synchronous commit, it
> only waits for the WAL to be replayed on the standby but does not
> change the commit decision based on other nodes. This means, at most,
> we can only guarantee "Read Your Write" consistency.
>
Perhaps, but even accepting eventual consistency does not absolve us
from actually defining what that means, ensuring it's sensible enough to
be practical/usable, and that it actually converges to a consistent
state (that's essentially the problem with the update conflict types,
because misdetecting them results in diverging states).

>>> However, it's essential to design a
>>> predictable and acceptable behavior. For example, if a change is a
>>> result of a previous operation (such as an update on node B triggered
>>> after observing an operation on node A), we can say that the operation
>>> on node A happened before the operation on node B. Conversely, if
>>> operations on nodes A and B are independent, we consider them
>>> concurrent.
>>>
>>
>> Right. And this is precisely the focus of my questions - understanding
>> what behavior we aim for / expect in the end. Or said differently, what
>> anomalies / weird behavior would be considered expected.
>
>> Because that's important both for discussions about feasibility, etc.
>> And also for evaluation / reviews of the patch.
>
> +1
>
>>> In distributed systems, clock skew is a known issue. To establish a
>>> consistency model, we need to ensure it guarantees the
>>> "happens-before" relationship. Consider a scenario with three nodes:
>>> NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
>>> subsequently NodeB makes changes, and then both NodeA's and NodeB's
>>> changes are sent to NodeC, the clock skew might make NodeB's changes
>>> appear to have occurred before NodeA's changes. However, we should
>>> maintain data that indicates NodeB's changes were triggered after
>>> NodeA's changes arrived at NodeB. This implies that logically, NodeB's
>>> changes happened after NodeA's changes, despite what the timestamps
>>> suggest.
>>>
>>> A common method to handle such cases is using vector clocks for
>>> conflict resolution. "Vector clocks" allow us to track the causal
>>> relationships between changes across nodes, ensuring that we can
>>> correctly order events and resolve conflicts in a manner that respects
>>> the "happens-before" relationship. This method helps maintain
>>> consistency and predictability in the system despite issues like clock
>>> skew.
>>>
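[For reference, the vector-clock comparison described in the quoted
paragraph can be sketched as follows. This is purely illustrative - all
names are made up, and it is not a proposal for the patch's design:]

```python
# Minimal vector clock sketch (illustrative only).
# Each node keeps one counter per node; merging on receive preserves
# the happens-before relationship despite clock skew.

def vc_new(nodes):
    return {n: 0 for n in nodes}

def vc_tick(vc, node):
    vc[node] += 1

def vc_merge(local, remote, node):
    # On receiving a change, take the element-wise max, then tick.
    for n, c in remote.items():
        local[n] = max(local.get(n, 0), c)
    vc_tick(local, node)

def happened_before(a, b):
    # a -> b iff a <= b element-wise and a != b
    return all(a[n] <= b[n] for n in a) and a != b

# The NodeA -> NodeB scenario from the text:
nodes = ["A", "B", "C"]
a, b = vc_new(nodes), vc_new(nodes)
vc_tick(a, "A")          # NodeA makes a change: {A:1, B:0, C:0}
vc_merge(b, a, "B")      # NodeB receives it, then changes: {A:1, B:1, C:0}
assert happened_before(a, b)   # causal order survives any timestamp skew
```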
>>
>> I'm familiar with the concept of vector clock (or logical clock in
>> general), but it's not clear to me how you plan to use this in the
>> context of the conflict handling. Can you elaborate/explain?
>>
>> The way I see it, conflict handling is pretty tightly coupled with
>> regular commit timestamps and MVCC in general. How would you use vector
>> clock to change that?
>
> The issue with using commit timestamps is that, when multiple nodes
> are involved, the commit timestamp won't accurately represent the
> actual order of operations. There's no reliable way to determine the
> perfect order of each operation happening on different nodes roughly
> simultaneously unless we use some globally synchronized counter.
> Generally, that order might not cause real issues unless one operation
> is triggered by a previous operation, and relying solely on physical
> timestamps would not detect that correctly.
>
This whole conflict detection / resolution proposal is based on using
commit timestamps. Aren't you suggesting it can't really work with
commit timestamps?

FWIW there are ways to build distributed consistency on top of
timestamps, as long as they are monotonic - e.g. Clock-SI does that.
It's not perfect, but it shows it's possible.
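[A hybrid-logical-clock style timestamp is one way to get the
monotonicity such schemes rely on: physical time when it advances, a
logical component otherwise. A minimal sketch, not Clock-SI itself, with
invented names:]

```python
# HLC-style timestamp sketch (illustrative): commit timestamps stay
# monotonic even if the physical clock stalls or moves backwards.

import time

class HybridClock:
    def __init__(self):
        self.wall = 0
        self.logical = 0

    def now(self, physical=None):
        # physical is in seconds; convert to microseconds.
        pt = int((physical if physical is not None else time.time()) * 1_000_000)
        if pt > self.wall:
            self.wall, self.logical = pt, 0
        else:
            self.logical += 1   # physical clock went backwards or stalled
        return (self.wall, self.logical)

clk = HybridClock()
t1 = clk.now(physical=100.0)
t2 = clk.now(physical=99.5)   # skewed / backwards physical clock
assert t2 > t1                # timestamps remain monotonic anyway
```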

However, I'm not sure we have to go there - it depends on what the goal
is. For one-directional replication (multiple nodes replicating to the
same target) it might be sufficient if the conflict resolution is
"deterministic" (i.e. not dependent on the order in which the changes
are applied). I'm not sure, but that's why I asked what the goal is in
my very first message in this thread.
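[To illustrate what "deterministic" means here - a resolution that
depends only on the set of conflicting changes converges on every
subscriber, while one that depends on arrival order diverges. A toy
sketch with made-up structures:]

```python
# Deterministic vs order-dependent conflict resolution (illustrative).

def lww(changes):
    # Last-write-wins with origin as tiebreaker: the winner depends only
    # on the changes themselves, never on the order they arrived in.
    return max(changes, key=lambda c: (c["ts"], c["origin"]))["value"]

def first_applied_wins(changes):
    # Depends on arrival order - not deterministic across subscribers.
    return changes[0]["value"]

c1 = {"ts": 10, "origin": "A", "value": "x"}
c2 = {"ts": 10, "origin": "B", "value": "y"}

# Two subscribers receiving the same changes in opposite orders:
assert lww([c1, c2]) == lww([c2, c1])                                # converges
assert first_applied_wins([c1, c2]) != first_applied_wins([c2, c1])  # diverges
```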

> We need some sort of logical counter, such as a vector clock, which
> might be an independent counter on each node but can perfectly track
> the causal order. For example, if NodeA observes an operation from
> NodeB with a counter value of X, NodeA will adjust its counter to X+1.
> This ensures that if NodeA has seen an operation from NodeB, its next
> operation will appear to have occurred after NodeB's operation.
>
> I admit that I haven't fully thought through how we could design such
> version tracking in our logical replication protocol or how it would
> fit into our system. However, my point is that we need to consider
> something beyond commit timestamps to achieve reliable ordering.
>
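[The "observe counter X, move to X+1" rule quoted above is essentially a
Lamport clock. A minimal sketch, again with invented names, just to pin
down the rule being described:]

```python
# Lamport clock sketch of the rule described above (illustrative).

class LamportClock:
    def __init__(self):
        self.counter = 0

    def local_event(self):
        self.counter += 1
        return self.counter

    def observe(self, remote_counter):
        # Receiving a change stamped X moves our counter past X, so our
        # next operation is ordered after the sender's.
        self.counter = max(self.counter, remote_counter) + 1
        return self.counter

node_a, node_b = LamportClock(), LamportClock()
x = node_b.local_event()   # NodeB commits at counter 1
node_a.observe(x)          # NodeA sees it; its counter jumps to 2
y = node_a.local_event()   # NodeA's next change carries counter 3
assert y > x               # causal order preserved without wall clocks
```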

I can't really respond to this as there's no suggestion how it would be
implemented in the patch discussed in this thread.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
