From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
Subject: | Re: Conflict Detection and Resolution |
Date: | 2024-06-10 08:54:24 |
Message-ID: | CAA4eK1LuGV_iXmL0Vm850oVCfDgyO_mk0r2BKZCgdQ2kUfDSfQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 5/27/24 07:48, shveta malik wrote:
> > On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
> > <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >>
> >> Which architecture are you aiming for? Here you talk about multiple
> >> providers, but the wiki page mentions active-active. I'm not sure how
> >> much this matters, but it might.
> >
> > Currently, we are working on the multiple-providers case, but ideally
> > it should work for active-active also. If, during further discussion
> > and implementation, we find cases that will not work in a
> > straightforward way for active-active, our primary focus will remain
> > on implementing it for the multiple-providers architecture first.
> >
> >>
> >> Also, what kind of consistency you expect from this? Because none of
> >> these simple conflict resolution methods can give you the regular
> >> consistency models we're used to, AFAICS.
> >
> > Can you please explain a little bit more on this.
> >
>
> I was referring to the well established consistency models / isolation
> levels, e.g. READ COMMITTED or SNAPSHOT ISOLATION. This determines what
> guarantees the application developer can expect, what anomalies can
> happen, etc.
>
> I don't think any such isolation level can be implemented with simple
> conflict resolution methods like last-update-wins. For example,
> consider an active-active setup where both nodes do
>
> UPDATE accounts SET balance=balance+1000 WHERE id=1
>
> This will inevitably lead to a conflict, and while last-update-wins
> resolves this "consistently" on both nodes (i.e. ending with the same
> result), it's essentially a lost update.
>
The idea for solving such conflicts is to use the delta apply technique,
where the delta from both sides is applied to the respective columns.
We do plan to target this as a separate patch. Now, if the basic
conflict resolution and delta apply can't both go into one release, we
shall document such cases clearly to avoid misuse of the feature.
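
To make the difference concrete, here is a minimal sketch of the two
strategies on the accounts example. It is a toy model in Python, not
PostgreSQL code: the Update class, the function names, the starting
balance of 5000 and the timestamps are all made up for illustration.

```python
# Toy model only; every name and number here is hypothetical.
from dataclasses import dataclass


@dataclass
class Update:
    old_balance: int  # value the origin node saw before its UPDATE
    new_balance: int  # value the origin node wrote
    commit_ts: int    # origin commit timestamp


def last_update_wins(local: Update, remote: Update) -> int:
    """Keep whichever row version has the later commit timestamp."""
    if remote.commit_ts > local.commit_ts:
        return remote.new_balance
    return local.new_balance


def delta_apply(local: Update, remote: Update) -> int:
    """Add the remote change (new - old) on top of the local value."""
    return local.new_balance + (remote.new_balance - remote.old_balance)


# Both nodes start at balance = 5000 and run balance = balance + 1000.
node_a = Update(old_balance=5000, new_balance=6000, commit_ts=100)
node_b = Update(old_balance=5000, new_balance=6000, commit_ts=101)

# Each node resolves the conflict against the change received from its peer.
lww_on_a = last_update_wins(local=node_a, remote=node_b)
lww_on_b = last_update_wins(local=node_b, remote=node_a)
delta_on_a = delta_apply(local=node_a, remote=node_b)
delta_on_b = delta_apply(local=node_b, remote=node_a)
```

In this model both nodes converge either way, but last-update-wins ends
at 6000 (one increment is lost) while delta apply ends at 7000.
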
> This is a very simplistic example of course, I recall there are various
> more complex examples involving foreign keys, multi-table transactions,
> constraints, etc. But in principle it's a manifestation of the same
> inherent limitation of conflict detection and resolution etc.
>
> Similarly, I believe this affects not just active-active, but also the
> case where one node aggregates data from multiple publishers. Maybe not
> to the same extent / it might be fine for that use case,
>
I am not sure how much of a problem it is for a general logical
replication solution, but we do intend to work on solving such problems
in a step-wise manner. Attempting everything in one patch doesn't seem
advisable to me.
>
> but you said
> the end goal is to use this for active-active. So I'm wondering what's
> the plan, there.
>
I think at this stage we are not ready for active-active because,
leaving aside this feature, we need many other features such as
replication of all commands/objects (DDL replication, replication of
large objects, etc.), global sequences, some sort of global two_phase
transaction management for data consistency, etc. So, it would be
better to consider logical replication cases for now, intending to
extend this to active-active when we have the other required pieces.

> If I'm writing an application for active-active using this conflict
> handling, what assumptions can I make? Can I just do stuff as if on
> a single node, or do I need to be super conscious about the zillion ways
> things can misbehave in a distributed system?
>
> My personal opinion is that the closer this will be to the regular
> consistency levels, the better. If past experience taught me anything,
> it's very hard to predict how distributed systems with eventual
> consistency behave, and even harder to actually test the application in
> such environment.
>
I don't think this can in any way enable users to start writing
applications for active-active workloads. For something like what you
are saying, we probably need a global transaction manager (or a global
two_pc) as well, to allow transactions to behave as they do on a
single node or to achieve similar consistency levels. With such
transaction management, we can allow a transaction to commit on a node
only when it doesn't lead to a conflict on the peer node.

> In any case, if there are any differences compared to the usual
> behavior, it needs to be very clearly explained in the docs.
>
I agree that docs should be clear about the cases that this can and
can't support.

> >>
> >> How is this going to deal with the fact that commit LSN and timestamps
> >> may not correlate perfectly? That is, commits may happen with LSN1 <
> >> LSN2 but with T1 > T2.
> >
> > Are you pointing to the issue where a session/txn has taken
> > 'xactStopTimestamp' timestamp earlier but is delayed to insert record
> > in XLOG, while another session/txn which has taken timestamp slightly
> > later managed to insert its record in XLOG sooner than session 1,
> > making LSN and Timestamps out of sync? Going by this scenario, the
> > commit-timestamp may not be reflective of actual commits and thus
> > timestamp-based resolvers may take wrong decisions. Or do you mean
> > something else?
> >
> > If this is the problem you are referring to, then I think this needs a
> > fix at the publisher side. Let me think more about it. Kindly let me
> > know if you have ideas on how to tackle it.
> >
>
> Yes, this is the issue I'm talking about. We're acquiring the timestamp
> when not holding the lock to reserve space in WAL, so the timestamp and
> the commit LSN may not actually correlate.
>
> Consider this example I discussed with Amit last week:
>
> node A:
>
> XACT1: UPDATE t SET v = 1; LSN1 / T1
>
> XACT2: UPDATE t SET v = 2; LSN2 / T2
>
> node B
>
> XACT3: UPDATE t SET v = 3; LSN3 / T3
>
> And assume LSN1 < LSN2, T1 > T2 (i.e. the commit timestamp inversion),
> and T2 < T3 < T1. Now consider that the messages may arrive in different
> orders, due to async replication. Unfortunately, this would lead to
> different results of the conflict resolution:
>
> XACT1 - XACT2 - XACT3 => v=3 (T3 wins)
>
> XACT3 - XACT1 - XACT2 => v=2 (T2 wins)
>
> Now, I realize there's a flaw in this example - the (T1 > T2) inversion
> can't actually happen, because these transactions have a dependency, and
> thus won't commit concurrently. XACT1 will complete the commit before
> XACT2 starts to commit. And with a monotonic clock (which is a
> requirement for any timestamp-based resolution), that should guarantee
> (T1 < T2).
>
> However, I doubt this is sufficient to declare victory. It's more likely
> that there still are problems, but the examples are likely more complex
> (changes to multiple tables, etc.).
>
Fair enough, I think we need to analyze this more to find actual
problems or in some way try to prove that there is no problem.
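
As a starting point for that analysis, the ordering example above can
be replayed with a small Python sketch. This is a toy model, not the
apply worker: it assumes a conflict is raised only when the row was
last modified by a different origin, and that changes from the same
origin are simply applied in arrival order. The Change class,
apply_in_order() and the numeric timestamps are hypothetical.

```python
# Toy model only; names and timestamp values are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    name: str
    origin: str
    value: int
    commit_ts: int


def apply_in_order(changes):
    """Apply changes to one row, resolving conflicts by commit timestamp.

    Assumption: a conflict exists only when the row was last modified by
    a different origin; same-origin changes are applied as they arrive.
    """
    row = None  # the last change that was actually applied
    for ch in changes:
        if row is None or row.origin == ch.origin:
            row = ch  # no conflict: just apply
        elif ch.commit_ts > row.commit_ts:
            row = ch  # conflict: the incoming change is newer and wins
        # otherwise: conflict, the existing row is newer, skip the change
    return row.value


def both_orders(t1, t2, t3):
    """Return the final v for the two arrival orders from the example."""
    x1 = Change("XACT1", "A", 1, t1)
    x2 = Change("XACT2", "A", 2, t2)
    x3 = Change("XACT3", "B", 3, t3)
    return apply_in_order([x1, x2, x3]), apply_in_order([x3, x1, x2])


# Inverted timestamps on node A: T2 < T3 < T1.
inverted = both_orders(t1=30, t2=10, t3=20)

# Same scenario without the inversion: T1 < T2 < T3.
ordered = both_orders(t1=5, t2=10, t3=20)
```

Under these assumptions the inverted case yields v=3 for one arrival
order and v=2 for the other, while the non-inverted case yields v=3 for
both.
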
> I vaguely remember there were more issues with timestamp inversion, but
> those might have been related to parallel apply etc.
>
Okay, so considering there are problems due to timestamp inversion, I
think the solution would probably be to somehow generate the commit LSN
and timestamp in order. I don't have a solution at this stage but will
think more about both the actual problem and the solution. In the
meantime, if you get a chance to look up the place where you have seen
such a problem, please share it with us. It would be helpful.
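
Just to illustrate what "in order" means here, below is a rough Python
sketch of one interleaving. It is a single-threaded replay, not
PostgreSQL code and not a proposed fix: the Node class and its counters
merely stand in for the clock and the WAL insert position, and all
names are made up.

```python
# Toy model only; Node, now() and reserve_lsn() are hypothetical stand-ins.
import itertools


class Node:
    def __init__(self):
        self._clock = itertools.count(1)  # stand-in for a monotonic clock
        self._lsn = itertools.count(1)    # stand-in for the WAL position

    def now(self):
        return next(self._clock)

    def reserve_lsn(self):
        return next(self._lsn)


def commit_unordered(node):
    """Session 1 reads the clock first but reserves WAL space last."""
    ts1 = node.now()
    ts2 = node.now()
    lsn2 = node.reserve_lsn()
    lsn1 = node.reserve_lsn()
    return (lsn1, ts1), (lsn2, ts2)


def commit_ordered(node):
    """Each session reserves WAL space and reads the clock as one step."""
    def commit():
        lsn = node.reserve_lsn()
        return lsn, node.now()

    first = commit()
    second = commit()
    return first, second


s1, s2 = commit_unordered(Node())  # LSN order and timestamp order disagree
c1, c2 = commit_ordered(Node())    # LSN order and timestamp order agree
```

In the first case session 1 ends up with the larger LSN but the smaller
timestamp; in the second case both values increase together. Whether
the timestamp can really be taken under the WAL reservation lock
without hurting performance is a separate question.
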
--
With Regards,
Amit Kapila.