From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Conflict Detection and Resolution
Date: 2024-06-07 12:37:58
Message-ID: 58bb4c67-9f0c-436a-9860-504a88272f71@enterprisedb.com
Lists: pgsql-hackers
On 5/27/24 07:48, shveta malik wrote:
> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> On 5/23/24 08:36, shveta malik wrote:
>>> Hello hackers,
>>>
>>> Please find the proposal for Conflict Detection and Resolution (CDR)
>>> for Logical replication.
>>> <Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
>>> below details.>
>>>
>>> Introduction
>>> ================
>>> In case the node is subscribed to multiple providers, or when local
>>> writes happen on a subscriber, conflicts can arise for the incoming
>>> changes. CDR is the mechanism to automatically detect and resolve
>>> these conflicts depending on the application and configurations.
>>> CDR is not applicable to the initial table sync. If conflicting data
>>> already exists locally on the table, the table sync worker will fail.
>>> Please find the details of CDR in the apply worker for INSERT, UPDATE and
>>> DELETE operations below:
>>>
>>
>> Which architecture are you aiming for? Here you talk about multiple
>> providers, but the wiki page mentions active-active. I'm not sure how
>> much this matters, but it might.
>
> Currently, we are working on the multiple-providers case, but ideally it
> should work for active-active as well. During further discussion and the
> implementation phase, if we find that there are cases which will not
> work in a straightforward way for active-active, then our primary focus
> will remain to first implement it for the multiple-providers architecture.
>
>>
>> Also, what kind of consistency do you expect from this? Because none of
>> these simple conflict resolution methods can give you the regular
>> consistency models we're used to, AFAICS.
>
> Can you please explain a little bit more on this.
>
I was referring to the well-established consistency models / isolation
levels, e.g. READ COMMITTED or SNAPSHOT ISOLATION. This determines what
guarantees the application developer can expect, what anomalies can
happen, etc.
I don't think any such isolation level can be implemented with simple
conflict resolution methods like last-update-wins. For example,
consider an active-active setup where both nodes do
UPDATE accounts SET balance=balance+1000 WHERE id=1
This will inevitably lead to a conflict, and while last-update-wins
resolves this "consistently" on both nodes (i.e. both end up with the same
result), it's essentially a lost update.
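To spell it out, a minimal sketch (assuming both nodes start with
balance = 1000 and last-update-wins resolution is configured):

  node A: UPDATE accounts SET balance = balance + 1000 WHERE id = 1;
          -- local balance 2000, committed at T1
  node B: UPDATE accounts SET balance = balance + 1000 WHERE id = 1;
          -- local balance 2000, committed at T2 > T1
  after both changes replicate, last-update-wins keeps B's row on both
  nodes, so the final balance is 2000 everywhere instead of 3000 - one
  of the two increments is silently lost.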
This is a very simplistic example, of course; I recall there are various
more complex examples involving foreign keys, multi-table transactions,
constraints, etc. But in principle they're all manifestations of the same
inherent limitation of conflict detection and resolution.
Similarly, I believe this affects not just active-active, but also the
case where one node aggregates data from multiple publishers. Maybe not
to the same extent / it might be fine for that use case, but you said
the end goal is to use this for active-active. So I'm wondering what the
plan is there.
If I'm writing an application for active-active using this conflict
handling, what assumptions can I make? Can I just do stuff as if on
a single node, or do I need to be super conscious about the zillion ways
things can misbehave in a distributed system?
My personal opinion is that the closer this is to the regular
consistency levels, the better. If past experience has taught me anything,
it's that it's very hard to predict how distributed systems with eventual
consistency behave, and even harder to actually test the application in
such an environment.
In any case, if there are any differences compared to the usual
behavior, they need to be very clearly explained in the docs.
>>
>>> INSERT
>>> ================
>>> To resolve an INSERT conflict on the subscriber, it is important to find
>>> out the conflicting row (if any) before we attempt an insertion. The
>>> index search preference for this will be:
>>> - first, check for a replica identity (RI) index
>>> - if not found, check for the primary key (PK) index
>>> - if not found, then check for unique indexes (individual ones or
>>> added by unique constraints)
>>> - if no unique index is found either, skip CDR
>>>
>>> Note: if no RI index, PK, or unique index is found but
>>> REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
>>> The reason is that even though a row can be identified with
>>> REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
>>> rows. Hence, we should not go for conflict detection in such a case.
>>>
>>
>> It's not clear to me why REPLICA_IDENTITY_FULL would mean the table is
>> allowed to have duplicate values. It just means the upstream is sending
>> the whole original row; there can still be a PK/UNIQUE index on both the
>> publisher and subscriber.
>
> Yes, right. Sorry for the confusion. I meant the same, i.e. in the absence
> of an RI index, PK, or unique index, tables can have duplicates. So even
> in the presence of replica identity (FULL in this case) but in the absence
> of a unique/primary index, CDR will be skipped for INSERT.
>
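Just to confirm I understand the preference order, a minimal sketch of a
subscriber-side table (names made up, purely illustrative):

  CREATE TABLE t (a int PRIMARY KEY,
                  b int NOT NULL,
                  c text,
                  CONSTRAINT t_b_key UNIQUE (b));
  ALTER TABLE t REPLICA IDENTITY USING INDEX t_b_key;

So here the lookup for a conflicting row would use t_b_key (the RI
index); without the REPLICA IDENTITY setting it would fall back to the
PK index on (a), then to any other unique index, and with none of those
the INSERT conflict check would be skipped entirely. Right?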
>>
>>> In case of replica identity ‘nothing’ and in absence of any suitable
>>> index (as defined above), CDR will be skipped for INSERT.
>>>
>>> Conflict Type:
>>> ----------------
>>> insert_exists: A conflict is detected when the table has the same
>>> value for a key column as the new value in the incoming row.
>>>
>>> Conflict Resolution
>>> ----------------
>>> a) latest_timestamp_wins: The change with later commit timestamp wins.
>>> b) earliest_timestamp_wins: The change with earlier commit timestamp wins.
>>> c) apply: Always apply the remote change.
>>> d) skip: Remote change is skipped.
>>> e) error: Error out on conflict. Replication is stopped, manual
>>> action is needed.
>>>
>>
>> Why not have some support for user-defined conflict resolution
>> methods, allowing more complex stuff (e.g. merging the rows in
>> some way, perhaps even with datatype-specific behavior)?
>
> Initially, for the sake of simplicity, we are targeting support for
> built-in resolvers. But we have a plan to work on user-defined
> resolvers as well. We shall propose that separately.
>
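OK. Just to illustrate the kind of thing I had in mind - a purely
hypothetical sketch, neither the hook nor the syntax exists, and the
merge logic is arbitrary:

  -- hypothetical resolver picking/merging rows instead of a fixed rule
  CREATE FUNCTION resolve_update_differ(local_row accounts, remote_row accounts)
  RETURNS accounts AS $$
    SELECT CASE WHEN local_row.balance >= remote_row.balance
                THEN local_row ELSE remote_row END;
  $$ LANGUAGE sql;

  -- hypothetical extension of the proposed DDL, tying the function to a
  -- conflict type
  CONFLICT RESOLVER ON update_differ IS FUNCTION resolve_update_differ;

Something along these lines would allow datatype- or table-specific
merging, which none of the built-in resolvers can do.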
>>
>>> The change will be converted to 'UPDATE' and applied if the decision
>>> is in favor of applying the remote change.
>>>
>>> It is important to have commit timestamp info available on the subscriber
>>> when latest_timestamp_wins or earliest_timestamp_wins is chosen as the
>>> resolution method. Thus ‘track_commit_timestamp’ must be enabled
>>> on the subscriber; in its absence, configuring the said
>>> timestamp-based resolution methods will result in an error.
>>>
>>> Note: If the user has chosen the latest or earliest_timestamp_wins,
>>> and the remote and local timestamps are the same, then it will go by
>>> system identifier. The change with a higher system identifier will
>>> win. This will ensure that the same change is picked on all the nodes.
>>
>> How is this going to deal with the fact that commit LSN and timestamps
>> may not correlate perfectly? That is, commits may happen with LSN1 <
>> LSN2 but with T1 > T2.
>
> Are you pointing to the issue where a session/txn takes its
> 'xactStopTimestamp' timestamp earlier but is delayed in inserting its
> record into the XLOG, while another session/txn which takes its timestamp
> slightly later manages to insert its record into the XLOG sooner than
> the first session, making LSNs and timestamps out of sync? Going by this
> scenario, the commit timestamp may not be reflective of the actual commit
> order and thus timestamp-based resolvers may make wrong decisions. Or do
> you mean something else?
>
> If this is the problem you are referring to, then I think this needs a
> fix on the publisher side. Let me think more about it. Kindly let me
> know if you have ideas on how to tackle it.
>
Yes, this is the issue I'm talking about. We're acquiring the timestamp
when not holding the lock to reserve space in WAL, so the commit timestamp
and the commit LSN may not actually correlate.
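Roughly this kind of interleaving (simplified, not the exact code path):

  txn 2: reads its commit timestamp                     => T2
  txn 1: reads its commit timestamp                     => T1   (T1 > T2)
  txn 1: reserves WAL space, writes its commit record   => LSN1
  txn 2: reserves WAL space, writes its commit record   => LSN2 (LSN1 < LSN2)

So ordering by LSN gives (txn 1, txn 2), while ordering by commit
timestamp gives (txn 2, txn 1).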
Consider this example I discussed with Amit last week:
node A:
XACT1: UPDATE t SET v = 1; LSN1 / T1
XACT2: UPDATE t SET v = 2; LSN2 / T2
node B:
XACT3: UPDATE t SET v = 3; LSN3 / T3
And assume LSN1 < LSN2, T1 > T2 (i.e. the commit timestamp inversion),
and T2 < T3 < T1. Now consider that the messages may arrive in different
orders, due to async replication. Unfortunately, this would lead to
different results of the conflict resolution:
XACT1 - XACT2 - XACT3 => v=3 (T3 wins)
XACT3 - XACT1 - XACT2 => v=2 (T2 wins)
Now, I realize there's a flaw in this example - the (T1 > T2) inversion
can't actually happen, because these transactions have a dependency and
thus won't commit concurrently. XACT1 will complete its commit before
XACT2 starts to commit. And with a monotonic clock (which is a requirement
for any timestamp-based resolution), that should guarantee (T1 < T2).
However, I doubt this is sufficient to declare victory. It's more likely
that there still are problems, but the examples are likely more complex
(changes to multiple tables, etc.).
I vaguely remember there were more issues with timestamp inversion, but
those might have been related to parallel apply etc.
>>>
>>> UPDATE
>>> ================
>>>
>>> Conflict Detection Method:
>>> --------------------------------
>>> Origin conflict detection: The ‘origin’ info is used to detect
>>> conflicts; it can be obtained from the commit timestamp generated for the
>>> incoming txn at the source node. To compare the remote origin with the
>>> local origin, we must have origin information for local txns as well,
>>> which can be obtained from the commit timestamp after enabling
>>> ‘track_commit_timestamp’ locally.
>>> The one drawback here is that the ‘origin’ information cannot be obtained
>>> once the row is frozen and the commit-timestamp info has been removed by
>>> vacuum. For a frozen row, conflicts cannot be raised, and thus the
>>> incoming changes will be applied in all such cases.
>>>
>>> Conflict Types:
>>> ----------------
>>> a) update_differ: The origin of an incoming update's key row differs
>>> from the local row, i.e. the row has already been updated locally or
>>> by a different node.
>>> b) update_missing: A row with the same value as the incoming
>>> update's key does not exist. The remote node is trying to update a row
>>> which does not exist locally.
>>> c) update_deleted: A row with the same value as the incoming
>>> update's key does not exist; the row has already been deleted. This
>>> conflict type is generated only if the deleted row is still detectable,
>>> i.e. it has not been removed by VACUUM yet. If the row has already been
>>> removed by VACUUM, this conflict cannot be detected; it will be detected
>>> as update_missing and will follow the default or configured resolver
>>> for update_missing instead.
>>>
>>
>> I don't understand why update_missing and update_deleted should be
>> different, especially considering the difference is not detected reliably.
>> And also, even if we happen to find the row, the associated TOAST data may
>> have already been removed. So why would this matter?
>
> Here, we are trying to tackle the case where the row is 'recently'
> deleted, i.e. a concurrent UPDATE and DELETE on pub and sub. The user may
> want to opt for a different resolution in such a case as against the
> one where the corresponding row was not even present in the first
> place. The case where the row was deleted long back may not fall into
> this category, as there are higher chances that it has been removed
> by vacuum and it can be considered equivalent to the update_missing
> case.
>
My point is that if we can't detect the difference reliably, it's not
very useful. Consider this example:
Node A:
T1: INSERT INTO t (id, value) VALUES (1,1);
T2: DELETE FROM t WHERE id = 1;
Node B:
T3: UPDATE t SET value = 2 WHERE id = 1;
The "correct" order of received messages on a third node is T1-T3-T2.
But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues
and so on. For T1-T2-T3 the right decision is to discard the update,
while for T3-T1-T2 it's to either wait for the INSERT or wait for the
insert to arrive.
But if we misdetect the situation, we either end up with a row that
shouldn't be there, or we lose an update.
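For example, with hypothetical resolver settings, just to illustrate the
two failure modes:

  T1-T2-T3, update_missing resolved as apply_or_skip:
    the UPDATE is converted to an INSERT and applied, so the row
    (id=1, value=2) reappears even though the DELETE was the logically
    newer operation - a row that shouldn't be there.

  T3-T1-T2, update_missing resolved as skip:
    the UPDATE is discarded before the INSERT it logically follows has
    even arrived - the update is lost.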
> Regarding the "TOAST column" for deleted row cases, we may need to dig
> deeper. Thanks for bringing up this case. Let me analyze it further.
>
>>
>>> Conflict Resolutions:
>>> ----------------
>>> a) latest_timestamp_wins: The change with later commit timestamp
>>> wins. Can be used for ‘update_differ’.
>>> b) earliest_timestamp_wins: The change with earlier commit
>>> timestamp wins. Can be used for ‘update_differ’.
>>> c) apply: The remote change is always applied. Can be used for
>>> ‘update_differ’.
>>> d) apply_or_skip: Remote change is converted to INSERT and is
>>> applied. If the complete row cannot be constructed from the info
>>> provided by the publisher, then the change is skipped. Can be used for
>>> ‘update_missing’ or ‘update_deleted’.
>>> e) apply_or_error: Remote change is converted to INSERT and is
>>> applied. If the complete row cannot be constructed from the info
>>> provided by the publisher, then error is raised. Can be used for
>>> ‘update_missing’ or ‘update_deleted’.
>>> f) skip: Remote change is skipped and local one is retained. Can be
>>> used for any conflict type.
>>> g) error: Error out on conflict. Replication is stopped, manual
>>> action is needed. Can be used for any conflict type.
>>>
>>> To support UPDATE CDR, the presence of either a replica identity index
>>> or a primary key is required on the target node. UPDATE CDR will not be
>>> supported in the absence of a replica identity index or primary key even
>>> though REPLICA IDENTITY FULL is set. Please refer to "UPDATE" in the
>>> "Noteworthy Scenarios" section in [1] for further details.
>>>
>>> DELETE
>>> ================
>>> Conflict Type:
>>> ----------------
>>> delete_missing: An incoming delete is trying to delete a row on a
>>> target node which does not exist.
>>>
>>> Conflict Resolutions:
>>> ----------------
>>> a) error: Error out on conflict. Replication is stopped, manual
>>> action is needed.
>>> b) skip: The remote change is skipped.
>>>
>>> Configuring Conflict Resolution:
>>> ------------------------------------------------
>>> There are two parts when it comes to configuring CDR:
>>>
>>> a) Enabling/Disabling conflict detection.
>>> b) Configuring conflict resolvers for different conflict types.
>>>
>>> Users can sometimes create multiple subscriptions on the same node,
>>> subscribing to different tables to improve replication performance by
>>> starting multiple apply workers. If the tables in one subscription are
>>> less likely to cause conflicts, then it is possible that the user may want
>>> conflict detection disabled for that subscription to avoid detection
>>> latency, while enabling it for other subscriptions. This generates a
>>> requirement to make ‘conflict detection’ configurable per
>>> subscription, while the conflict resolver configuration can remain
>>> global. All the subscriptions which opt for ‘conflict detection’ will
>>> follow the global conflict resolver configuration.
>>>
>>> To implement the above, subscription commands will be changed to have
>>> one more parameter 'conflict_resolution=on/off'; the default will be OFF.
>>>
>>> To configure global resolvers, a new DDL command will be introduced:
>>>
>>> CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>
>>>
>>
>> I very much doubt we want a single global conflict resolver, or even one
>> resolver per subscription. It seems like a very table-specific thing.
>
> We thought about this as well. We feel that even if we go for table-based
> or subscription-based resolver configuration, there may be use cases
> where the user is not interested in configuring resolvers for each table
> and thus may want to specify global ones. Thus, we should provide a way
> for users to do global configuration, and that is why we started with the
> global one. I have noted your point here and would also like to know the
> opinion of others. We are open to discussion. We can either opt for one
> of these two options (global or table) or we can opt for both a global
> and a table/subscription-based one.
>
I have no problem with a default / global conflict handler, as long as
there's a way to override this per table. This is especially important
for cases with custom conflict handlers at the table / column level.
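Something like this, syntax-wise (purely hypothetical, just to illustrate
the kind of override I have in mind):

  -- global default, using the syntax proposed above
  CONFLICT RESOLVER ON update_differ IS latest_timestamp_wins;

  -- hypothetical per-table override
  ALTER TABLE orders CONFLICT RESOLVER ON update_differ IS error;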
>>
>> Also, doesn't this whole design ignore the concurrency between
>> publishers? Isn't this problematic considering the commit timestamps may
>> go backwards (for a given publisher), which means the conflict
>> resolution is not deterministic (as it depends on how exactly it
>> interleaves)?
>>
>>
>>> -------------------------
>>>
>>> Apart from the above three main operations and resolver configuration,
>>> there are more conflict types like primary-key updates, multiple
>>> unique constraints etc and some special scenarios to be considered.
>>> Complete design details can be found in [1].
>>>
>>> [1]: https://wiki.postgresql.org/wiki/Conflict_Detection_and_Resolution
>>>
>>
>> Hmmm, not sure it's good to have a "complete" design on wiki, and only
>> some subset posted to the mailing list. I haven't compared what the
>> differences are, though.
>
> It would have been difficult to mention all the details in email
> (including examples and corner scenarios), and thus we thought it would
> be better to document everything on the wiki page for the time being.
> We can keep discussing the design and all the scenarios on a need
> basis (before the implementation phase of that part), and thus eventually
> everything will come to hackers in email. With our first patch, we
> plan to provide everything in a README as well.
>
The challenge with having this on wiki is that it's unlikely people will
notice any changes made to the wiki.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company