Re: Conflict Detection and Resolution

From: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
Subject: Re: Conflict Detection and Resolution
Date: 2024-06-13 17:48:41
Message-ID: 1eb9242f-dcb6-45c3-871c-98ec324e03ef@postgresql.org
Lists: pgsql-hackers

On 6/13/24 7:28 AM, Amit Kapila wrote:

> You are right that users would wish to detect the conflicts and
> probably the extra effort would only be in the 'update_differ' case
> where we need to consult committs module and that we will only do when
> 'track_commit_timestamp' is true. BTW, I think for Inserts with
> primary/unique key violation, we should catch the ERROR and log it. If
> we want to log the conflicts in a separate table then do we want to do
> that in the catch block after getting pk violation or do an extra scan
> before 'INSERT' to find the conflict? I think logging would need extra
> cost especially if we want to LOG it in some table as you are
> suggesting below that may need some option.
>
>>> Therefore, additional conflict detection for these cases is currently
>>> omitted to minimize potential overhead. However, the pre-detection for
>>> conflict in these error cases is still essential to support automatic
>>> conflict resolution in the future.
>>
>> I feel that we should log all types of conflict in an uniform way. For
>> example, with detect_conflict being true, the update_differ conflict
>> is reported as "conflict %s detected on relation "%s"", whereas
>> concurrent inserts with the same key is reported as "duplicate key
>> value violates unique constraint "%s"", which could confuse users.
>> Ideally, I think that we log such conflict detection details (table
>> name, column name, conflict type, etc) to somewhere (e.g. a table or
>> server logs) so that the users can resolve them manually.
>>
>
> It is good to think if there is a value in providing in
> pg_conflicts_history kind of table which will have details of
> conflicts that occurred and then we can extend it to have resolutions.
> I feel we can anyway LOG the conflicts by default. Updating a separate
> table with conflicts should be done by default or with a knob is a
> point to consider.

+1 for logging conflicts uniformly, but I would +100 to exposing the log
in a way that's easy for the user to query (whether it's a system view
or a stat table). Arguably, I'd say that would be the most important
feature to come out of this effort.

Setting aside how conflicts are resolved, users want to know exactly
which row had a conflict, and users coming from other database systems
that have dealt with these issues will have tooling to review and
analyze conflicts when they occur. This data is typically stored in a
queryable table, with data retained for N days. When you add in
automatic conflict resolution, users then want to have a record of how
the conflict was resolved, in case they need to manually update it.

Having this data in a table also gives the user the opportunity to
understand conflict stats (e.g. conflict rates) and potentially identify
portions of the application and other parts of the system to optimize.
It also makes it easier to import to downstream systems that may perform
further analysis on conflict resolution, or alarm if a conflict rate
exceeds a certain threshold.
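
As a rough illustration of the kind of analysis a queryable conflict
log would allow, here is a minimal sketch in Python. It uses an
in-memory SQLite database purely as a stand-in. The table name
conflict_history, its columns, the 'insert_exists' conflict type, and
the resolution labels are all hypothetical and do not correspond to
anything that exists in PostgreSQL; only 'update_differ' comes from the
discussion above.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema: neither this table nor its columns exist in
# PostgreSQL. SQLite is used only so the sketch is self-contained.
SCHEMA = """
CREATE TABLE conflict_history (
    relname       TEXT NOT NULL,  -- table on which the conflict occurred
    conflict_type TEXT NOT NULL,  -- e.g. 'update_differ'
    resolution    TEXT,           -- NULL if no automatic resolution ran
    detected_at   TEXT NOT NULL   -- ISO 8601 timestamp
)
"""


def conflicts_per_table(conn, since):
    """Count conflicts per table detected at or after `since`."""
    rows = conn.execute(
        "SELECT relname, COUNT(*) FROM conflict_history "
        "WHERE detected_at >= ? GROUP BY relname ORDER BY relname",
        (since.isoformat(),),
    ).fetchall()
    return dict(rows)


def tables_over_threshold(conn, since, threshold):
    """Return tables whose conflict count since `since` exceeds `threshold`."""
    counts = conflicts_per_table(conn, since)
    return sorted(name for name, n in counts.items() if n > threshold)


def purge_older_than(conn, cutoff):
    """Apply an N-day retention policy; returns the number of rows removed."""
    cur = conn.execute(
        "DELETE FROM conflict_history WHERE detected_at < ?",
        (cutoff.isoformat(),),
    )
    return cur.rowcount


def demo():
    """Build a small sample log; all rows are made-up illustration data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    now = datetime(2024, 6, 13, 12, 0, 0)
    sample = [
        ("orders", "update_differ", "apply_remote", now - timedelta(minutes=5)),
        ("orders", "update_differ", "apply_remote", now - timedelta(minutes=10)),
        ("orders", "insert_exists", None, now - timedelta(minutes=20)),
        ("users", "update_differ", "keep_local", now - timedelta(minutes=30)),
        ("users", "insert_exists", None, now - timedelta(days=40)),
    ]
    conn.executemany(
        "INSERT INTO conflict_history VALUES (?, ?, ?, ?)",
        [(rel, ctype, res, ts.isoformat()) for rel, ctype, res, ts in sample],
    )
    return conn, now
```

With something along these lines, a monitoring job could call
tables_over_threshold() on a schedule to alarm on a conflict rate, and
purge_older_than() to enforce the retention window.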

Thanks,

Jonathan
