Re: Conflict Detection and Resolution

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: Conflict Detection and Resolution
Date: 2024-06-13 11:28:26
Message-ID: CAA4eK1+d5M0pnwsWa1FWakrwi6dMoeNfQx_4eGn+C7C5wh6UEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 13, 2024 at 11:41 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Jun 5, 2024 at 3:32 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Hi,
> >
> > This time at PGconf.dev[1], we had some discussions regarding this
> > project. The proposed approach is to split the work into two main
> > components. The first part focuses on conflict detection, which aims to
> > identify and report conflicts in logical replication. This feature will
> > enable users to monitor the unexpected conflicts that may occur. The
> > second part involves the actual conflict resolution. Here, we will provide
> > built-in resolutions for each conflict and allow user to choose which
> > resolution will be used for which conflict(as described in the initial
> > email of this thread).
>
> I agree with this direction that we focus on conflict detection (and
> logging) first and then develop conflict resolution on top of that.
>
> >
> > Of course, we are open to alternative ideas and suggestions, and the
> > strategy above can be changed based on ongoing discussions and feedback
> > received.
> >
> > Here is the patch of the first part work, which adds a new parameter
> > detect_conflict for CREATE and ALTER subscription commands. This new
> > parameter will decide if subscription will go for conflict detection. By
> > default, conflict detection will be off for a subscription.
> >
> > When conflict detection is enabled, additional logging is triggered in the
> > following conflict scenarios:
> >
> > * updating a row that was previously modified by another origin.
> > * The tuple to be updated is not found.
> > * The tuple to be deleted is not found.
> >
> > While there exist other conflict types in logical replication, such as an
> > incoming insert conflicting with an existing row due to a primary key or
> > unique index, these cases already result in constraint violation errors.
>
> What does detect_conflict being true actually mean to users? I
> understand that detect_conflict being true could introduce some
> overhead to detect conflicts. But in terms of conflict detection, even
> if detect_confict is false, we detect some conflicts such as
> concurrent inserts with the same key. Once we introduce the complete
> conflict detection feature, I'm not sure there is a case where a user
> wants to detect only some particular types of conflict.
>

You are right that users would wish to detect the conflicts and
probably the extra effort would only be in the 'update_differ' case
where we need to consult committs module and that we will only do when
'track_commit_timestamp' is true. BTW, I think for Inserts with
primary/unique key violation, we should catch the ERROR and log it. If
we want to log the conflicts in a separate table then do we want to do
that in the catch block after getting pk violation or do an extra scan
before 'INSERT' to find the conflict? I think logging would need extra
cost especially if we want to LOG it in some table as you are
suggesting below that may need some option.

> > Therefore, additional conflict detection for these cases is currently
> > omitted to minimize potential overhead. However, the pre-detection for
> > conflict in these error cases is still essential to support automatic
> > conflict resolution in the future.
>
> I feel that we should log all types of conflict in an uniform way. For
> example, with detect_conflict being true, the update_differ conflict
> is reported as "conflict %s detected on relation "%s"", whereas
> concurrent inserts with the same key is reported as "duplicate key
> value violates unique constraint "%s"", which could confuse users.
> Ideally, I think that we log such conflict detection details (table
> name, column name, conflict type, etc) to somewhere (e.g. a table or
> server logs) so that the users can resolve them manually.
>

It is good to think if there is a value in providing in
pg_conflicts_history kind of table which will have details of
conflicts that occurred and then we can extend it to have resolutions.
I feel we can anyway LOG the conflicts by default. Updating a separate
table with conflicts should be done by default or with a knob is a
point to consider.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nazir Bilal Yavuz 2024-06-13 11:38:46 Re: CI and test improvements
Previous Message Daniel Gustafsson 2024-06-13 11:27:42 Re: RFC: adding pytest as a supported test framework