From: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
---|---|
To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Subject: | Re: Conflict detection for multiple_unique_conflicts in logical replication |
Date: | 2025-03-11 08:58:42 |
Message-ID: | CABdArM42thrKrgvERxyw9hhc56tM4P+CmjKFpU07hR_9UMM8Zg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Mar 11, 2025 at 11:10 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Feb 20, 2025 at 5:01 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> >
> > Hi Hackers,
> > (CCing people involved in related discussions)
> >
> > I am starting this thread to propose a new conflict detection type
> > "multiple_unique_conflicts" that identifies when an incoming row
> > during logical replication violates more than one UNIQUE constraint.
> > If multiple constraints (such as the primary key/replica identity
> > along with additional unique indexes) are violated for the same
> > incoming tuple, the apply_worker will trigger a dedicated
> > multiple_unique_conflicts conflict.
> >
> > ----
> > ISSUE:
> > ----
> > Currently, when the apply_worker encounters an 'insert_exists'
> > conflict (where the incoming row conflicts with an existing row based
> > on the replica identity), it logs the conflict and errors out
> > immediately. If the user tries to resolve this manually by deleting
> > the conflicting row, the apply worker may fail again due to another
> > unique constraint violation on a different column. This forces users
> > to fix conflicts one by one, making resolution tedious and
> > inefficient.
> >
> > Example:
> > Schema:
> > CREATE TABLE tab1 (col1 integer PRIMARY KEY, col2 integer UNIQUE, col3
> > integer UNIQUE);
> > - col1 is Replica Identity.
> >
> > Data:
> > - on pub: (1, 11, 111)
> > - on sub: 3 additional local Inserts: (2, 22, 222); (3, 33, 333); (4, 44, 444)
> > - Concurrently on pub, new insert: (2, 33, 444)
> >
> > When the new incoming tuple (2, 33, 444) is applied on the subscriber:
> > - The apply worker first detects an 'insert_exists' conflict on col1
> > (primary key) and errors out.
> > - If the user deletes the conflicting row (key col1=2) : (2, 22,
> > 222), the apply worker fails again because col2=33 violates another
> > unique constraint for tuple (3, 33, 333);
> > - If the user deletes col2=33, the apply worker fails yet again due
> > to tuple (4, 44, 444) because col3=444 is also unique.
> > Conflicts on both col2 and col3 (which are independent of each other)
> > are an example of a 'Multiple Unique Constraints' violation.
> >
> > In such cases, users are forced to resolve conflicts one by one,
> > making the process slow and error-prone.
> >
> > ---
> > SOLUTION:
> > ---
> > During an INSERT or UPDATE conflict check, instead of stopping at the
> > first encountered conflict, the apply_worker will now check all unique
> > indexes before reporting a conflict. If multiple unique key violations
> > are found, it will report a 'multiple_unique_conflicts' conflict,
> > listing all conflicting tuples in the logs. If only a single key
> > conflict is detected, the existing 'insert_exists' conflict will be
> > raised as it is now.
>
> I think it makes sense to report all the unique key conflicts for a
> single row at once, rather than stopping after the first one. However,
> I don't understand the need to create a new conflict type. Can't we
> report multiple conflicts for the row using the existing conflict
> type, if different columns in the incoming row conflict with different
> existing rows?
>
The goal of introducing a new conflict type is to handle multiple-key
conflicts separately. It will not only allow users to apply different
resolution methods for single-key vs multi-key conflicts but will also
help them identify such cases directly from stats instead of filtering
through the LOG file.
For example, with future resolution options, 'last_update_wins' will
work well for single-key conflicts (insert_exists/update_exists) since
only one local tuple will be replaced if the remote wins. However, for
multi-key conflicts, this resolution may not be feasible. Even if
supported, resolving the conflict could result in deleting multiple
tuples, which users may want to control separately, like opting to
skip or error out in case of multi-key conflicts.
For reference, databases like EDB-PGD(BDR)[1] support a separate
conflict type for multi-key cases. OTOH, Oracle[2] and DB2[3] do not
have a separate conflict_type and always ERROR out during conflict
resolution if multiple unique constraint violations happen. However,
none of these databases use the single-key conflict_type and
resolution for the multi-key conflict cases.
We’d like to know your preference—should we always ERROR out like
Oracle/DB2, or keep it as a separate conflict_type for better control
during resolution?
[1] https://www.enterprisedb.com/docs/pgd/4/bdr/conflicts/#insert-operations-that-violate-multiple-unique-constraints
[2] https://docs.oracle.com/goldengate/c1230/gg-winux/GWUAD/configuring-conflict-detection-and-resolution.htm#GWUAD316
[3] https://www.ibm.com/docs/en/idr/11.4.0?topic=console-setting-conflict-detection-resolution
--
Thanks,
Nisha Moond
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2025-03-11 09:06:37 | Re: SQL Property Graph Queries (SQL/PGQ) |
Previous Message | Amit Kapila | 2025-03-11 08:52:22 | Re: Selectively invalidate caches in pgoutput module |