From: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
Subject: | Conflict detection for multiple_unique_conflicts in logical replication |
Date: | 2025-02-20 11:30:48 |
Message-ID: | CABdArM7FW-_dnthGkg2s0fy1HhUB8C3ELA0gZX1kkbs1ZZoV3Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Hackers,
(CCing people involved in related discussions)
I am starting this thread to propose a new conflict detection type
"multiple_unique_conflicts" that identifies when an incoming row
during logical replication violates more than one UNIQUE constraint.
If multiple constraints (such as the primary key/replica identity
along with additional unique indexes) are violated for the same
incoming tuple, the apply_worker will trigger a dedicated
multiple_unique_conflicts conflict.
----
ISSUE:
----
Currently, when the apply_worker encounters an 'insert_exists'
conflict (where the incoming row conflicts with an existing row based
on the replica identity), it logs the conflict and errors out
immediately. If the user tries to resolve this manually by deleting
the conflicting row, the apply worker may fail again due to another
unique constraint violation on a different column. This forces users
to fix conflicts one by one, making resolution tedious and
inefficient.
Example:
Schema:
CREATE TABLE tab1 (col1 integer PRIMARY KEY, col2 integer UNIQUE, col3
integer UNIQUE);
- col1 is Replica Identity.
Data:
- on pub: (1, 11, 111)
- on sub: 3 additional local Inserts: (2, 22, 222); (3, 33, 333); (4, 44, 444)
- Concurrently on pub, new insert: (2, 33, 444)
When the new incoming tuple (2, 33, 444) is applied on the subscriber:
- The apply worker first detects an 'insert_exists' conflict on col1
(primary key) and errors out.
- If the user deletes the conflicting row (key col1=2) : (2, 22,
222), the apply worker fails again because col2=33 violates another
unique constraint for tuple (3, 33, 333);
- If the user deletes col2=33, the apply worker fails yet again due
to tuple (4, 44, 444) because col3=444 is also unique.
Conflicts on both col2 and col3 (which are independent of each other)
are an example of a 'Multiple Unique Constraints' violation.
In such cases, users are forced to resolve conflicts one by one,
making the process slow and error-prone.
---
SOLUTION:
---
During an INSERT or UPDATE conflict check, instead of stopping at the
first encountered conflict, the apply_worker will now check all unique
indexes before reporting a conflict. If multiple unique key violations
are found, it will report a 'multiple_unique_conflicts' conflict,
listing all conflicting tuples in the logs. If only a single key
conflict is detected, the existing 'insert_exists' conflict will be
raised as it is now.
This helps users resolve conflicts in one go, by deleting all
conflicting tuples at once, instead of handling them one at a time,
unless they choose to skip the transaction.
Without an automatic resolution strategy in place, the apply worker
will keep retrying until manual intervention resolves the conflict.
Like other existing conflict types, statistics for
multiple_unique_conflicts will be collected and can be viewed in the
pg_stat_subscription_stats view for monitoring.
~~~
Note:
Beyond its immediate benefits, this conflict type also helps the
future automatic resolution [1]. Handling multiple unique conflicts
under the 'insert_exists' or 'update_exists' conflict resolutions can
cause unexpected behavior since their resolution strategies are
designed for single-tuple conflicts. For example, 'last_update_wins'
works well for a single key conflict but fails when the remote tuple
is newer than some conflicting local rows and older than others,
making resolution unclear. Introducing 'multiple_unique_conflicts'
ensures such cases are handled separately and correctly with dedicated
resolution strategies. The possible resolution strategies for this
conflict type could be:
apply: Apply the remote change by deleting all conflicting rows and
then inserting or updating the remote row.
skip: Skip applying the change.
error: Error out on conflict (default behavior).
~~~
The proposal patch is attached. Suggestions and feedback are highly appreciated!
--
Thanks,
Nisha
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Implement-the-conflict-detection-for-multiple_uni.patch | application/octet-stream | 21.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2025-02-20 11:31:33 | Re: ReplicationSlotRelease() crashes when the instance is in the single user mode |
Previous Message | Hayato Kuroda (Fujitsu) | 2025-02-20 11:26:30 | RE: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided. |