From: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
---|---|
To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Subject: | Re: Conflict detection for multiple_unique_conflicts in logical replication |
Date: | 2025-03-12 09:45:48 |
Message-ID: | CABdArM6kYUssMuPubY=fgMHbgD5qSk53Dt5G87CP9bfvESt_ow@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Mar 12, 2025 at 6:33 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Nisha,
>
> Thanks for addressing some of my v1 comments. I confirmed they are all
> ok. But, I haven't reviewed the v2 again because I still had doubts
> about the "stats" question and am waiting to see how that pans out.
> Meanwhile, I had a couple more replies below.
>
> On Tue, Feb 25, 2025 at 8:37 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> >
> > On Mon, Feb 24, 2025 at 7:39 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > Hi Nisha.
> > >
> > > Some review comments for patch v1-0001
> > >
> > > ======
> > > GENERAL
> > >
> > > 1.
> > > This may be a basic/naive question, but it is unclear to me why we
> > > care about the stats of confl_multiple_unique_conflicts?
> > >
> > > I can understand it would be useful to get multiple conflicts logged
> > > at the same time so the user doesn't have to stumble across them one
> > > by one when fixing them, but as far as the stats go, why do we care
> > > about this stat? Also, since you don't distinguish between multiple
> > > insert conflicts versus multiple update conflicts the stat usefulness
> > > seemed even more dubious.
> > >
> > > (because of this question, I skipped reviewing some parts of this
> > > patch related to stats)
> > >
> >
> > IMO, tracking multiple_unique_conflicts, like other conflict stats,
> > helps users understand their workload better, especially since in
> > these cases, stats aren't gathered under either insert_exists or
> > update_exists.
>
> When you say "stats aren't gathered" isn't that just your
> implementation choice? e.g. If an INSERT fails because it
> simultaneously violates multiple (say 5) different constraints I
> assumed that could be implemented as conf_insert_exists += 1 or
> conf_insert_exists += 5. Isn't it just a matter of documenting the
> behaviour?
>
Yes, if we handle multiple-key conflicts under existing single-key
conflicts, we can align the implementation and documentation
accordingly.
However, as I explained in [1], we are providing a separate conflict
type for better control of auto-resolution as well. IMO, If we provide
separate handling for multi-key cases, their stats shouldn't be
included under insert_exists or update_exists. Since these cases won't
follow single-key conflict resolution methods, merging the stats could
be confusing for users.
> > I'll wait for others' opinions too on the need for the stats in this case.
> >
>
> OK. I'll revisit this thread after I learn the outcome.
>
> > > ~~~
> > >
> > > 12.
> > > +ok( $logfile =~
> > > + qr/conflict detected on relation \"public.conf_tab\":
> > > conflict=multiple_unique_conflicts*\n.*DETAIL:.* The remote tuple
> > > violates multiple unique constraints on the local table./,
> > > + 'multiple_unique_conflicts detected during update');
> > >
> > > (same as #12)
> > >
> > > Won't it be more useful to also log the column name causing the
> > > conflict? Otherwise, if there are 100s of unique columns and just 2 of
> > > them are conflicts how will the user know what to look for when fixing
> > > it?
> > >
> >
> > The conflicting column details are logged. In the test case, only the
> > header line of the DETAIL message is compared to keep it simple.
> > For example, the full LOG message will look like -
> >
> > ERROR: conflict detected on relation "public.conf_tab":
> > conflict=multiple_unique_conflicts
> > DETAIL: The remote tuple violates multiple unique constraints on the
> > local table.
> > Key already exists in unique index "conf_tab_pkey", modified
> > locally in transaction 757 at 2025-02-25 14:00:56.955403+05:30.
> > Key (a)=(2); existing local tuple (2, 2, 2); remote tuple (2, 3, 4).
> > Key already exists in unique index "conf_tab_b_key", modified
> > locally in transaction 758 at 2025-02-25 14:00:56.957092+05:30.
> > Key (b)=(3); existing local tuple (3, 3, 3); remote tuple (2, 3, 4).
> > Key already exists in unique index "conf_tab_c_key", modified
> > locally in transaction 759 at 2025-02-25 14:00:56.957337+05:30.
> > Key (c)=(4); existing local tuple (4, 4, 4); remote tuple (2, 3, 4).
>
> OK, but I think the "keep it simple" approach hides all the
> interesting conflict details, which means you can't even tell if the
> log emits all the expected conflicts or not. Changing the test regex
> to match all those local/remove tuple details in the log would be more
> useful.
>
Fixed. Added regex to match the conflicting index, key, local tuple
and remote tuple.
> BTW, when I looked at the
> 'tmp_check/log/035_multiple_unique_conflicts_subscriber.log' I didn't
> understand why there are there 3 separate
> "conflict=multiple_unique_conflicts" ERRORs in the log. Since, AFAICT
> in the TAP file, there are only 2 failure test cases (e.g. due to
> INSERT, and due to UPDATE). Can you explain the difference?
>
Thanks for pointing it out. There was a log offset issue—the test
waited for the LOG entry after the apply worker errored out and
retried the conflict. I've fixed it now.
Attached the v3 patch with the above comments addressed. I've also
optimized the test case and removed unnecessary code.
--
Thanks,
Nisha Moond
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Implement-the-conflict-detection-for-multiple_uni.patch | application/octet-stream | 22.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2025-03-12 09:50:13 | Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints |
Previous Message | Álvaro Herrera | 2025-03-12 09:41:43 | Re: pgsql: reindexdb: Add the index-level REINDEX with multiple jobs |