RE: Conflict detection and logging in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: RE: Conflict detection and logging in logical replication
Date: 2024-08-18 08:56:52
Message-ID: OS0PR01MB5716D889B648844BB405D80694832@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, August 16, 2024 5:25 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Fri, Aug 16, 2024 at 12:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> > On Fri, Aug 16, 2024 at 11:48 AM shveta malik <shveta(dot)malik(at)gmail(dot)com>
> wrote:
> > >
> > > On Fri, Aug 16, 2024 at 10:46 AM shveta malik <shveta(dot)malik(at)gmail(dot)com>
> wrote:
> > > >
> > > > 3)
> > > > For update_exists(), we dump:
> > > > Key (a, b)=(2, 1)
> > > >
> > > > For delete_missing, update_missing, update_differ, we dump:
> > > > Replica identity (a, b)=(2, 1).
> > > >
> > > > For update_exists as well, shouldn't we dump 'Replica identity'?
> > > > Only for insert case, it should be referred as 'Key'.
> > > >
> > >
> > > On rethinking, is it because for update_exists case 'Key' dumped is
> > > not the one used to search the row to be updated? Instead it is the
> > > one used to search the conflicting row. Unlike update_differ, the
> > > row to be updated and the row currently conflicting will be
> > > different for update_exists case. I earlier thought that 'KEY' and
> > > 'Existing local tuple' dumped always belong to the row currently
> > > being updated/deleted/inserted. But for 'update_eixsts', that is not
> > > the case. We are dumping 'Existing local tuple' and 'Key' for the
> > > row which is conflicting and not the one being updated. Example:
> > >
> > > ERROR: conflict detected on relation "public.tab_1":
> > > conflict=update_exists Key (a, b)=(2, 1); existing local tuple (2, 1); remote
> tuple (2, 1).
> > >
> > > Operations performed were:
> > > Pub: insert into tab values (1,1);
> > > Sub: insert into tab values (2,1);
> > > Pub: update tab set a=2 where a=1;
> > >
> > > Here Key and local tuple are both 2,1 instead of 1,1. While replica
> > > identity value (used to search original row) will be 1,1 only.
> > >
> > > It may be slightly confusing or say tricky to understand when
> > > compared to other conflicts' LOGs. But not sure what better we can do
> here.
> > >
> >
> > The update_exists behaves more like insert_exists as we detect that
> > only while inserting into index. It is also not clear to me if we can
> > do better than to clarify this in docs.
> >
>
> Instead of 'existing local tuple', will it be slightly better to have 'conflicting local
> tuple'?

I am slightly not sure about adding one more variety to describe the "existing
local tuple". I think we’d better use a consistent word. But if others feel otherwise,
I can change it in next version.

>
> Few trivial comments:
>
> 1)
> errdetail_apply_conflict() header says:
>
> * 2. Display of conflicting key, existing local tuple, remote new tuple, and
> * replica identity columns, if any.
>
> We may mention that existing *conflicting* local tuple.

Like above, I think that would duplicate the "existing local tuple" word.

>
> Looking at build_tuple_value_details(), the cases where we display 'KEY 'and
> the ones where we display 'replica identity' are mutually exclusives (we have
> ASSERTs like that). Shall we add this info in
> header that either Key or 'replica identity' is displayed. Or if we
> don't want to make it mutually exclusive then update_exists is one such casw
> where we can have both Key and 'Replica Identity cols'.

I think it’s fine to display replica identity for update_exists, so added.

>
>
> 2)
> BuildIndexValueDescription() header comment says:
>
> * This is currently used
> * for building unique-constraint, exclusion-constraint and logical replication
> * tuple missing conflict error messages
>
> Is it being used only for 'tuple missing conflict' flow? I thought, it will be hit for
> other flows as well.

Removed the "tuple missing".

Attach the V16 patch which addressed the comments we agreed on.
I will add a doc patch to explain the log format after the 0001 is RFC.

Best Regards,
Hou zj

Attachment Content-Type Size
v16-0001-Detect-and-log-conflicts-in-logical-replication.patch application/octet-stream 64.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2024-08-18 12:00:00 Re: Remove dependence on integer wrapping
Previous Message Zhijie Hou (Fujitsu) 2024-08-18 08:49:37 RE: Conflict detection and logging in logical replication