RE: Conflict detection and logging in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: RE: Conflict detection and logging in logical replication
Date: 2024-07-11 02:17:17
Message-ID: OS0PR01MB57168F2C0DA21A8DAAA06AED94A52@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, July 10, 2024 5:39 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Wed, Jul 3, 2024 at 8:31 AM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com>
> wrote:
> >
> > On Wednesday, June 26, 2024 10:58 AM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> >
> > Hi,
> >
> > As suggested by Sawada-san in another thread[1].
> >
> > I am attaching the V4 patch set which tracks the delete_differ
> > conflict in logical replication.
> >
> > delete_differ means that the replicated DELETE is deleting a row that
> > was modified by a different origin.
> >
>
> Thanks for the patch. I am still in process of review but please find few
> comments:

Thanks for the comments!

> 1) When I try to *insert* primary/unique key on pub, which already exists on
> sub, conflict gets detected. But when I try to *update* primary/unique key to a
> value on pub which already exists on sub, conflict is not detected. I get the
> error:
>
> 2024-07-10 14:21:09.976 IST [647678] ERROR: duplicate key value violates
> unique constraint "t1_pkey"
> 2024-07-10 14:21:09.976 IST [647678] DETAIL: Key (pk)=(4) already exists.

Yes, I think the detection of this conflict is not added with the
intention to control the size of the patch in the first version.

>
> This is because such conflict detection needs detection of constraint violation
> using the *new value* rather than *existing* value during UPDATE. INSERT
> conflict detection takes care of this case i.e. the columns of incoming row are
> considered as new values and it tries to see if all unique indexes are okay to
> digest such new values (all incoming columns) but update's logic is different.
> It searches based on oldTuple *only* and thus above detection is missing.

I think the logic is the same if we want to detect the unique violation
for UDPATE, we need to check if the new value of the UPDATE violates any
unique constraints same as the detection of insert_exists (e.g. check
the conflict around ExecInsertIndexTuples())

>
> Shall we support such detection? If not, is it worth docuementing?

I am personally OK to support this detection. And
I think it's already documented that we only detect unique violation for
insert which mean update conflict is not detected.

> 2)
> Another case which might confuse user:
>
> CREATE TABLE t1 (pk integer primary key, val1 integer, val2 integer);
>
> On PUB: insert into t1 values(1,10,10); insert into t1 values(2,20,20);
>
> On SUB: update t1 set pk=3 where pk=2;
>
> Data on PUB: {1,10,10}, {2,20,20}
> Data on SUB: {1,10,10}, {3,20,20}
>
> Now on PUB: update t1 set val1=200 where val1=20;
>
> On Sub, I get this:
> 2024-07-10 14:44:00.160 IST [648287] LOG: conflict update_missing detected
> on relation "public.t1"
> 2024-07-10 14:44:00.160 IST [648287] DETAIL: Did not find the row to be
> updated.
> 2024-07-10 14:44:00.160 IST [648287] CONTEXT: processing remote data for
> replication origin "pg_16389" during message type "UPDATE" for replication
> target relation "public.t1" in transaction 760, finished at 0/156D658
>
> To user, it could be quite confusing, as val1=20 exists on sub but still he gets
> update_missing conflict and the 'DETAIL' is not sufficient to give the clarity. I
> think on HEAD as well (have not tested), we will get same behavior i.e. update
> will be ignored as we make search based on RI (pk in this case). So we are not
> worsening the situation, but now since we are detecting conflict, is it possible
> to give better details in 'DETAIL' section indicating what is actually missing?

I think It's doable to report the row value that cannot be found in the local
relation, but the concern is the potential risk of exposing some
sensitive data in the log. This may be OK, as we are already reporting the
key value for constraints violation, so if others also agree, we can add
the row value in the DETAIL as well.

Best Regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Euler Taveira 2024-07-11 02:52:33 Re: improve performance of pg_dump with many sequences
Previous Message David Rowley 2024-07-11 01:16:30 Re: Parent/child context relation in pg_get_backend_memory_contexts()