Re: Conflict detection and logging in logical replication

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Conflict detection and logging in logical replication
Date: 2024-07-10 09:39:17
Message-ID: CAJpy0uBtGwj+_8QTQZDi8ZW=eYTbRbK3Pt2MCMX2hzCxf==4Vw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 3, 2024 at 8:31 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Wednesday, June 26, 2024 10:58 AM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
>
> Hi,
>
> As suggested by Sawada-san in another thread[1].
>
> I am attaching the V4 patch set which tracks the delete_differ
> conflict in logical replication.
>
> delete_differ means that the replicated DELETE is deleting a row
> that was modified by a different origin.
>

Thanks for the patch. I am still in process of review but please find
few comments:

1) When I try to *insert* primary/unique key on pub, which already
exists on sub, conflict gets detected. But when I try to *update*
primary/unique key to a value on pub which already exists on sub,
conflict is not detected. I get the error:

2024-07-10 14:21:09.976 IST [647678] ERROR: duplicate key value
violates unique constraint "t1_pkey"
2024-07-10 14:21:09.976 IST [647678] DETAIL: Key (pk)=(4) already exists.

This is because such conflict detection needs detection of constraint
violation using the *new value* rather than *existing* value during
UPDATE. INSERT conflict detection takes care of this case i.e. the
columns of incoming row are considered as new values and it tries to
see if all unique indexes are okay to digest such new values (all
incoming columns) but update's logic is different. It searches based
on oldTuple *only* and thus above detection is missing.

Shall we support such detection? If not, is it worth docuementing? It
basically falls in 'pkey_exists' conflict category but to user it
might seem like any ordinary update leading to 'unique key constraint
violation'.

2)
Another case which might confuse user:

CREATE TABLE t1 (pk integer primary key, val1 integer, val2 integer);

On PUB: insert into t1 values(1,10,10); insert into t1 values(2,20,20);

On SUB: update t1 set pk=3 where pk=2;

Data on PUB: {1,10,10}, {2,20,20}
Data on SUB: {1,10,10}, {3,20,20}

Now on PUB: update t1 set val1=200 where val1=20;

On Sub, I get this:
2024-07-10 14:44:00.160 IST [648287] LOG: conflict update_missing
detected on relation "public.t1"
2024-07-10 14:44:00.160 IST [648287] DETAIL: Did not find the row to
be updated.
2024-07-10 14:44:00.160 IST [648287] CONTEXT: processing remote data
for replication origin "pg_16389" during message type "UPDATE" for
replication target relation "public.t1" in transaction 760, finished
at 0/156D658

To user, it could be quite confusing, as val1=20 exists on sub but
still he gets update_missing conflict and the 'DETAIL' is not
sufficient to give the clarity. I think on HEAD as well (have not
tested), we will get same behavior i.e. update will be ignored as we
make search based on RI (pk in this case). So we are not worsening the
situation, but now since we are detecting conflict, is it possible to
give better details in 'DETAIL' section indicating what is actually
missing?

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-07-10 09:49:49 Re: Lock-free compaction. Why not?
Previous Message Tatsuo Ishii 2024-07-10 09:36:10 Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN