Re: Conflict detection and logging in logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: Re: Conflict detection and logging in logical replication
Date: 2024-08-12 11:40:47
Message-ID: CAA4eK1LrXQhoeN71w3WQsFQRFZ0EdtG8o++jxPwzAZUi1tKMEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 9, 2024 at 12:29 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Here is the V12 patch that improved the log format as discussed.
>

Review comments:
===============
1. The patch doesn't display the remote tuple for delete_differ case.
However, it shows the remote tuple correctly for update_differ. Is
there a reason for the same? See below messages:

update_differ:
--------------
LOG: conflict detected on relation "public.t1": conflict=update_differ
DETAIL: Updating the row containing (c1)=(1) that was modified
locally in transaction 806 at 2024-08-12 11:48:14.970002+05:30.
Existing local tuple (1, 3, arun ); remote tuple (1, 3,
ajay ).
...

delete_differ
--------------
LOG: conflict detected on relation "public.t1": conflict=delete_differ
DETAIL: Deleting the row containing (c1)=(1) that was modified by
locally in transaction 809 at 2024-08-12 14:15:41.966467+05:30.
Existing local tuple (1, 3, arun ).

Note this happens when the publisher table has a REPLICA IDENTITY FULL
and the subscriber table has primary_key. It would be better to keep
the messages consistent. One possibility is that we remove
key/old_tuple from the first line of the DETAIL message and display it
in the second line as Existing local tuple <local_tuple>; remote tuple
<..>; key <...>

2. Similar to above, the remote tuple is not displayed in
delete_missing but displayed in updated_missing type of conflict. If
we follow the style mentioned in the previous point then the DETAIL
message: "DETAIL: Did not find the row containing (c1)=(1) to be
updated." can also be changed to: "DETAIL: Could not find the row to
be updated." followed by other detail.

3. The detail of insert_exists is confusing.

ERROR: conflict detected on relation "public.t1": conflict=insert_exists
DETAIL: Key (c1)=(1) already exists in unique index "t1_pkey", which
was modified locally in transaction 802 at 2024-08-12
11:11:31.252148+05:30.

It sounds like the key value "(c1)=(1)" in the index is modified. How
about changing slightly as: "Key (c1)=(1) already exists in unique
index "t1_pkey", modified locally in transaction 802 at 2024-08-12
11:11:31.252148+05:30."? Feel free to propose if anything better comes
to your mind.

4.
if (localorigin == InvalidRepOriginId)
+ appendStringInfo(&err_detail, _("Deleting the row containing %s that
was modified by locally in transaction %u at %s."),
+ val_desc, localxmin, timestamptz_to_str(localts));

Typo in the above message. /modified by locally/modified locally

5.
@@ -2661,6 +2662,29 @@ apply_handle_update_internal(ApplyExecutionData *edata,
{
...
found = FindReplTupleInLocalRel(edata, localrel,
&relmapentry->remoterel,
localindexoid,
remoteslot, &localslot);
...
...
+
+ ReportApplyConflict(LOG, CT_UPDATE_DIFFER, relinfo,
+ GetRelationIdentityOrPK(localrel),

To find the tuple, we may have used an index other than Replica
Identity or PK (see IsIndexUsableForReplicaIdentityFull), but while
reporting conflict we don't consider such an index. I think the reason
is that such an index scan wouldn't have resulted in a unique tuple
and that is why we always compare the complete tuple in such cases. Is
that the reason? Can we write a comment to make it clear?

6.
void ReportApplyConflict(int elevel, ConflictType type,
+ ResultRelInfo *relinfo, Oid indexoid,
+ TransactionId localxmin,
+ RepOriginId localorigin,
+ TimestampTz localts,
+ TupleTableSlot *searchslot,
+ TupleTableSlot *localslot,
+ TupleTableSlot *remoteslot,
+ EState *estate);

The prototype looks odd with pointers and non-pointer variables in
mixed order. How about arranging parameters in the following order:
Estate, ResultRelInfo, TupleTableSlot *searchslot, TupleTableSlot
*localslot, TupleTableSlot *remoteslot, Oid indexoid, TransactionId
localxmin, RepOriginId localorigin, TimestampTz localts?

7. Like above, check the parameters of other functions like
errdetail_apply_conflict, build_index_value_desc,
build_tuple_value_details, etc.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-08-12 11:42:18 Re: minor comments issue in ResultRelInfo src/include/nodes/execnodes.h
Previous Message cca5507 2024-08-12 11:38:53 Re: Historic snapshot doesn't track txns committed in BUILDING_SNAPSHOT state