RE: Conflict detection and logging in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: RE: Conflict detection and logging in logical replication
Date: 2024-08-13 04:39:15
Message-ID: OS0PR01MB57163332C43702CCA192A96694862@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Monday, August 12, 2024 7:41 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Aug 9, 2024 at 12:29 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Here is the V12 patch that improved the log format as discussed.
> >
>
> Review comments:

Thanks for the comments.

> ===============
> 1. The patch doesn't display the remote tuple for delete_differ case.
> However, it shows the remote tuple correctly for update_differ. Is
> there a reason for the same? See below messages:
>
> update_differ:
> --------------
> LOG: conflict detected on relation "public.t1": conflict=update_differ
> DETAIL: Updating the row containing (c1)=(1) that was modified
> locally in transaction 806 at 2024-08-12 11:48:14.970002+05:30.
> Existing local tuple (1, 3, arun ); remote tuple (1, 3,
> ajay ).
> ...
>
> delete_differ
> --------------
> LOG: conflict detected on relation "public.t1": conflict=delete_differ
> DETAIL: Deleting the row containing (c1)=(1) that was modified by
> locally in transaction 809 at 2024-08-12 14:15:41.966467+05:30.
> Existing local tuple (1, 3, arun ).
>
> Note this happens when the publisher table has a REPLICA IDENTITY FULL
> and the subscriber table has primary_key. It would be better to keep
> the messages consistent. One possibility is that we remove
> key/old_tuple from the first line of the DETAIL message and display it
> in the second line as Existing local tuple <local_tuple>; remote tuple
> <..>; key <...>

Agreed. I thought that in delete_differ/missing cases, the remote tuple is covered
In the key values in the first sentence. To be consistent, I have moved the column-values
from the first sentence to the second sentence including the insert_exists conflict.

The new format looks like:

LOG: xxx
DETAIL: Key %s; existing local tuple %s; remote new tuple %s; replica identity %s

The Key will include the conflicting key for xxx_exists conflicts. And the replica identity part
will include the replica identity keys or the full tuple value in replica identity FULL case.

>
> 2. Similar to above, the remote tuple is not displayed in
> delete_missing but displayed in updated_missing type of conflict. If
> we follow the style mentioned in the previous point then the DETAIL
> message: "DETAIL: Did not find the row containing (c1)=(1) to be
> updated." can also be changed to: "DETAIL: Could not find the row to
> be updated." followed by other detail.

Same as above.

> 3. The detail of insert_exists is confusing.
>
> ERROR: conflict detected on relation "public.t1": conflict=insert_exists
> DETAIL: Key (c1)=(1) already exists in unique index "t1_pkey", which
> was modified locally in transaction 802 at 2024-08-12
> 11:11:31.252148+05:30.
>
> It sounds like the key value "(c1)=(1)" in the index is modified. How
> about changing slightly as: "Key (c1)=(1) already exists in unique
> index "t1_pkey", modified locally in transaction 802 at 2024-08-12
> 11:11:31.252148+05:30."? Feel free to propose if anything better comes
> to your mind.

The suggested message looks good to me.

>
> 4.
> if (localorigin == InvalidRepOriginId)
> + appendStringInfo(&err_detail, _("Deleting the row containing %s that
> was modified by locally in transaction %u at %s."),
> + val_desc, localxmin, timestamptz_to_str(localts));
>
> Typo in the above message. /modified by locally/modified locally

Fixed.

>
> 5.
> @@ -2661,6 +2662,29 @@ apply_handle_update_internal(ApplyExecutionData
> *edata,
> {
> ...
> found = FindReplTupleInLocalRel(edata, localrel,
> &relmapentry->remoterel,
> localindexoid,
> remoteslot, &localslot);
> ...
> ...
> +
> + ReportApplyConflict(LOG, CT_UPDATE_DIFFER, relinfo,
> + GetRelationIdentityOrPK(localrel),
>
> To find the tuple, we may have used an index other than Replica
> Identity or PK (see IsIndexUsableForReplicaIdentityFull), but while
> reporting conflict we don't consider such an index. I think the reason
> is that such an index scan wouldn't have resulted in a unique tuple
> and that is why we always compare the complete tuple in such cases. Is
> that the reason? Can we write a comment to make it clear?

Added comments atop of ReportApplyConflict for the 'indexoid' parameter.

>
> 6.
> void ReportApplyConflict(int elevel, ConflictType type,
> + ResultRelInfo *relinfo, Oid indexoid,
> + TransactionId localxmin,
> + RepOriginId localorigin,
> + TimestampTz localts,
> + TupleTableSlot *searchslot,
> + TupleTableSlot *localslot,
> + TupleTableSlot *remoteslot,
> + EState *estate);
>
> The prototype looks odd with pointers and non-pointer variables in
> mixed order. How about arranging parameters in the following order:
> Estate, ResultRelInfo, TupleTableSlot *searchslot, TupleTableSlot
> *localslot, TupleTableSlot *remoteslot, Oid indexoid, TransactionId
> localxmin, RepOriginId localorigin, TimestampTz localts?
>
> 7. Like above, check the parameters of other functions like
> errdetail_apply_conflict, build_index_value_desc,
> build_tuple_value_details, etc.

Changed as suggested.

Here is V13 patch set which addressed above comments.

Best Regards,
Hou zj

Attachment Content-Type Size
v13-0001-Detect-and-log-conflicts-in-logical-replication.patch application/octet-stream 62.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ayush Tiwari 2024-08-13 05:37:39 Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute.
Previous Message Peter Eisentraut 2024-08-13 04:29:22 Re: Rename C23 keyword