Re: Conflict detection and logging in logical replication

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <jan(at)wi3ck(dot)info>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: Re: Conflict detection and logging in logical replication
Date: 2024-07-29 10:59:11
Message-ID: CAFiTN-ufwX5EqL42v-esjXE_Put709LdDPphj2mRa6NG=p7btg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 29, 2024 at 11:44 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>

I was going through v7-0001, and I have some initial comments.

@@ -536,11 +542,9 @@ ExecCheckIndexConstraints(ResultRelInfo
*resultRelInfo, TupleTableSlot *slot,
ExprContext *econtext;
Datum values[INDEX_MAX_KEYS];
bool isnull[INDEX_MAX_KEYS];
- ItemPointerData invalidItemPtr;
bool checkedIndex = false;

ItemPointerSetInvalid(conflictTid);
- ItemPointerSetInvalid(&invalidItemPtr);

/*
* Get information from the result relation info structure.
@@ -629,7 +633,7 @@ ExecCheckIndexConstraints(ResultRelInfo
*resultRelInfo, TupleTableSlot *slot,

satisfiesConstraint =
check_exclusion_or_unique_constraint(heapRelation, indexRelation,
- indexInfo, &invalidItemPtr,
+ indexInfo, &slot->tts_tid,
values, isnull, estate, false,
CEOUC_WAIT, true,
conflictTid);

What is the purpose of this change? I mean
'check_exclusion_or_unique_constraint' says that 'tupleid'
should be invalidItemPtr if the tuple is not yet inserted and
ExecCheckIndexConstraints is called by ExecInsert
before inserting the tuple. So what is this change? Would this change
ExecInsert's behavior as well?

----
----

+ReCheckConflictIndexes(ResultRelInfo *resultRelInfo, EState *estate,
+ ConflictType type, List *recheckIndexes,
+ TupleTableSlot *slot)
+{
+ /* Re-check all the unique indexes for potential conflicts */
+ foreach_oid(uniqueidx, resultRelInfo->ri_onConflictArbiterIndexes)
+ {
+ TupleTableSlot *conflictslot;
+
+ if (list_member_oid(recheckIndexes, uniqueidx) &&
+ FindConflictTuple(resultRelInfo, estate, uniqueidx, slot, &conflictslot))
+ {
+ RepOriginId origin;
+ TimestampTz committs;
+ TransactionId xmin;
+
+ GetTupleCommitTs(conflictslot, &xmin, &origin, &committs);
+ ReportApplyConflict(ERROR, type, resultRelInfo->ri_RelationDesc, uniqueidx,
+ xmin, origin, committs, conflictslot);
+ }
+ }
+}
and

+ conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes;
+
if (resultRelInfo->ri_NumIndices > 0)
recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
- slot, estate, false, false,
- NULL, NIL, false);
+ slot, estate, false,
+ conflictindexes ? true : false,
+ &conflict,
+ conflictindexes, false);
+
+ /*
+ * Rechecks the conflict indexes to fetch the conflicting local tuple
+ * and reports the conflict. We perform this check here, instead of
+ * perform an additional index scan before the actual insertion and
+ * reporting the conflict if any conflicting tuples are found. This is
+ * to avoid the overhead of executing the extra scan for each INSERT
+ * operation, even when no conflict arises, which could introduce
+ * significant overhead to replication, particularly in cases where
+ * conflicts are rare.
+ */
+ if (conflict)
+ ReCheckConflictIndexes(resultRelInfo, estate, CT_INSERT_EXISTS,
+ recheckIndexes, slot);

This logic is confusing, first, you are calling
ExecInsertIndexTuples() with no duplicate error for the indexes
present in 'ri_onConflictArbiterIndexes' which means
the indexes returned by the function must be a subset of
'ri_onConflictArbiterIndexes' and later in ReCheckConflictIndexes()
you are again processing all the
indexes of 'ri_onConflictArbiterIndexes' and checking if any of these
is a subset of the indexes that is returned by
ExecInsertIndexTuples().

Why are we doing that, I think we can directly use the
'recheckIndexes' which is returned by ExecInsertIndexTuples(), and
those indexes are guaranteed to be a subset of
ri_onConflictArbiterIndexes. No?

---
---

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2024-07-29 11:18:40 Remove dead code generation tools in src/backend/utils/mb/
Previous Message Aleksander Alekseev 2024-07-29 10:55:37 Re: [PATCH] Add crc32(text) & crc32(bytea)