Re: Conflict detection and logging in logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: Re: Conflict detection and logging in logical replication
Date: 2024-07-26 11:33:41
Message-ID: CAA4eK1JkFT9w9eQ+GwfQxJBNLwUaAzhR7L=b-=waWXfPnvR+kQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 25, 2024 at 4:12 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Jul 25, 2024 at 12:04 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Here is the V6 patch set which addressed Shveta and Nisha's comments
> > in [1][2][3][4].
> >
>
> Do we need an option detect_conflict for logging conflicts? The
> possible reason to include such an option is to avoid any overhead
> during apply due to conflict detection. IIUC, to detect some of the
> conflicts like update_differ and delete_differ, we would need to fetch
> commit_ts information which could be costly but we do that only when
> GUC track_commit_timestamp is enabled which would anyway have overhead
> on its own. Can we do performance testing to see how much additional
> overhead we have due to fetching commit_ts information during conflict
> detection?
>
> The other time we need to enquire commit_ts is to log the conflict
> detection information which is an ERROR path, so performance shouldn't
> matter in this case.
>
> In general, it would be good to enable conflict detection/logging by
> default but if it has overhead then we can consider adding this new
> option. Anyway, adding an option could be a separate patch (at least
> for review), let the first patch be the core code of conflict
> detection and logging.
>
> minor cosmetic comments:
> 1.
> +static void
> +check_conflict_detection(void)
> +{
> + if (!track_commit_timestamp)
> + ereport(WARNING,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("conflict detection could be incomplete due to disabled
> track_commit_timestamp"),
> + errdetail("Conflicts update_differ and delete_differ cannot be
> detected, and the origin and commit timestamp for the local row will
> not be logged."));
> +}
>
> The errdetail string is too long. It would be better to split it into
> multiple rows.
>
> 2.
> -
> +static void check_conflict_detection(void);
>
> Spurious line removal.
>

A few more comments:
1.
For duplicate key, the patch reports conflict as following:
ERROR: conflict insert_exists detected on relation "public.t1"
2024-07-26 11:06:34.570 IST [27800] DETAIL: Key (c1)=(1) already
exists in unique index "t1_pkey", which was modified by origin 1 in
transaction 770 at 2024-07-26 09:16:47.79805+05:30.
2024-07-26 11:06:34.570 IST [27800] CONTEXT: processing remote data
for replication origin "pg_16387" during message type "INSERT" for
replication target relation "public.t1" in transaction 742, finished
at 0/151A108

In detail, it is better to display the origin name instead of the
origin id. This will be similar to what we do in CONTEXT information.

2.
if (resultRelInfo->ri_NumIndices > 0)
recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
- slot, estate, false, false,
- NULL, NIL, false);
+ slot, estate, false,
+ conflictindexes, &conflict,

It is better to use true/false for the bool parameter (something like
conflictindexes ? true : false). That will make the code easier to
follow.

3. The need for ReCheckConflictIndexes() is not clear from comments.
Can you please add a few comments to explain this?

4.
- will simply be skipped.
+ will simply be skipped. Please refer to <link
linkend="sql-createsubscription-params-with-detect-conflict"><literal>detect_conflict</literal></link>
+ for all the conflicts that will be logged when enabling
<literal>detect_conflict</literal>.
</para>

It would be easier to read the patch if you move <link .. to the next line.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-07-26 11:41:45 Re: Allow logical failover slots to wait on synchronous replication
Previous Message Noah Misch 2024-07-26 11:29:58 Re: Built-in CTYPE provider