Re: Conflict detection and logging in logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: Re: Conflict detection and logging in logical replication
Date: 2024-07-29 04:01:07
Message-ID: CAA4eK1K3Jz1odUtrX_xJg1gddomVcv6v25GD9SP3_J99YM1qww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 26, 2024 at 4:28 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Fri, Jul 26, 2024 at 3:56 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
>
> > One more thing we need to consider is whether we should LOG or ERROR
> > for update/delete_differ conflicts. If we LOG as the patch is doing
> > then we are intentionally overwriting the row when the user may not
> > expect it. OTOH, without a patch anyway we are overwriting, so there
> > is an argument that logging by default is what the user will expect.
> > What do you think?
>
> I was under the impression that in this patch we do not intend to
> change behaviour of HEAD and thus only LOG the conflict wherever
> possible.
>

Earlier, I thought it was good to keep LOGGING the conflict where
there is no chance of wrong data update but for cases where we can do
something wrong, it is better to ERROR out. For example, for an
update_differ case where the apply worker can overwrite the data from
a different origin, it is better to ERROR out. I thought this case was
comparable to an existing ERROR case like a unique constraint
violation. But I see your point as well that one might expect the
existing behavior where we are silently overwriting the different
origin data. The one idea to address this concern is to suggest users
set the detect_conflict subscription option as off but I guess that
would make this feature unusable for users who don't want to ERROR out
for different origin update cases.

> And in the next patch of resolver, based on the user's input
> of error/skip/or resolve, we take the action. I still think it is
> better to stick to the said behaviour. Only if we commit the resolver
> patch in the same version where we commit the detection patch, then we
> can take the risk of changing this default behaviour to 'always
> error'. Otherwise users will be left with conflicts arising but no
> automatic way to resolve those. But for users who really want their
> application to error out, we can provide an additional GUC in this
> patch itself which changes the behaviour to 'always ERROR on
> conflict'.
>

I don't see a need of GUC here, even if we want we can have a
subscription option such conflict_log_level. But users may want to
either LOG or ERROR based on conflict type. For example, there won't
be any data inconsistency in two node replication for delete_missing
case as one is trying to delete already deleted data, so LOGGING such
a case should be sufficient whereas update_differ could lead to
different data on two nodes, so the user may want to ERROR out in such
a case.

We can keep the current behavior as default for the purpose of
conflict detection but can have a separate patch to decide whether to
LOG/ERROR based on conflict_type.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-07-29 04:46:17 Re: Flush pgstats file during checkpoints
Previous Message Alexander Lakhin 2024-07-29 04:00:00 dikkop failed the pg_combinebackupCheck/006_db_file_copy.pl test