Re: Conflict detection for update_deleted in logical replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2025-02-05 00:29:48
Message-ID: CAD21AoDuGu4UwpRDtrmGkvjYt+U3RBseU8uGTx6CYtCN8MMNuA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 31, 2025 at 9:07 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sat, Feb 1, 2025 at 2:54 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Jan 30, 2025 at 10:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Jan 31, 2025 at 4:10 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > I have one question about the 0004 patch; it implemented
> > > > max_conflict_retntion_duration as a subscription parameter. But the
> > > > launcher invalidates the pg_conflict_detection slot only if all
> > > > subscriptions with retain_conflict_info stopped retaining dead tuples
> > > > due to the max_conflict_retention_duration parameter. Therefore, even
> > > > if users set the parameter to a low value to avoid table bloats, it
> > > > would not make sense if other subscriptions set it to a larger value.
> > > > Is my understanding correct?
> > > >
> > >
> > > Yes, your understanding is correct. I think this could be helpful
> > > during resolution because the worker for which the duration has
> > > exceeded cannot detect conflicts reliably but others can. So, this
> > > info can be useful while performing resolutions. Do you have an
> > > opinion/suggestion on this matter?
> >
> > I imagined a scenario like where two apply workers are running and
> > have different max_conflict_retention_duration values (say '5 min' and
> > '15 min'). Suppose both workers are roughly the same behind the
> > publisher(s), when both workers cannot advance the workers' xmin
> > values for 5 min or longer, one worker stops retaining dead tuples.
> > However, the pg_conflict_detection slot is not invalidated yet since
> > another worker is still using it, so both workers would continue to be
> > getting slower. The subscriber would end up retaining dead tuples
> > until both workers are behind for 15 min or longer, before
> > invalidating the slot. In this case, stopping dead tuple retention on
> > the first worker would help neither advance the slot's xmin nor
> > improve another worker's performance.
>
> Won't the same be true for 'retain_conflict_info' option as well? I
> mean even if one worker is retaining dead tuples, the performance of
> others will also be impacted.

I guess the situation might be a bit different. It's a user's choice
to disable retain_conflict_info, and it should be done manually. That
is, in this case, I think users will be able to figure out that both
apply workers are the same behind the publishers and they need to
disable retain_conflict_info on both subscriptions in order to remove
accumulated dead tuples (which is the cause of performance dip).

On the other hand, ISTM max_conflict_retentation_duration is something
like a switch to recover the system performance by automatically
disabling retain_conflict_info (and it will automatically go back to
be enabled again). I guess users who use the
max_conflict_retention_duration would expect that the system
performance will tend to recover by automatically disabling
reatin_conflict_info if the apply worker is lagging for longer than
the specified value. However, there are cases where this cannot be
expected.

>
> >
> > I was not sure of the point of
> > making the max_conflict_retention_duration a per-subscription
> > parameter.
> >
>
> The idea is to keep it at the same level as the other related
> parameter 'retain_conflict_info'. It could be useful for cases where
> publishers are from two different nodes (NP1 and NP2) and we have
> separate subscriptions for both nodes. Now, it is possible that users
> won't expect conflicts on the tables from one of the nodes NP1 then
> she could choose to enable 'retain_conflict_info' and
> 'max_conflict_retention_duration' only for the subscription pointing
> to publisher NP2.
>
> Now, say the publisher node that can generate conflicts (NP2) has
> fewer writes and the corresponding apply worker could easily catch up
> and almost always be in sync with the publisher. In contrast, the
> other node that has no conflicts has a large number of writes. In such
> cases, giving new options at the subscription level will be helpful.
>
> If we want to provide it at the global level, then the performance or
> dead tuple control may not be any better than the current patch but
> won't allow the provision for the above kinds of cases. Second, adding
> two new GUCs is another thing I want to prevent. But OTOH, the
> implementation could be slightly simpler if we provide these options
> as GUC though I am not completely sure of that point. Having said
> that, I am open to changing it to a non-subscription level. Do you
> think it would be better to provide one or both of these parameters as
> GUCs or do you have something else in mind?

It makes sense to me to have the retain_conflict_info as a
subscription-level parameter. I was thinking of making only
max_conflict_retention_duration a global parameter, but I might be
missing something. With a subscription-level
max_conflict_retention_duration, how can users choose the setting
values for each subscription, and is there a case that can be covered
only by a subscription-level max_conflict_retention_duration?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-02-05 00:36:00 Re: add missing PQfinish() calls to vacuumdb
Previous Message Peter Smith 2025-02-05 00:23:19 Re: Avoid updating inactive_since for invalid replication slots