From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict detection for update_deleted in logical replication |
Date: | 2025-02-05 05:24:09 |
Message-ID: | CAFiTN-u6bzpbps+yLhB9B1ffatbr3AdieYEx98bP2O0By5B85w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Feb 1, 2025 at 10:37 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sat, Feb 1, 2025 at 2:54 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Jan 30, 2025 at 10:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Jan 31, 2025 at 4:10 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > I have one question about the 0004 patch; it implemented
> > > > max_conflict_retntion_duration as a subscription parameter. But the
> > > > launcher invalidates the pg_conflict_detection slot only if all
> > > > subscriptions with retain_conflict_info stopped retaining dead tuples
> > > > due to the max_conflict_retention_duration parameter. Therefore, even
> > > > if users set the parameter to a low value to avoid table bloats, it
> > > > would not make sense if other subscriptions set it to a larger value.
> > > > Is my understanding correct?
> > > >
> > >
> > > Yes, your understanding is correct. I think this could be helpful
> > > during resolution because the worker for which the duration has
> > > exceeded cannot detect conflicts reliably but others can. So, this
> > > info can be useful while performing resolutions. Do you have an
> > > opinion/suggestion on this matter?
> >
> > I imagined a scenario like where two apply workers are running and
> > have different max_conflict_retention_duration values (say '5 min' and
> > '15 min'). Suppose both workers are roughly the same behind the
> > publisher(s), when both workers cannot advance the workers' xmin
> > values for 5 min or longer, one worker stops retaining dead tuples.
> > However, the pg_conflict_detection slot is not invalidated yet since
> > another worker is still using it, so both workers would continue to be
> > getting slower. The subscriber would end up retaining dead tuples
> > until both workers are behind for 15 min or longer, before
> > invalidating the slot. In this case, stopping dead tuple retention on
> > the first worker would help neither advance the slot's xmin nor
> > improve another worker's performance.
>
> Won't the same be true for 'retain_conflict_info' option as well? I
> mean even if one worker is retaining dead tuples, the performance of
> others will also be impacted.
+1
>
> >
> > I was not sure of the point of
> > making the max_conflict_retention_duration a per-subscription
> > parameter.
> >
>
> The idea is to keep it at the same level as the other related
> parameter 'retain_conflict_info'. It could be useful for cases where
> publishers are from two different nodes (NP1 and NP2) and we have
> separate subscriptions for both nodes. Now, it is possible that users
> won't expect conflicts on the tables from one of the nodes NP1 then
> she could choose to enable 'retain_conflict_info' and
> 'max_conflict_retention_duration' only for the subscription pointing
> to publisher NP2.
>
> Now, say the publisher node that can generate conflicts (NP2) has
> fewer writes and the corresponding apply worker could easily catch up
> and almost always be in sync with the publisher. In contrast, the
> other node that has no conflicts has a large number of writes. In such
> cases, giving new options at the subscription level will be helpful.
>
> If we want to provide it at the global level, then the performance or
> dead tuple control may not be any better than the current patch but
> won't allow the provision for the above kinds of cases. Second, adding
> two new GUCs is another thing I want to prevent. But OTOH, the
> implementation could be slightly simpler if we provide these options
> as GUC though I am not completely sure of that point. Having said
> that, I am open to changing it to a non-subscription level. Do you
> think it would be better to provide one or both of these parameters as
> GUCs or do you have something else in mind?
>
I agree with this analogy. It seems that
'max_conflict_retention_duration' is quite similar to
'retain_conflict_info'. In both cases, the slot for retaining dead
tuples is shared among all subscribers. However, these subscribers may
be receiving data from different publishers and even different nodes.
Therefore, the decision on whether to wait and for how long should be
made at the subscriber level.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-02-05 05:30:12 | Re: Show WAL write and fsync stats in pg_stat_io |
Previous Message | Shlok Kyal | 2025-02-05 05:22:41 | Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility. |