Re: Conflict detection for update_deleted in logical replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2025-01-15 00:27:00
Message-ID: CAD21AoBecViE36_+SSPV4i4Ex_HvPOK-MFwZbD-HEnDNc5=1TA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 13, 2025 at 8:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jan 14, 2025 at 7:14 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sun, Jan 12, 2025 at 10:36 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > I don't think we can avoid accumulating garbage, especially when the
> > > workload on the publisher is high. Consider the case currently being
> > > discussed: on the publisher, we have 30 clients performing read-write
> > > operations, and there is only one pair of reader (walsender) and
> > > writer (apply_worker) to perform all those write operations on the
> > > subscriber. It can never match the speed, and the subscriber side is
> > > bound to have lower performance (or accumulate more bloat) irrespective
> > > of its workload. If there is one client on the publisher performing
> > > operations, we won't see much degradation, but as the number of
> > > clients increases, the performance degradation (and bloat) will keep
> > > increasing.
> > >
> > > There are other scenarios that can lead to the same situation, such as
> > > a large table sync, the subscriber node being down for some time, etc.
> > > Basically, any case where the apply side lags behind the remote node
> > > by a large amount.
> > >
> > > One idea to prevent the performance degradation or bloat increase is
> > > to invalidate the slot once we notice that the subscriber lags (in
> > > terms of WAL apply) behind the publisher by a certain threshold. Say
> > > we have a max_lag (or max_lag_behind_remote) subscription option
> > > (defined in terms of seconds) which allows us to stop calculating
> > > oldest_nonremovable_xid for that subscription. We can indicate that
> > > via some worker-level parameter. Once all the subscriptions on a node
> > > that have enabled retain_conflict_info have stopped calculating
> > > oldest_nonremovable_xid, we can invalidate the slot. Now, users can
> > > check this and would need to disable/enable retain_conflict_info to
> > > again start retaining the required information. The other way could
> > > be that instead of invalidating the slot, we directly drop/re-create
> > > the slot or increase its xmin. If we choose to advance the slot
> > > automatically without user intervention, we need to let users know
> > > via a LOG message and/or via information in the view.
> > >
> > > I think such a mechanism via the new max_lag option will address your
> > > concern: "It's reasonable behavior for this approach but it might not
> > > be a reasonable outcome for users if they could be affected by such a
> > > performance dip with no way to avoid it.", as it will provide a way
> > > to avoid the performance dip only when there is a possibility of such
> > > a dip.
> > >
> > > I mentioned max_lag as a subscription option instead of a GUC because
> > > it applies only to subscriptions that have enabled
> > > retain_conflict_info, but we can consider making it a GUC if you and
> > > others think so, provided the above proposal sounds reasonable. Also,
> > > max_lag could be defined in terms of LSN as well, but I think time
> > > would be easier to configure.
> > >
> > > Thoughts?
> >
> > I agree that we cannot avoid accumulating dead tuples when the
> > workload on the publisher is high, which affects the subscriber's
> > performance. What we need to do is to update the slot's xmin as quickly
> > as possible to minimize dead tuple accumulation, at least when the
> > subscriber is not much behind. If there is a tradeoff in doing so
> > (e.g., vs. the publisher's performance), we need to provide a way for
> > users to balance it.
> >
>
> As of now, I can't think of a way to throttle the publisher when the
> apply_worker lags. Basically, we need some way to throttle (reduce the
> speed of the backends) when the apply worker is lagging behind by more
> than a threshold margin. Can you think of some way? I thought that if
> one notices frequent invalidation of the launcher's slot due to
> max_lag, they can rebalance their workload on the publisher.

I don't have any ideas other than invalidating the launcher's slot
when the apply lag is huge. We can think of invalidating the
launcher's slot based on criteria such as the replay lag, the age of
the slot's xmin, or the retention duration.
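
Just to sketch what such an interface could look like (purely
illustrative; option names like retain_conflict_info and max_lag follow
the proposal upthread and nothing here is committed):

    -- Hypothetical subscription options, following the max_lag proposal:
    CREATE SUBSCRIPTION sub1
        CONNECTION 'host=pub dbname=postgres'
        PUBLICATION pub1
        WITH (retain_conflict_info = true, max_lag = '60s');

    -- If the apply side lags behind by more than max_lag, the slot used
    -- for retaining the information could be invalidated (or its xmin
    -- advanced), which users could notice via a LOG message and by
    -- checking the existing view:
    SELECT slot_name, xmin, catalog_xmin
    FROM pg_replication_slots;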

>
> >
> > The max_lag idea sounds interesting for the case
> > where the subscriber is much behind. Probably we can revisit this idea
> > as a new feature after completing this feature.
> >
>
> Sure, but what will be our answer to users for cases where the
> performance tanks due to bloat accumulation? The tests show that once
> the apply_lag becomes large, it becomes almost impossible for the
> apply worker to catch up (or it takes a very long time) and advance the
> slot's xmin. Users can disable retain_conflict_info to bring back the
> performance and get rid of the bloat, but I thought it would be easier
> for users to do that if we have some knob so that they don't need to
> wait until the bloat/performance-dip problem actually happens.

Retaining dead tuples based on a time duration or their age might be
another solution, though it would increase the risk of not being able
to detect update_deleted conflicts. I think that, in any case, as long
as we accumulate dead tuples to detect update_deleted conflicts, there
will be a tradeoff between reliably detecting update_deleted conflicts
and performance.
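
FWIW, just as an illustration, both sides of that tradeoff can already
be observed with the existing monitoring views alone, e.g.:

    -- Subscriber: dead tuples retained because the slot's xmin is held
    -- back for update_deleted detection.
    SELECT relname, n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;

    -- Publisher: how far the apply worker's confirmed position is behind
    -- what has been sent.
    SELECT application_name,
           pg_wal_lsn_diff(sent_lsn, replay_lsn) AS apply_lag_bytes
    FROM pg_stat_replication;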

As for detecting update_deleted conflicts, we probably don't need the
whole tuple data of deleted tuples. It would be sufficient if we could
check the XIDs of deleted tuples to get their origins and commit
timestamps. Probably the same is true for the old versions of updated
tuples in terms of detecting update_origin_differ conflicts. If my
understanding is right, we could probably remove only the tuple data of
dead tuples that are older than the xmin horizon (excluding the
launcher's xmin), while leaving the heap tuple header, which would
minimize the table bloat.
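
Just to illustrate the point that the XID alone is enough for that
purpose (this requires track_commit_timestamp to be enabled, and
pg_xact_commit_timestamp_origin() is available since PG16, IIRC; the
XID below is only a placeholder):

    -- Given only the xmax of a deleted tuple version, we can look up the
    -- commit timestamp and the replication origin of the transaction
    -- that deleted it:
    SELECT cto."timestamp" AS commit_ts,
           ro.roname       AS origin_name
    FROM pg_xact_commit_timestamp_origin('1234'::xid) AS cto
         LEFT JOIN pg_replication_origin ro ON ro.roident = cto.roident;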

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
