RE: Conflict detection for update_deleted in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2025-01-15 08:50:49
Message-ID: OS0PR01MB5716A1E0AF0A3C4EEB2FED3994192@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Wednesday, January 15, 2025 12:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

Hi,

>
> On Wed, Jan 15, 2025 at 5:57 AM Masahiko Sawada
> <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Jan 13, 2025 at 8:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > >
> > > As of now, I can't think of a way to throttle the publisher when the
> > > apply_worker lags. Basically, we need some way to throttle (reduce
> > > the speed of backends) when the apply worker is lagging behind a
> > > threshold margin. Can you think of some way? I thought if one
> > > notices frequent invalidation of the launcher's slot due to max_lag,
> > > then they can rebalance their workload on the publisher.
> >
> > I don't have any ideas other than invalidating the launcher's slot
> > when the apply lag is huge. We can think of invalidating the
> > launcher's slot for some reasons such as the replay lag, the age of
> > slot's xmin, and the duration.
> >
>
> Right, this is exactly where we are heading. I think we can add reasons
> step-wise. For example, as a first step, we can invalidate the slot due to replay
> LAG. Then, slowly, we can add other reasons as well.
>
> One thing that needs more discussion is the exact way to invalidate a slot. I
> have mentioned a couple of ideas in my previous email which I am writing
> again: "If we just invalidate the slot, users can check the status of the slot and
> need to disable/enable retain_conflict_info again to start retaining the required
> information. This would be required because we can't allow system slots (slots
> created
> internally) to be created by users. The other way could be that instead of
> invalidating the slot, we directly drop/re-create the slot or increase its xmin. If
> we choose to advance the slot automatically without user intervention, we need
> to let users know via LOG and or via information in the view."

In the latest version, we implemented a simpler approach that allows the apply
worker to directly advance the oldest_nonremovable_xid if the waiting time
exceeds the newly introduced option's limit. I've named this option
"max_conflict_retention_duration," as it aligns better with the conflict
detection concept and the "retain_conflict_info" option.

During the last phase (RCI_WAIT_FOR_LOCAL_FLUSH), the apply worker evaluates
how much time it has spent waiting. If this duration exceeds the
max_conflict_retention_duration, the worker directly advances the
oldest_nonremovable_xid and logs a message indicating the forced advancement of
the non-removable transaction ID.
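The check described above could look roughly like the following. This is a minimal standalone sketch for illustration only: the struct, field names, and the helper `should_force_advance` are simplifications assumed here, not the actual apply-worker code from the patch; only the names `candidate_xid`, `oldest_nonremovable_xid` (conceptually), and `max_conflict_retention_duration` come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Simplified stand-in for the apply worker's retention state during the
 * RCI_WAIT_FOR_LOCAL_FLUSH phase (not the actual patch structures).
 */
typedef struct RetainConflictInfoData
{
    TransactionId candidate_xid;   /* xid the worker is waiting to advance to */
    int64_t       wait_start_ms;   /* when the wait for the local flush began */
} RetainConflictInfoData;

/*
 * Return true when the time spent waiting exceeds
 * max_conflict_retention_duration, meaning the worker should force
 * oldest_nonremovable_xid forward and LOG the forced advancement.
 * A value <= 0 is taken to mean "retain indefinitely" so that
 * update_deleted detection stays fully reliable.
 */
static bool
should_force_advance(const RetainConflictInfoData *rci,
                     int64_t now_ms,
                     int64_t max_conflict_retention_duration_ms)
{
    if (max_conflict_retention_duration_ms <= 0)
        return false;
    return (now_ms - rci->wait_start_ms) >= max_conflict_retention_duration_ms;
}
```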

This approach is similar to the time-based option that was discussed before.
Compared to the slot invalidation approach, this approach is simpler because we
can avoid adding 1) a new slot invalidation type due to apply lag, 2) a new
field lag_behind in shared memory (MyLogicalRepWorker) to indicate when the lag
exceeds the limit, and 3) additional logic in the launcher to handle each
worker's lag status.

With the slot invalidation approach, users would be able to confirm the current
status by checking whether the slot in pg_replication_slots is invalidated,
whereas with the simpler approach mentioned above, users can only confirm it by
checking the LOGs.

What do you think? If we prefer the slot invalidation approach, I can implement
that part in the next version.

>
> > >
> > > >
> > > The max_lag idea sounds interesting for the case
> > > > where the subscriber is much behind. Probably we can visit this
> > > > idea as a new feature after completing this feature.
> > > >
> > >
> > > Sure, but what will be our answer to users for cases where the
> > > performance tanks due to bloat accumulation? The tests show that
> > > once the apply_lag becomes large, it becomes almost impossible for
> > > the apply worker to catch up (or take a very long time) and advance
> > > the slot's xmin. The users can disable retain_conflict_info to bring
> > > back the performance and get rid of bloat but I thought it would be
> > > easier for users to do that if we have some knob where they don't
> > > need to wait till actually the problem of bloat/performance dip happens.
> >
> > Probably retaining dead tuples based on the time duration or its age
> > might be other solutions, it would increase a risk of not being able
> > to detect update_deleted conflict though. I think in any way as long
> > as we accumulate dead tuples to detect update_deleted conflicts, it
> > would be a tradeoff between reliably detecting update_deleted
> > conflicts and the performance.
> >
>
> Right, and users have an option for it. Say, if they set max_lag as -1 (or some
> special value), we won't invalidate the slot, so the update_delete conflict can
> be detected with complete reliability. At this stage, it is okay if this information
> is LOGGED and displayed via a system view. We need more thoughts while
> working on the CONFLICT RESOLUTION patch such as we may need to
> additionally display a WARNING or ERROR if the remote_tuples commit_time is
> earlier than the last time slot is invalidated. I don't want to go in a detailed
> discussion at this point but just wanted you to know that we will need
> additional work for the resolution of update_delete conflicts to avoid
> inconsistency.

Attached is the V22 patch set, which includes the following changes:

1) Merged V21-0006 into the main patches. Instead of reducing the maximum wait
time to 10s, use 30s, which is consistent with the wait in the slotsync worker.
2) Merged V21-0007 into the main patches. To avoid updating the flush position
too frequently for each change, it is now updated at most once per
wal_writer_delay, which is consistent with the existing logic in the apply
worker.
3) Added a new 0004 patch to introduce the "max_conflict_retention_duration"
option mentioned above. Thanks a lot to Kuroda-san for contributing code to
this patch.
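The throttling in change 2) amounts to a simple time gate. The sketch below is a hypothetical helper for illustration; the patch's actual code and variable names may differ, and `delay_ms` stands in for the wal_writer_delay setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Allow an action (here: updating the flush position) at most once per
 * delay_ms milliseconds.  *last_ms is caller-maintained state recording
 * the last time the action was actually performed.
 */
static bool
time_to_update_flush(int64_t *last_ms, int64_t now_ms, int64_t delay_ms)
{
    if (now_ms - *last_ms < delay_ms)
        return false;          /* too soon: skip the update for this change */
    *last_ms = now_ms;         /* record and allow the update */
    return true;
}
```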

Best Regards,
Hou zj

Attachment Content-Type Size
v22-0006-Support-the-conflict-detection-for-update_delete.patch application/octet-stream 25.7 KB
v22-0001-Maintain-the-oldest-non-removeable-tranasction-I.patch application/octet-stream 40.7 KB
v22-0002-Maintain-the-replication-slot-in-logical-launche.patch application/octet-stream 21.9 KB
v22-0003-Add-a-retain_conflict_info-option-to-subscriptio.patch application/octet-stream 79.8 KB
v22-0004-add-a-max_conflict_retention_duration-subscripti.patch application/octet-stream 69.5 KB
v22-0005-Add-a-tap-test-to-verify-the-management-of-the-n.patch application/octet-stream 6.7 KB
