From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2025-01-15 11:54:19
Message-ID: OS0PR01MB5716EFBF8F2EFD156342444794192@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers
On Wednesday, January 15, 2025 4:51 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Wednesday, January 15, 2025 12:08 PM Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> Hi,
>
> >
> > On Wed, Jan 15, 2025 at 5:57 AM Masahiko Sawada
> > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Jan 13, 2025 at 8:39 PM Amit Kapila
> > > <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > As of now, I can't think of a way to throttle the publisher when
> > > > the apply_worker lags. Basically, we need some way to throttle
> > > > (reduce the speed of backends) when the apply worker is lagging
> > > > behind a threshold margin. Can you think of some way? I thought if
> > > > one notices frequent invalidation of the launcher's slot due to
> > > > max_lag, then they can rebalance their workload on the publisher.
> > >
> > > I don't have any ideas other than invalidating the launcher's slot
> > > when the apply lag is huge. We can think of invalidating the
> > > launcher's slot for reasons such as the replay lag, the age of the
> > > slot's xmin, and a time duration.
> > >
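
For illustration, such reasons could extend the existing
ReplicationSlotInvalidationCause enum roughly as below; the three added
members are hypothetical, not from any posted patch:

/* Sketch only: the last three RS_INVAL_* members are hypothetical. */
typedef enum ReplicationSlotInvalidationCause
{
    RS_INVAL_NONE,
    RS_INVAL_WAL_REMOVED,   /* existing: required WAL has been removed */
    RS_INVAL_HORIZON,       /* existing: required rows have been removed */
    RS_INVAL_WAL_LEVEL,     /* existing: wal_level insufficient on primary */
    RS_INVAL_APPLY_LAG,     /* hypothetical: replay lag exceeded the limit */
    RS_INVAL_XMIN_AGE,      /* hypothetical: slot's xmin grew too old */
    RS_INVAL_DURATION,      /* hypothetical: info retained for too long */
} ReplicationSlotInvalidationCause;
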
> >
> > Right, this is exactly where we are heading. I think we can add
> > reasons step-wise. For example, as a first step, we can invalidate the
> > slot due to replay lag. Then we can gradually add other reasons as well.
> >
> > One thing that needs more discussion is the exact way to invalidate a
> > slot. I have mentioned a couple of ideas in my previous email, which I
> > am writing again: "If we just invalidate the slot, users can check the
> > status of the slot and need to disable/enable retain_conflict_info
> > again to start retaining the required information. This would be
> > required because we can't allow system slots (slots created internally)
> > to be created by users. The other way could be that instead of
> > invalidating the slot, we directly drop/re-create the slot or increase
> > its xmin. If we choose to advance the slot automatically without user
> > intervention, we need to let users know via a LOG message and/or via
> > information in the view."
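
To illustrate the "increase its xmin" variant, here is a minimal sketch,
assuming the launcher has the slot acquired as MyReplicationSlot; the
function name and the new_xmin variable are illustrative, not from the
patch:

/* Requires: replication/slot.h, storage/spin.h */
static void
advance_launcher_slot_xmin(TransactionId new_xmin)
{
    ReplicationSlot *slot = MyReplicationSlot;  /* assumed already acquired */

    SpinLockAcquire(&slot->mutex);
    slot->data.xmin = new_xmin;
    slot->effective_xmin = new_xmin;
    SpinLockRelease(&slot->mutex);

    /* Persist the change and let vacuum see the new horizon. */
    ReplicationSlotMarkDirty();
    ReplicationSlotsComputeRequiredXmin(false);

    /* Per the point above, tell the user we advanced it automatically. */
    ereport(LOG,
            (errmsg("advanced xmin of replication slot \"%s\" to %u",
                    NameStr(slot->data.name), new_xmin)));
}
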
>
> In the latest version, we implemented a simpler approach that allows the apply
> worker to directly advance the oldest_nonremovable_xid if the waiting time
> exceeds the newly introduced option's limit. I've named this option
> "max_conflict_retention_duration," as it aligns better with the conflict
> detection concept and the "retain_conflict_info" option.
>
> During the last phase (RCI_WAIT_FOR_LOCAL_FLUSH), the apply worker
> evaluates how much time it has spent waiting. If this duration exceeds the
> max_conflict_retention_duration, the worker directly advances the
> oldest_nonremovable_xid and logs a message indicating the forced
> advancement of the non-removable transaction ID.
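
To make the mechanism concrete, here is a minimal sketch of what that check
could look like in the apply worker. This is a sketch only: wait_started_at
and candidate_xid are illustrative names, the locking shown is illustrative,
oldest_nonremovable_xid is the field added by this patch set, and the option
value is assumed to be in milliseconds with 0 meaning no limit:

/* Requires: replication/worker_internal.h, storage/spin.h, utils/timestamp.h */
if (max_conflict_retention_duration > 0 &&
    TimestampDifferenceExceeds(wait_started_at, GetCurrentTimestamp(),
                               max_conflict_retention_duration))
{
    /* Stop waiting for the local flush; force the advancement. */
    SpinLockAcquire(&MyLogicalRepWorker->relmutex);
    MyLogicalRepWorker->oldest_nonremovable_xid = candidate_xid;
    SpinLockRelease(&MyLogicalRepWorker->relmutex);

    ereport(LOG,
            (errmsg("forcibly advanced the non-removable transaction ID to %u"
                    " after waiting longer than max_conflict_retention_duration",
                    candidate_xid)));
}
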
>
> This approach is a bit like the time-based option that was discussed
> before. Compared to the slot invalidation approach, this approach is
> simpler because we can avoid adding 1) a new slot invalidation type due to
> apply lag, 2) a new lag_behind field in shared memory (MyLogicalRepWorker)
> to indicate when the lag exceeds the limit, and 3) additional logic in the
> launcher to handle each worker's lag status.
>
> With the slot invalidation approach, users would be able to confirm the
> current status by checking whether the slot in pg_replication_slots is
> invalidated, while with the simpler approach mentioned above, users could
> only confirm that by checking the LOGs.
>
> What do you think? If we prefer the slot invalidation approach, I can do
> that part in the next version.
>
> >
> > > >
> > > > >
> > > > > The max_lag idea sounds interesting for the case
> > > > > where the subscriber is much behind. Probably we can visit this
> > > > > idea as a new feature after completing this feature.
> > > > >
> > > >
> > > > Sure, but what will be our answer to users for cases where the
> > > > performance tanks due to bloat accumulation? The tests show that
> > > > once the apply_lag becomes large, it becomes almost impossible for
> > > > the apply worker to catch up (or it takes a very long time) and to
> > > > advance the slot's xmin. Users can disable retain_conflict_info to
> > > > bring back the performance and get rid of the bloat, but I thought
> > > > it would be easier for them if we had some knob so that they don't
> > > > need to wait until the bloat/performance dip actually happens.
> > >
> > > Probably retaining dead tuples based on a time duration or their age
> > > might be another solution, though it would increase the risk of not
> > > being able to detect update_deleted conflicts. I think that in any
> > > case, as long as we accumulate dead tuples to detect update_deleted
> > > conflicts, it will be a tradeoff between reliably detecting
> > > update_deleted conflicts and the performance.
> > >
> >
> > Right, and users have an option for it. Say, if they set max_lag to -1
> > (or some special value), we won't invalidate the slot, so the
> > update_deleted conflict can be detected with complete reliability. At
> > this stage, it is okay if this information is LOGGED and displayed via
> > a system view. We will need more thought while working on the CONFLICT
> > RESOLUTION patch; for example, we may need to additionally display a
> > WARNING or ERROR if the remote tuple's commit_time is earlier than the
> > last time the slot was invalidated. I don't want to go into a detailed
> > discussion at this point but just wanted you to know that we will need
> > additional work for the resolution of update_deleted conflicts to avoid
> > inconsistency.
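
As a concrete (hypothetical) example of such a check during resolution,
where remote_commit_ts and slot_invalidated_at are illustrative variables
rather than anything in the posted patches:

/* Requires: utils/timestamp.h */
if (slot_invalidated_at != 0 && remote_commit_ts < slot_invalidated_at)
    ereport(WARNING,
            (errmsg("update_deleted detection may be unreliable: remote"
                    " commit timestamp %s precedes the last slot invalidation",
                    timestamptz_to_str(remote_commit_ts))));
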
>
>
> Attached is the V22 patch set, which includes the following changes:
>
> 1) Merged V21-0006 into the main patches. Instead of reducing the maximum
> wait time to 10s, use 30s, which is consistent with the wait in the
> slotsync worker.
> 2) Merged V21-0007 into the main patches. To avoid updating the flush
> position too frequently for each change, it is updated at most once per
> wal_writer_delay, which is consistent with the existing logic in the apply
> worker (see the sketch after this list).
> 3) Added a new V21-0004 patch to introduce the
> "max_conflict_retention_duration" option mentioned above. Thanks a lot to
> Kuroda-San for contributing code to this patch.
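
As a rough sketch of the rate limiting in 2) — the function and its static
variable are illustrative; WalWriterDelay is the C variable behind the
wal_writer_delay GUC, in milliseconds:

/* Requires: access/xlogdefs.h, postmaster/walwriter.h, utils/timestamp.h */
static void
maybe_update_flush_position(XLogRecPtr flush_lsn)
{
    static TimestampTz last_updated = 0;
    TimestampTz now = GetCurrentTimestamp();

    /* Update the tracked flush position at most once per wal_writer_delay. */
    if (!TimestampDifferenceExceeds(last_updated, now, WalWriterDelay))
        return;

    last_updated = now;
    /* ... record flush_lsn so it can be reported to the publisher ... */
}
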
CFbot reported a test error[1], which occurred because I missed checking
whether the new option value is 0 before checking the time. Here is a new
version of the patch set that fixes this issue.
[1] https://cirrus-ci.com/task/5635566210908160
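
The fix boils down to a guard along these lines (a sketch; 0 is assumed to
mean the time-based limit is disabled):

/* Skip the forced advancement when the user set no time limit. */
if (max_conflict_retention_duration == 0)
    return;

/* ... otherwise compare the elapsed wait time against the limit ... */
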
Best Regards,
Hou zj
Attachments:
v23-0006-Support-the-conflict-detection-for-update_delete.patch (application/octet-stream, 25.7 KB)
v23-0001-Maintain-the-oldest-non-removeable-tranasction-I.patch (application/octet-stream, 40.7 KB)
v23-0002-Maintain-the-replication-slot-in-logical-launche.patch (application/octet-stream, 21.9 KB)
v23-0003-Add-a-retain_conflict_info-option-to-subscriptio.patch (application/octet-stream, 79.8 KB)
v23-0004-add-a-max_conflict_retention_duration-subscripti.patch (application/octet-stream, 70.4 KB)
v23-0005-Add-a-tap-test-to-verify-the-management-of-the-n.patch (application/octet-stream, 6.7 KB)