Re: Conflict detection for update_deleted in logical replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2025-01-08 08:44:29
Message-ID: CAD21AoBMFqpOiypaj2oPPFqdMtattUzryEquniMUXeg2JBRNbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 7, 2025 at 2:49 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Jan 6, 2025 at 4:52 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Friday, January 3, 2025 2:36 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > >
> > > + /*
> > > + * The changes made by this and later transactions are still
> > > non-removable
> > > + * to allow for the detection of update_deleted conflicts when
> > > applying
> > > + * changes in this logical replication worker.
> > > + *
> > > + * Note that this info cannot directly protect dead tuples from being
> > > + * prematurely frozen or removed. The logical replication launcher
> > > + * asynchronously collects this info to determine whether to advance
> > > the
> > > + * xmin value of the replication slot.
> > > + *
> > > + * Therefore, FullTransactionId that includes both the
> > > transaction ID and
> > > + * its epoch is used here instead of a single Transaction ID. This is
> > > + * critical because without considering the epoch, the transaction ID
> > > + * alone may appear as if it is in the future due to transaction ID
> > > + * wraparound.
> > > + */
> > > + FullTransactionId oldest_nonremovable_xid;
> > >
> > > The last paragraph of the comment mentions that we need to use
> > > FullTransactionId to properly compare XIDs even after the XID wraparound
> > > happens. But once we set the oldest-nonremovable-xid it prevents XIDs from
> > > being wraparound, no? I mean that workers'
> > > oldest-nonremovable-xid values and slot's non-removal-xid (i.e., its
> > > xmin) are never away from more than 2^31 XIDs.
> >
> > I think the issue is that the launcher may create the replication slot after
> > the apply worker has already set the 'oldest_nonremovable_xid' because the
> > launcher are doing that asynchronously. So, Before the slot is created, there's
> > a window where transaction IDs might wrap around. If initially the apply worker
> > has computed a candidate_xid (755) and the xid wraparound before the launcher
> > creates the slot, causing the new current xid to be (740), then the old
> > candidate_xid(755) looks like a xid in the future, and the launcher could
> > advance the xmin to 755 which cause the dead tuples to be removed prematurely.
> > (We are trying to reproduce this to ensure that it's a real issue and will
> > share after finishing)
> >
> > We thought of another approach, which is to create/drop this slot first as
> > soon as one enables/disables detect_update_deleted (E.g. create/drop slot
> > during DDL). But it seems complicate to control the concurrent slot
> > create/drop. For example, if one backend A enables detect_update_deteled, it
> > will create a slot. But if another backend B is disabling the
> > detect_update_deteled at the same time, then the newly created slot may be
> > dropped by backend B. I thought about checking the number of subscriptions that
> > enables detect_update_deteled before dropping the slot in backend B, but the
> > subscription changes caused by backend A may not visable yet (e.g. not
> > committed yet).
> >
>
> This means that for the transaction whose changes are not yet visible,
> we may have already created the slot and the backend B would end up
> dropping it. Is it possible that during the change of this new option
> via DDL, we take AccessExclusiveLock on pg_subscription as we do in
> DropSubscription() to ensure that concurrent transactions can't drop
> the slot? Will that help in solving the above scenario?

If we create/stop the slot during DDL, how do we support rollback DDLs?

>
> The second idea could be that each worker first checks whether a slot
> exists along with a subscription flag (new option). Checking the
> existence of a slot each time would be costly, so we somehow need to
> cache it. But if we do that then we need to invent some cache
> invalidation mechanism for the slot. I am not sure if we can design a
> race-free mechanism for that. I mean we need to think of a solution
> for race conditions between the launcher and apply workers to ensure
> that after dropping the slot, launcher doesn't recreate the slot (say
> if some subscription enables this option) before all the workers can
> clear their existing values of oldest_nonremovable_xid.
>
> The third idea to avoid the race condition could be that in the
> function InitializeLogRepWorker() after CommitTransactionCommand(), we
> check if the retain_dead_tuples flag is true for MySubscription then
> we check whether the system slot exists. If exits then go ahead,
> otherwise, wait till the slot is created. It could be some additional
> cycles during worker start up but it is a one-time effort and that too
> only when the flag is set. In addition to this, we anyway need to
> create the slot in the launcher before launching the workers, and
> after re-reading the subscription, the change in retain_dead_tuples
> flag (off->on) should cause the worker restart.
>
> Now, in the third idea, the issue can still arise if, after waiting
> for the slot to be created, the user sets the retain_dead_tuples to
> false and back to true again immediately. Because the launcher may
> have noticed the "retain_dead_tuples=false" operation and dropped the
> slot, while the apply worker has not noticed and still holds an old
> candidate_xid. The xid may wraparound in this window before setting
> the retain_dead_tuples back to true. And, the apply worker would not
> restart because after it calls maybe_reread_subscription(), the
> retain_dead_tuples would have been set back to true again. Again, to
> avoid this race condition, the launcher can wait for each worker to
> reset the oldest_nonremovamble_xid before dropping the slot.
>
> Even after doing the above, the third idea could still have another
> race condition:
> 1. The launcher creates the replication slot and starts a worker with
> retain_dead_tuples = true, the worker is waiting for publish status
> and has not set oldest_nonremovable_xid.
> 2. The user set the option retain_dead_tuples to false, the launcher
> noticed that and drop the replication slot.
> 3. The worker received the status and set oldest_nonremovable_xid to a
> valid value (say 750).
> 4. Xid wraparound happened at this point and say new_available_xid becomes 740
> 5. User set retain_dead_tuples = true again.
>
> After the above steps, the apply worker holds an old
> oldest_nonremovable_xid (750) and will not restart if it does not call
> maybe_reread_subscription() before step 5. So, such a case can again
> create a problem of incorrect slot->xmin value. We can probably try to
> find some way to avoid this race condition as well but I haven't
> thought more about this as this idea sounds a bit risky and bug-prone
> to me.
>
> Among the above ideas, the first idea of taking AccessExclusiveLock on
> pg_subscription sounds safest to me. I haven't evaluated the changes
> for the first approach so I could be missing something that makes it
> difficult to achieve but I think it is worth investigating unless we
> have better ideas or we think that the current approach used in patch
> to use FullTransactionId is okay.

Thank you for considering some ideas. As I mentioned above, we might
need to consider a case like where 'CREATE SUBSCRIPTION ..
(retain_conflict_info = true)' is rolled back. Having said that, this
comment is just for simplifying the logic. If using TransactionId
instead makes other parts complex, it would not make sense. I'm okay
with leaving this part and improving the comment for
oldest_nonremovable_xid, say, by mentioning that there is a window for
XID wraparound happening between workers computing their
oldst_nonremovable_xid and pg_conflict_detection slot being created.

BTW while reviewing the code, I realized that changing
retain_conflict_info value doesn't have the worker relaunch and we
don't clear the worker's oldest_nonremovable_xid value in this case.
Is it okay? I'm concerned about a case like where
RetainConflictInfoPhase state transition is paused by disabling
retain_conflict_info and resume by re-enabling it with an old
RetainConflictInfoData value.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2025-01-08 08:54:47 Re: Conflict detection for update_deleted in logical replication
Previous Message cca5507 2025-01-08 08:43:32 Fix a wrong errmsg in AlterRole()