From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date: | 2024-03-26 06:34:26 |
Message-ID: | CAJpy0uCQb7Mt+975CHDbrObyWo+o855QRjaBw_KmZMLkS2PEFA@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Mar 26, 2024 at 11:08 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Tue, Mar 26, 2024 at 9:30 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Mon, Mar 25, 2024 at 12:43 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > > I have one concern: for synced slots on a standby, how do we prevent
> > > invalidation due to inactive_timeout immediately after promotion?
> > >
> > > For synced slots, last_inactive_time and inactive_timeout are both
> > > set. Let's say I bring down the primary and then promote the
> > > standby. There is a chance that promotion ends up invalidating
> > > synced slots (since the standby is not restarted during promotion,
> > > inactive_timeout may already have elapsed since 'last_inactive_time').
> > >
> >
> > On a standby, if we decide to maintain a valid last_inactive_time for
> > synced slots, invalidation of synced slots is correctly prevented in
> > InvalidateSlotForInactiveTimeout() by the check:
> >
> > if (RecoveryInProgress() && slot->data.synced)
> >     return false;
> >
> > But immediately after promotion, we cannot rely on the above check,
> > so synced slots may get invalidated. To keep the setting of
> > last_inactive_time for synced slots consistent with user slots, one
> > potential way to prevent this invalidation is to update the
> > last_inactive_time of all synced slots within ShutDownSlotSync()
> > during FinishWalRecovery(). This ensures that promotion does not
> > immediately invalidate slots, and from then on we have a correct
> > last_inactive_time as the basis for invalidation. This is
> > equivalent to updating last_inactive_time during a restart (but
> > without an actual restart during promotion).
> > A further benefit of maintaining last_inactive_time for synced slots
> > is that it tells the user when the background slot sync worker or
> > the SQL function last attempted to sync that particular slot.
> > Thoughts?
>
> Please find the attached v21 patch implementing the above idea. It
> also renames last_inactive_time to inactive_since.
>
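To make the promotion hazard and the proposed remedy concrete, here is a minimal C sketch under stated assumptions. The `SketchSlot` struct, `slot_inactive_timeout_expired()` and `refresh_synced_slots_at_sync_shutdown()` are hypothetical simplifications, not code from the patch or from PostgreSQL; only the `RecoveryInProgress() && slot->data.synced` guard comes from the discussion above (modelled here as an `in_recovery` flag). Timestamps are assumed to be int64 microseconds, and `inactive_since == 0` is assumed to mean "slot currently in use".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not PostgreSQL's real definitions. */
typedef int64_t TimestampTz;          /* assumed: microseconds */

typedef struct SketchSlot
{
    bool        synced;               /* created by slot sync on a standby */
    TimestampTz inactive_since;       /* assumed: 0 means "in use" */
    int64_t     inactive_timeout_us;  /* assumed: <= 0 disables the timeout */
} SketchSlot;

/* Would this slot be invalidated for inactivity at time 'now'? */
static bool
slot_inactive_timeout_expired(const SketchSlot *slot, bool in_recovery,
                              TimestampTz now)
{
    assert(slot != NULL);

    /* Mirrors the quoted guard: synced slots are skipped during recovery. */
    if (in_recovery && slot->synced)
        return false;

    /* No timeout configured, or the slot is currently in use. */
    if (slot->inactive_timeout_us <= 0 || slot->inactive_since == 0)
        return false;

    return now - slot->inactive_since >= slot->inactive_timeout_us;
}

/*
 * Sketch of the proposed fix: when slot sync shuts down at promotion,
 * stamp every synced slot with the current time so the timeout is
 * measured from promotion rather than from the last sync attempt.
 */
static void
refresh_synced_slots_at_sync_shutdown(SketchSlot *slots, size_t nslots,
                                      TimestampTz now)
{
    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].synced)
            slots[i].inactive_since = now;
    }
}
```

With a stale inactive_since, the first check after promotion (in_recovery now false) would report the synced slot as expired; after the refresh, the timeout starts counting from the moment of promotion, much as it would after a restart.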
Thanks for the patch. I have tested this patch on its own, and it does
what it says. One additional thing I noticed is that it now sets
inactive_since for temp slots as well, but that looks fine to me.
I could not test the 'invalidation on promotion' bug with this change,
as that requires rebasing the rest of the patches.
A few trivial things:
1)
Commit msg:
ensures the value is set to current timestamp during the
shutdown to help correctly interpret the time if the standby gets
promoted without a restart.
shutdown --> shutdown of the slot sync worker (as it is not clear whether
this means instance shutdown or something else)
2)
'The time since the slot has became inactive'.
has became --> has become,
or just 'became'.
Please fix this in all files; there are multiple occurrences.
thanks
Shveta