Re: Improve error handling for invalid slots and ensure a same 'inactive_since' time for inactive slots

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: Improve error handling for invalid slots and ensure a same 'inactive_since' time for inactive slots
Date: 2025-01-29 08:56:24
Message-ID: CAA4eK1LvBbe_iaFQMSkG8Q0OmyELu0fKZ8_aUgeb6mXVFhgkOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 29, 2025 at 7:50 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Some review comments for patch v1-0001.
>
> ======
> src/backend/replication/logical/slotsync.c
>
> ReplSlotSyncWorkerMain:
>
> 1.
> + /* Use same inactive_since time for all slots */
> + now = GetCurrentTimestamp();
> +
> LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
>
> for (int i = 0; i < max_replication_slots; i++)
> @@ -1537,10 +1540,6 @@ update_synced_slots_inactive_since(void)
> /* The slot must not be acquired by any process */
> Assert(s->active_pid == 0);
>
> - /* Use the same inactive_since time for all the slots. */
> - if (now == 0)
> - now = GetCurrentTimestamp();
> -
>
> AFAICT, this code was *already* ensuring to use the same
> 'inactive_since' even before your patch. The only difference is now
> you are getting the timestamp value up-front instead of a deferred
> assignment.
>

I find the code without a patch better as it may sometimes skip to
call GetCurrentTimestamp().

> So why did you change this (and the code of RestoreSlotFromDisk) to do
> the up-front assignment? Instead, you could have chosen to just leave
> this code as-is, and then modify the RestoreSlotFromDisk code to match
> it.
>
> FWIW, I do prefer what you have done here because it is simpler, but I
> just wondered about the choice because I think some people worry about
> GetCurrentTimestamp overheads and try to avoid calling that wherever
> possible.
>
> ======
> src/backend/replication/slot.c
>
> 2. What about other loops?
>
> AFAICT there are still some other loops where the inactive_since
> timestamps might differ.
>
> e.g. How about this logic in slot.c:
>
> InvalidateObsoleteReplicationSlots:
>
> LOOP:
> for (int i = 0; i < max_replication_slots; i++)
> {
> ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
>
> calls InvalidatePossiblyObsoleteSlot(...)
> which calls ReplicationSlotRelease(...)
> which assigns now = GetCurrentTimestamp();
> then slot->inactive_since = now;
> }
>
> ~
>
> So, should you also assign a 'now' value outside this loop and pass
> that timestamp down the calls so they eventually all get assigned the
> same? I don't know, but I guess at least that would require much fewer
> unnecessary calls to GetCurrentTimestamp so that may be a good thing.
>

I don't see this as an optimization worth the effort of changing the
code. This gets called infrequently enough to matter. The same is true
for the code in RestoreSlotFromDisk().

So, overall, I think we should just reject the 0001 patch and focus on 0002.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2025-01-29 09:20:01 Re: CREATE PUBLICATION and foreign partitions
Previous Message Amit Kapila 2025-01-29 08:37:07 Re: Introduce XID age and inactive timeout based replication slot invalidation