| From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> | 
|---|---|
| To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> | 
| Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> | 
| Subject: | Re: Introduce XID age and inactive timeout based replication slot invalidation | 
| Date: | 2024-03-26 17:52:12 | 
| Message-ID: | ZgMLTBD3xJIP9W93@ip-10-97-1-34.eu-west-3.compute.internal | 
| Lists: | pgsql-hackers | 
Hi,
On Tue, Mar 26, 2024 at 09:59:23PM +0530, Bharath Rupireddy wrote:
> On Tue, Mar 26, 2024 at 4:35 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > If we just sync inactive_since value for synced slots while in
> > recovery from the primary, so be it. Why do we need to update it to
> > the current time when the slot is being created? We don't expose slot
> > creation time, no? Aren't we fine if we just sync the value from
> > primary and document that fact? After the promotion, we can reset it
> > to the current time so that it gets its own time.
> 
> I'm attaching v24 patches. It implements the above idea proposed
> upthread for synced slots. I've now separated
> s/last_inactive_time/inactive_since and synced slots behaviour. Please
> have a look.
Thanks!
==== v24-0001
It's now pure mechanical changes and it looks good to me.
==== v24-0002
1 ===
    This commit does two things:
    1) Updates inactive_since for sync slots with the value
    received from the primary's slot.
Tested it and it does that.
2 ===
    2) Ensures the value is set to current timestamp during the
    shutdown of slot sync machinery to help correctly interpret the
    time if the standby gets promoted without a restart.
Tested it and it does that.
3 ===
+/*
+ * Reset the synced slots info such as inactive_since after shutting
+ * down the slot sync machinery.
+ */
+static void
+update_synced_slots_inactive_time(void)
The comment says "reset", but that matches neither the function name nor what
the function does.
4 ===
+                       /*
+                        * We get the current time beforehand and only once to avoid
+                        * system calls overhead while holding the lock.
+                        */
+                       if (now == 0)
+                               now = GetCurrentTimestamp();
Also +1 for having GetCurrentTimestamp() called only once within the loop.
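To illustrate the pattern being +1'd here, a minimal standalone sketch follows. It is not the patch's code: DemoSlot, demo_get_current_timestamp() and demo_update_synced_slots() are hypothetical stand-ins for the real slot array and GetCurrentTimestamp(), and the per-slot spinlock is only indicated by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TimestampTz; 0 means "not fetched yet". */
typedef int64_t DemoTimestamp;

/* Hypothetical, heavily reduced model of a replication slot. */
typedef struct DemoSlot
{
	bool		in_use;
	bool		synced;
	DemoTimestamp inactive_since;
} DemoSlot;

/* Counts how often the (simulated) clock is read. */
static int	demo_clock_calls = 0;

/* Hypothetical stand-in for GetCurrentTimestamp(). */
static DemoTimestamp
demo_get_current_timestamp(void)
{
	demo_clock_calls++;
	return 1000 + demo_clock_calls;
}

/*
 * Stamp every in-use synced slot with the same timestamp. The clock is
 * read lazily, at most once, and before the section where the real code
 * would hold the slot's spinlock. Returns the number of slots updated.
 */
static int
demo_update_synced_slots(DemoSlot *slots, size_t nslots)
{
	DemoTimestamp now = 0;
	int			updated = 0;

	assert(slots != NULL || nslots == 0);

	for (size_t i = 0; i < nslots; i++)
	{
		DemoSlot   *s = &slots[i];

		if (!s->in_use || !s->synced)
			continue;

		/* Fetch the time only for the first qualifying slot. */
		if (now == 0)
			now = demo_get_current_timestamp();

		/* The real code would acquire the slot's spinlock here. */
		s->inactive_since = now;
		/* ... and release it here. */
		updated++;
	}

	return updated;
}
```

If no slot qualifies, the clock is never read at all, which is the benefit of the lazy `now == 0` check over an unconditional call before the loop.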
5 ===
-               if (!(RecoveryInProgress() && slot->data.synced))
+               if (!(InRecovery && slot->data.synced))
                        slot->inactive_since = GetCurrentTimestamp();
                else
                        slot->inactive_since = 0;
Not related to this change, but rather to the way RestoreSlotFromDisk() behaves here:
For a sync slot on standby, inactive_since will be set to zero and then later
synchronized with the value coming from the primary. I think it's fine for it
to be zero during this window of time.
Now, if the standby is down and one sets sync_replication_slots to off,
then inactive_since will be set to zero on the standby at startup and not
synchronized (unless one triggers a manual sync). I also think that's fine, but
it might be worth documenting this behavior (that after a standby startup
inactive_since is zero until the next sync...).
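The two cases above can be modelled with a small standalone sketch. This is only an illustration of the behavior as described, not PostgreSQL code: SketchSlot, sketch_restore_slot() and sketch_sync_cycle() are hypothetical names, and the sync_enabled flag stands in for the sync_replication_slots setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced model of a slot; 0 means "no inactive_since yet". */
typedef struct SketchSlot
{
	bool		synced;
	int64_t		inactive_since;
} SketchSlot;

/*
 * Models the startup condition from the hunk above: a synced slot restored
 * while in recovery starts at zero, any other slot gets the current time.
 */
static void
sketch_restore_slot(SketchSlot *slot, bool in_recovery, int64_t now)
{
	assert(slot != NULL);

	if (!(in_recovery && slot->synced))
		slot->inactive_since = now;
	else
		slot->inactive_since = 0;
}

/*
 * Models one sync cycle: the value from the primary is copied only when
 * syncing is enabled (assumption: sync_enabled mirrors
 * sync_replication_slots, or a manually triggered sync).
 */
static void
sketch_sync_cycle(SketchSlot *slot, bool sync_enabled,
				  int64_t primary_inactive_since)
{
	assert(slot != NULL);

	if (sync_enabled && slot->synced)
		slot->inactive_since = primary_inactive_since;
}
```

With syncing disabled, a restored synced slot stays at zero for as long as no sync runs, which is the window that might be worth documenting.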
6 ===
+ print "HI $slot_name $name $inactive_since $slot_creation_time\n";
Leftover debugging output?
7 ===
+# Capture and validate inactive_since of a given slot.
+sub capture_and_validate_slot_inactive_since
+{
+       my ($node, $slot_name, $slot_creation_time) = @_;
+       my $name = $node->name;
We now have capture_and_validate_slot_inactive_since in 2 places:
040_standby_failover_slots_sync.pl and 019_replslot_limit.pl.
Would it be worth creating a sub in Cluster.pm?
Regards,
-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com