From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date: | 2024-04-03 16:27:45 |
Message-ID: | Zg2DgfXorAXrC8lw@ip-10-97-1-34.eu-west-3.compute.internal |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Wed, Apr 03, 2024 at 08:28:04PM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 6:46 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > Just one comment on v32-0001:
> >
> > +# Synced slot on the standby must get its own inactive_since.
> > +is( $standby1->safe_psql(
> > + 'postgres',
> > + "SELECT '$inactive_since_on_primary'::timestamptz <= '$inactive_since_on_standby'::timestamptz AND
> > + '$inactive_since_on_standby'::timestamptz <= '$slot_sync_time'::timestamptz;"
> > + ),
> > + "t",
> > + 'synchronized slot has got its own inactive_since');
> > +
> >
> > By using <= we are not testing that it must get its own inactive_since (as we
> > allow them to be equal in the test). I think we should just add some usleep()
> > where appropriate and deny equality during the tests on inactive_since.
>
> > Except for the above, v32-0001 LGTM.
>
> Thanks. Please see the attached v33-0001 patch after removing equality
> on inactive_since TAP tests.
Thanks! v33-0001 LGTM.
> On Wed, Apr 3, 2024 at 1:47 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > Some comments regarding v31-0002:
> >
> > T2 ===
> >
> > In case the slot is invalidated on the primary,
> >
> > primary:
> >
> > postgres=# select slot_name, inactive_since, invalidation_reason from pg_replication_slots where slot_name = 's1';
> > slot_name | inactive_since | invalidation_reason
> > -----------+-------------------------------+---------------------
> > s1 | 2024-04-03 06:56:28.075637+00 | inactive_timeout
> >
> > then on the standby we get:
> >
> > standby:
> >
> > postgres=# select slot_name, inactive_since, invalidation_reason from pg_replication_slots where slot_name = 's1';
> > slot_name | inactive_since | invalidation_reason
> > -----------+------------------------------+---------------------
> > s1 | 2024-04-03 07:06:43.37486+00 | inactive_timeout
> >
> > shouldn't the slot be dropped/recreated instead of updating inactive_since?
>
> The sync slots that are invalidated on the primary aren't dropped and
> recreated on the standby.
Yeah, right (I was confused with synced slots that are invalidated locally).
> However, I
> found that the synced slot is acquired and released unnecessarily
> after the invalidation_reason is synced from the primary. I added a
> skip check in synchronize_one_slot to skip acquiring and releasing the
> slot if it's locally found inactive. With this, inactive_since won't
> get updated for invalidated sync slots on the standby as we don't
> acquire and release the slot.
CR1 ===
Yeah, I can see:
@@ -575,6 +575,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
" name slot \"%s\" already exists on the standby",
remote_slot->name));
+ /*
+ * Skip the sync if the local slot is already invalidated. We do this
+ * beforehand to save on slot acquire and release.
+ */
+ if (slot->data.invalidated != RS_INVAL_NONE)
+ return false;
Thanks to the drop_local_obsolete_slots() call I think we are not missing the case
where the slot has been invalidated on the primary, invalidation reason has been
synced on the standby and later the slot is dropped/ recreated manually on the
primary (then it should be dropped/recreated on the standby too).
Also it seems we are not missing the case where a sync slot is invalidated
locally due to wal removal (it should be dropped/recreated).
>
> > CR5 ===
> >
> > + /*
> > + * This function isn't expected to be called for inactive timeout based
> > + * invalidation. A separate function InvalidateInactiveReplicationSlot is
> > + * to be used for that.
> >
> > Do you think it's worth to explain why?
>
> Hm, I just wanted to point out the actual function here. I modified it
> to something like the following, if others feel we don't need that, I
> can remove it.
Sorry If I was not clear but I meant to say "Do you think it's worth to explain
why we decided to create a dedicated function"? (currently we "just" explain that
we created one).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Andrey M. Borodin | 2024-04-03 16:31:29 | Re: [PATCH] Modify pg_ctl to detect presence of geek user |
Previous Message | Alvaro Herrera | 2024-04-03 16:01:28 | Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock |