From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date: | 2024-09-17 01:27:24 |
Message-ID: | CAHut+Ps=x+2Hq5ue0YppOeDZqgHTnyw=u+vs-qy0JRjKaeJtew@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Here are a few comments for the patch v46-0001.
======
src/backend/replication/slot.c
1. ReportSlotInvalidation
On Mon, Sep 16, 2024 at 8:01 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Mon, Sep 9, 2024 at 1:11 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > 3. ReportSlotInvalidation
> >
> > I didn't understand why there was a hint for:
> > "You might need to increase \"%s\".", "max_slot_wal_keep_size"
> >
> > Why aren't these similar cases consistent?
>
> It looks misleading and not very useful. What happens if the removed
> WAL (that's needed for the slot) is put back into pg_wal somehow (by
> manually copying from archive or by some tool/script)? Can the slot
> invalidated due to wal_removed start sending WAL to its clients?
>
> > But you don't have an equivalent hint for timeout invalidation:
> > "You might need to increase \"%s\".", "replication_slot_inactive_timeout"
>
> I removed this per review comments upthread.
IIUC the errors are quite similar, so my previous review comment was
mostly about the unexpected inconsistency of why one of them has a
hint and the other one does not. I don't have a strong opinion about
whether they should both *have* or *not have* hints, so long as they
are treated the same.
If you think the current code hint is not useful then maybe we need a
new thread to address that existing issue. For example, maybe it
should be removed or reworded.
~~~
2. InvalidatePossiblyObsoleteSlot:
+ case RS_INVAL_INACTIVE_TIMEOUT:
+
+ if (!SlotInactiveTimeoutCheckAllowed(s))
+ break;
+
+ /*
+ * Check if the slot needs to be invalidated due to
+ * replication_slot_inactive_timeout GUC.
+ */
+ if (TimestampDifferenceExceeds(s->inactive_since, now,
+ replication_slot_inactive_timeout * 1000))
nit - it might be tidier to avoid multiple breaks by just combining
these conditions. See the nitpick attachment.
~~~
3.
* - RS_INVAL_WAL_LEVEL: is logical
+ * - RS_INVAL_INACTIVE_TIMEOUT: inactive timeout occurs
nit - use comment wording "inactive slot timeout has occurred", to
make it identical to the comment in slot.h
======
src/test/recovery/t/050_invalidate_slots.pl
4.
+# Despite inactive timeout being set, the synced slot won't get invalidated on
+# its own on the standby. So, we must not see invalidation message in server
+# log.
+$standby1->safe_psql('postgres', "CHECKPOINT");
+ok( !$standby1->log_contains(
+ "invalidating obsolete replication slot \"sync_slot1\"",
+ $logstart),
+ 'check that synced slot sync_slot1 has not been invalidated on standby'
+);
+
It seems kind of brittle to check the logs for something that is NOT
there because any change to the message will make this accidentally
pass. Apart from that, it might anyway be more efficient just to check
the pg_replication_slots again to make sure the 'invalidation_reason
remains' still NULL.
======
Please see the attachment which implements some of the nit changes
mentioned above.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment | Content-Type | Size |
---|---|---|
PS_NITPICKS_20240917_SLOT_TIMEOUT_v46.txt | text/plain | 1.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tatsuo Ishii | 2024-09-17 02:40:04 | Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN |
Previous Message | Thomas Munro | 2024-09-17 01:01:17 | Re: Robocopy might be not robust enough for never-ending testing on Windows |