Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-09-16 11:24:40
Message-ID: CAA4eK1JyGE=zi58zpt_xq_vkobpu8n1TOVKVJ6XJ+0+gisvU9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 16, 2024 at 3:31 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> Please find the attached v46 patch having changes for the above review
> comments and your test review comments and Shveta's review comments.
>

-ReplicationSlotAcquire(const char *name, bool nowait)
+ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid)
{
ReplicationSlot *s;
int active_pid;
@@ -615,6 +620,22 @@ retry:
/* We made this slot active, so it's ours now. */
MyReplicationSlot = s;

+ /*
+ * An error is raised if error_if_invalid is true and the slot has been
+ * previously invalidated due to inactive timeout.
+ */
+ if (error_if_invalid &&
+ s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT)
+ {
+ Assert(s->inactive_since > 0);
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("can no longer get changes from replication slot \"%s\"",
+ NameStr(s->data.name)),
+ errdetail("This slot has been invalidated because it was inactive
for longer than the amount of time specified by \"%s\".",
+ "replication_slot_inactive_timeout")));
+ }

Why raise the ERROR just for timeout invalidation here and why not if
the slot is invalidated for other reasons? This raises the question of
what happens before this patch if the invalid slot is used from places
where we call ReplicationSlotAcquire(). I did a brief code analysis
and found that for StartLogicalReplication(), even if the error won't
occur in ReplicationSlotAcquire(), it would have been caught in
CreateDecodingContext(). I think that is where we should also add this
new error. Similarly, pg_logical_slot_get_changes_guts() and other
logical replication functions should be calling
CreateDecodingContext() which can raise the new ERROR. I am not sure
about how the invalid slots are handled during physical replication,
please check the behavior of that before this patch.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-09-16 11:34:04 Re: how to speed up 002_pg_upgrade.pl and 025_stream_regress.pl under valgrind
Previous Message Ashutosh Bapat 2024-09-16 11:11:17 Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN