Re: pgsql: Track last_inactive_time in pg_replication_slots.

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <akapila(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql: Track last_inactive_time in pg_replication_slots.
Date: 2024-03-27 06:58:06
Message-ID: CALj2ACVFV=yUa3DXXfJLOtJxUM8qzC_mEECMJ2iekDGPeQLkTw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Wed, Mar 27, 2024 at 10:10 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Mar 26, 2024 at 9:10 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >
> > On 2024-Mar-26, Nathan Bossart wrote:
> >
> > > FWIW I'd really prefer to have something like max_slot_xid_age for this. A
> > > time-based parameter would likely help with most cases, but transaction ID
> > > usage will vary widely from server to server, so it'd be nice to have
> > > something to protect against wraparound more directly.
> >
> > Yeah, I tend to agree that an XID-based limit makes more sense than a
> > time-based one.
> >
> So, do we want the time-based parameter or just max_slot_xid_age
> considering both will be GUC's? Yes, the xid_age based parameter
> sounds to be directly helpful for transaction ID wraparound or dead
> row cleanup, OTOH having a lot of inactive slots (which is possible in
> use cases where a tool creates a lot of slots and forgets to remove
> them, or the tool exits without cleaning up slots (say due to server
> shutdown)) also prohibit removing dead space which is not nice either?

I've personally seen the leftover slots problem on production systems
where a timeout based invalidation mechanism could have been of more
help to save time and reduce manual intervention. Usually, most if not
all, migration/upgrade/other tools create slots, and if at all any
errors occur or the operation gets cancelled midway, there's a chance
that the slots can be leftover if such tools forgets to clean them up
either because there was a bug or for whatever reasons. These are
unintended/ghost slots for the database user unnecessarily holding up
resources such as XIDs, dead rows and WAL; which might lead to XID
wraparound or server crash if unnoticed. Although XID based GUC helps
a lot, why do these unintended and unnoticed slots need to hold up the
resources even before the XID age of say 1.5 or 2 billion is reached.

With both GUCs max_slot_xid_age and replication_slot_inactive_timeout
in place, I can set max_slot_xid_age = 1.5 billion or so and also set
replication_slot_inactive_timeout = 2 days or so to make the database
foolproof.

> The one example that comes to mind is the pg_createsubscriber
> (committed for PG17) which creates one slot per database to convert
> standby to subscriber, now say it exits due to power shutdown then
> there could be a lot of dangling slots on the primary server. Also,
> say there is some bug in such a tool that doesn't allow proper cleanup
> of slots, the same problem can happen; yeah, this would be a problem
> of the tool but I think there is no harm in giving a way to avoid
> problems at the server end due to such slots.

Right. I can personally connect to this problem of leftover slots
where manual intervention was needed to drop all such slots which is
time-consuming and painful sometimes.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-03-27 07:13:16 Re: pgsql: Track last_inactive_time in pg_replication_slots.
Previous Message Amit Kapila 2024-03-27 04:40:05 Re: pgsql: Track last_inactive_time in pg_replication_slots.

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2024-03-27 06:59:58 RE: pg_upgrade and logical replication
Previous Message Richard Guo 2024-03-27 06:52:58 Re: Propagate pathkeys from CTEs up to the outer query