From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <akapila(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pgsql: Track last_inactive_time in pg_replication_slots. |
Date: | 2024-03-27 15:05:57 |
Message-ID: | 20240327150557.GA3994937@nathanxps13 |
Lists: | pgsql-committers pgsql-hackers |

On Wed, Mar 27, 2024 at 10:33:28AM -0400, Robert Haas wrote:
> FWIW, I thought the time-based one sounded more useful. I think it
> would be poor planning to say "well, if the slot reaches an XID age of
> a billion, kill it so we don't wrap around," because while that likely
> will prevent me from getting into wraparound trouble, my database is
> likely to become horribly bloated long before the cutoff is reached. I
> thought it would be easier to reason in terms of time: I don't expect
> a slave to ever be down for more than X period of time, say an hour or
> whatever, so if it is, forget about it. Or alternatively, I know that
> if a slave does go down for more than X period of time, I start to get
> bloat, so cut it off at that point and I'll rebuild it later. I feel
> like these are things where people's intuition is going to be much
> stronger when reckoning in units of wall-clock time, which everyone
> deals with every day in one way or another, rather than in XID-based
> units that are, at least in my view, just a lot less intuitive.

I don't disagree with this point in the context of a user who is managing a
single server or just a handful of servers. They are going to understand
their workload best and can reason about the right value for the timeout.
I think they'd still benefit from having an XID-based setting as a backstop
in case the timeout is still not sufficient to prevent wraparound, but it's
not nearly as important in that case.

IMHO the use-case where this doesn't work so well is when you have many,
many servers to administer (e.g., a cloud provider). In those cases,
picking a default timeout to try to prevent wraparound is going to be much
less accurate, as any reasonable value you pick is still going to be
insufficient in some cases. I think the XID-based parameter would be
better here; if the server is at imminent risk of an outage due to
wraparound, invalidating the slots is probably a reasonable course of
action.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com