From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date: | 2024-02-05 09:45:50 |
Message-ID: | ZcCuTqik6hT+HQKS@ip-10-97-1-34.eu-west-3.compute.internal |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Thu, Jan 11, 2024 at 10:48:13AM +0530, Bharath Rupireddy wrote:
> Hi,
>
> Therefore, it is often easy for developers to do the following:
> a) set an XID age (age of slot's xmin or catalog_xmin) of say 1 or 1.5
> billion, after which the slots get invalidated.
> b) set a timeout of say 1 or 2 or 3 days, after which the inactive
> slots get invalidated.
>
> To implement (a), postgres needs a new GUC called max_slot_xid_age.
> The checkpointer then invalidates all the slots whose xmin (the oldest
> transaction that this slot needs the database to retain) or
> catalog_xmin (the oldest transaction affecting the system catalogs
> that this slot needs the database to retain) has reached the age
> specified by this setting.
>
> To implement (b), first postgres needs to track the replication slot
> metrics like the time at which the slot became inactive (inactive_at
> timestamptz) and the total number of times the slot became inactive in
> its lifetime (inactive_count numeric) in ReplicationSlotPersistentData
> structure. And, then it needs a new timeout GUC called
> inactive_replication_slot_timeout. Whenever a slot becomes inactive,
> the current timestamp and inactive count are stored in
> ReplicationSlotPersistentData structure and persisted to disk. The
> checkpointer then invalidates all the slots that are lying inactive
> for about inactive_replication_slot_timeout duration starting from
> inactive_at.
>
> In addition to implementing (b), these two new metrics enable
> developers to improve their monitoring tools as the metrics are
> exposed via pg_replication_slots system view. For instance, one can
> build a monitoring tool that signals when replication slots are lying
> inactive for a day or so using inactive_at metric, and/or when a
> replication slot is becoming inactive too frequently using inactive_at
> metric.
Thanks for the patch and +1 for the idea, I think adding those new
"invalidation reasons" make sense.
>
> I’m attaching the v1 patch set as described below:
> 0001 - Tracks invalidation_reason in pg_replication_slots. This is
> needed because slots now have multiple reasons for slot invalidation.
> 0002 - Tracks inactive replication slot information inactive_at and
> inactive_timeout.
> 0003 - Adds inactive_timeout based replication slot invalidation.
> 0004 - Adds XID based replication slot invalidation.
>
I think it's better to have the XID one being discussed/implemented before the
inactive_timeout one: what about changing the 0002, 0003 and 0004 ordering?
0004 -> 0002
0002 -> 0003
0003 -> 0004
As far 0001:
"
This commit renames conflict_reason to
invalidation_reason, and adds the support to show invalidation
reasons for both physical and logical slots.
"
I'm not sure I like the fact that "invalidations" and "conflicts" are merged
into a single field. I'd vote to keep conflict_reason as it is and add a new
invalidation_reason (and put "conflict" as value when it is the case). The reason
is that I think they are 2 different concepts (could be linked though) and that
it would be easier to check for conflicts (means conflict_reason is not NULL).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2024-02-05 09:50:56 | Re: cataloguing NOT NULL constraints |
Previous Message | Sutou Kouhei | 2024-02-05 09:05:15 | Re: Make COPY format extendable: Extract COPY TO format implementations |