Improve error handling for invalid slots and ensure a same 'inactive_since' time for inactive slots

From: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Improve error handling for invalid slots and ensure a same 'inactive_since' time for inactive slots
Date: 2025-01-28 11:35:05
Message-ID: CABdArM6pBL5hPnSQ+5nEVMANcF4FCH7LQmgskXyiLY75TMnKpw@mail.gmail.com
Lists: pgsql-hackers

Hello Hackers,
(CC people involved in the earlier discussion)

While implementing slot invalidation based on an inactive (idle) timeout
(see [1]), several general optimizations and improvements were
identified.

This thread is a spin-off from [1], intended to address these
optimizations separately from the main feature. As suggested in [2],
the improvements are divided into two parts:

Patch-001: Update the logic to ensure all inactive slots have the same
'inactive_since' time when restoring the slots from disk in
RestoreSlotFromDisk() and when updating the synced slots on standby in
update_synced_slots_inactive_since().
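
To illustrate the idea behind Patch-001, here is a minimal standalone sketch. It is not the code from the patch: DemoSlot and demo_set_inactive_since() are hypothetical stand-ins, and time_t replaces PostgreSQL's timestamp type. The assumption is simply that the time is read once by the caller and then applied to every inactive slot, instead of being read again for each slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a replication slot; not PostgreSQL's struct. */
typedef struct DemoSlot
{
    bool   in_use;          /* slot entry is allocated */
    bool   active;          /* slot is currently held by a process */
    time_t inactive_since;  /* 0 means "not set" in this sketch */
} DemoSlot;

/*
 * Stamp every in-use, inactive slot with the single timestamp "now".
 * The caller reads the clock once, before the loop, so all slots
 * receive an identical value. Returns the number of slots updated.
 */
static int
demo_set_inactive_since(DemoSlot *slots, size_t nslots, time_t now)
{
    int updated = 0;

    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++)
    {
        /* Skip unused entries and slots that are currently active. */
        if (!slots[i].in_use || slots[i].active)
            continue;

        slots[i].inactive_since = now;
        updated++;
    }

    return updated;
}
```

A caller would do something like "time_t now = time(NULL);" once and pass that value in. Reading the clock inside the loop instead could give each slot a slightly different time.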

Patch-002: Raise an error for invalid slots when acquiring them in
ReplicationSlotAcquire().
Currently, a process can acquire an invalid slot but may eventually
error out at later stages. For example, if a process acquires a slot
invalidated due to wal_removed, it will later fail in
CreateDecodingContext() when trying to access the removed WAL. The
idea here is to improve error handling by detecting invalid slots
earlier.
A new parameter, "error_if_invalid", is introduced in
ReplicationSlotAcquire(). If the caller specifies
error_if_invalid=true, an error is raised immediately instead of
letting the process acquire the invalid slot first and then fail later
due to the invalidated slot.
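
The early-check behaviour can be sketched roughly as below. This is a simplified standalone model, not the patch itself: DemoAcqSlot, DemoInvalCause, DemoAcquireResult and demo_slot_acquire() are hypothetical names, and the sketch returns a result code where the real function would raise an error. Only the "error_if_invalid" parameter name comes from the proposal above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical invalidation causes; only "wal_removed" is taken from the text. */
typedef enum DemoInvalCause
{
    DEMO_INVAL_NONE = 0,
    DEMO_INVAL_WAL_REMOVED
} DemoInvalCause;

/* Hypothetical stand-in for a replication slot. */
typedef struct DemoAcqSlot
{
    const char     *name;
    DemoInvalCause  invalidated;
    bool            acquired;
} DemoAcqSlot;

typedef enum DemoAcquireResult
{
    DEMO_ACQUIRE_OK,
    DEMO_ACQUIRE_NOT_FOUND,
    DEMO_ACQUIRE_INVALID    /* stands in for raising an error */
} DemoAcquireResult;

/*
 * Look up a slot by name and acquire it. With error_if_invalid set,
 * an invalidated slot is rejected before it is acquired, so the caller
 * fails right away rather than at some later stage.
 */
static DemoAcquireResult
demo_slot_acquire(DemoAcqSlot *slots, size_t nslots,
                  const char *name, bool error_if_invalid)
{
    assert(name != NULL);

    for (size_t i = 0; i < nslots; i++)
    {
        if (strcmp(slots[i].name, name) != 0)
            continue;

        /* Detect the invalid slot early, before taking ownership. */
        if (error_if_invalid && slots[i].invalidated != DEMO_INVAL_NONE)
            return DEMO_ACQUIRE_INVALID;

        slots[i].acquired = true;
        return DEMO_ACQUIRE_OK;
    }

    return DEMO_ACQUIRE_NOT_FOUND;
}
```

With error_if_invalid=false the sketch keeps the old behaviour: the invalid slot is acquired and any failure is left to later code.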

The v1 patches are attached; any feedback would be appreciated!

[1] https://www.postgresql.org/message-id/flat/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw(at)mail(dot)gmail(dot)com
[2] https://www.postgresql.org/message-id/CAA4eK1LiDjz%2BF8hEYG0_ux%3DrqwhxnTuWpT-RKNDWaac3w3bWNw%40mail.gmail.com

--
Thanks,
Nisha

Attachment                                                        Content-Type              Size
v1-0001-Ensure-same-inactive_since-time-for-all-inactive-.patch   application/octet-stream  2.9 KB
v1-0002-Raise-Error-for-Invalid-Slots-in-ReplicationSlotA.patch   application/octet-stream  9.1 KB
