From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date: | 2024-09-03 09:31:06 |
Message-ID: | CAJpy0uC8Dg-0JS3NRUwVUemgz5Ar2v3_EQQFXyAigWSEQ8U47Q@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Aug 31, 2024 at 1:45 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> Hi,
>
>
> Please find the attached v44 patch with the above changes. I will
> include the 0002 xid_age based invalidation patch later.
>
Thanks for the patch Bharath. My review and testing is WIP, but please
find few comments and queries:
1)
I see that ReplicationSlotAlter() will error out if the slot is
invalidated due to timeout. I have not tested it myself, but do you
know if slot-alter errors out for other invalidation causes as well?
Just wanted to confirm that the behaviour is consistent for all
invalidation causes.
2)
When a slot is invalidated, and we try to use that slot, it gives this msg:
ERROR: can no longer get changes from replication slot "mysubnew1_2"
DETAIL: The slot became invalid because it was inactive since
2024-09-03 14:23:34.094067+05:30, which is more than 600 seconds ago.
HINT: You might need to increase "replication_slot_inactive_timeout.".
Isn't HINT misleading? Even if we increase it now, the slot can not be
reused again.
3)
When the slot is invalidated, the' inactive_since' still keeps on
changing when there is a subscriber trying to start replication
continuously. I think ReplicationSlotAcquire() keeps on failing and
thus Release keeps on setting it again and again. Shouldn't we stop
setting/chnaging 'inactive_since' once the slot is invalidated
already, otherwise it will be misleading.
postgres=# select failover,synced,inactive_since,invalidation_reason
from pg_replication_slots;
failover | synced | inactive_since | invalidation_reason
----------+--------+----------------------------------+---------------------
t | f | 2024-09-03 14:23:.. | inactive_timeout
after sometime:
failover | synced | inactive_since | invalidation_reason
----------+--------+----------------------------------+---------------------
t | f | 2024-09-03 14:26:..| inactive_timeout
4)
src/sgml/config.sgml:
4a)
+ A value of zero (which is default) disables the timeout mechanism.
Better will be:
A value of zero (which is default) disables the inactive timeout
invalidation mechanism .
or
A value of zero (which is default) disables the slot invalidation due
to the inactive timeout mechanism.
i.e. rephrase to indicate that invalidation is disabled.
4b)
'synced' and inactive_since should point to pg_replication_slots:
example:
<link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>synced</structfield>
5)
src/sgml/system-views.sgml:
+ ..the slot has been inactive for longer than the duration specified
by replication_slot_inactive_timeout parameter.
Better to have:
..the slot has been inactive for a time longer than the duration
specified by the replication_slot_inactive_timeout parameter.
thanks
Shveta
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2024-09-03 09:41:20 | Re: altering a column's collation leaves an invalid foreign key |
Previous Message | Jehan-Guillaume de Rorthais | 2024-09-03 09:26:37 | Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails |