Re: Introduce XID age and inactive timeout based replication slot invalidation

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-09-04 09:18:51
Message-ID: CAJpy0uD21cPp9FkSCJi+NO47Y-6wmE+UW6i_K0HJCn=2UdMDmw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 4, 2024 at 9:17 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Tue, Sep 3, 2024 at 3:01 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> >

1)
It is related to one of my previous comments (pt 3 in [1]) where I
stated that inactive_since should not keep on changing once a slot is
invalidated.
Below is one side effect if inactive_since keeps on changing:

postgres=# SELECT * FROM pg_replication_slot_advance('mysubnew1_1',
pg_current_wal_lsn());
ERROR: can no longer get changes from replication slot "mysubnew1_1"
DETAIL: The slot became invalid because it was inactive since
2024-09-04 10:03:56.68053+05:30, which is more than 10 seconds ago.
HINT: You might need to increase "replication_slot_inactive_timeout.".

postgres=# select now();
now
---------------------------------
2024-09-04 10:04:00.26564+05:30

'DETAIL' gives wrong information, we are not past 10-seconds. This is
because inactive_since got updated even in ERROR scenario.

2)
One more issue in this message is, once I set
replication_slot_inactive_timeout to a bigger value, it becomes more
misleading. This is because invalidation was done in the past using
previous value while message starts showing new value:

ALTER SYSTEM SET replication_slot_inactive_timeout TO '36h';

--see 129600 secs in DETAIL and the current time.
postgres=# SELECT * FROM pg_replication_slot_advance('mysubnew1_1',
pg_current_wal_lsn());
ERROR: can no longer get changes from replication slot "mysubnew1_1"
DETAIL: The slot became invalid because it was inactive since
2024-09-04 10:06:38.980939+05:30, which is more than 129600 seconds
ago.
postgres=# select now();
now
----------------------------------
2024-09-04 10:07:35.201894+05:30

I feel we should change this message itself.

~~~~~

When invalidation is due to wal_removed, we get a way simpler message:

newdb1=# SELECT * FROM pg_replication_slot_advance('mysubnew1_2',
pg_current_wal_lsn());
ERROR: replication slot "mysubnew1_2" cannot be advanced
DETAIL: This slot has never previously reserved WAL, or it has been
invalidated.

This message does not mention 'max_slot_wal_keep_size'. We should have
a similar message for our case. Thoughts?

[1]: https://www.postgresql.org/message-id/CAJpy0uC8Dg-0JS3NRUwVUemgz5Ar2v3_EQQFXyAigWSEQ8U47Q%40mail.gmail.com

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-09-04 09:28:24 Re: [PoC] Federated Authn/z with OAUTHBEARER
Previous Message jian he 2024-09-04 08:57:00 Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN