From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date: | 2024-09-16 17:10:52 |
Message-ID: | CALj2ACXYtySVT0iMkpLqLj8NX=PjzTqHqMxDyM9Qoat+mwzw4Q@mail.gmail.com |
Lists: | pgsql-hackers |
Hi,
Thanks for looking into this.
On Mon, Sep 16, 2024 at 4:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> Why raise the ERROR just for timeout invalidation here and why not if
> the slot is invalidated for other reasons? This raises the question of
> what happens before this patch if the invalid slot is used from places
> where we call ReplicationSlotAcquire(). I did a brief code analysis
> and found that for StartLogicalReplication(), even if the error won't
> occur in ReplicationSlotAcquire(), it would have been caught in
> CreateDecodingContext(). I think that is where we should also add this
> new error. Similarly, pg_logical_slot_get_changes_guts() and other
> logical replication functions should be calling
> CreateDecodingContext() which can raise the new ERROR. I am not sure
> about how the invalid slots are handled during physical replication,
> please check the behavior of that before this patch.
When physical slots are invalidated for the wal_removed reason, the failure
happens at a much later point for streaming standbys, while they read the
requested WAL files, as in the following:
2024-09-16 16:29:52.416 UTC [876059] FATAL: could not receive data from
WAL stream: ERROR: requested WAL segment 000000010000000000000005 has
already been removed
2024-09-16 16:29:52.416 UTC [872418] LOG: waiting for WAL to become
available at 0/5002000
At this point, despite the slot being invalidated, its wal_status can still
come back to 'unreserved' even from 'lost', and the standby can catch up if
the removed WAL files are copied, either manually or by a tool/script, to the
primary's pg_wal directory. IOW, physical slots invalidated due to
wal_removed are *somehow* recoverable, unlike logical slots.
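
For illustration, the segment file that would have to be copied back can be
derived from the LSN in the log above. This is only a minimal sketch: it
assumes the default 16MB wal_segment_size and timeline 1, and the function
name wal_segment_name is hypothetical (it is not a PostgreSQL API, it just
mirrors the usual timeline/log/segment file naming):

```python
def wal_segment_name(lsn: str, timeline: int = 1,
                     seg_size: int = 16 * 1024 * 1024) -> str:
    """Return the name of the WAL segment file that contains the given LSN.

    Assumptions (not taken from the thread): default 16MB segment size
    and timeline 1 unless the caller says otherwise.
    """
    # An LSN is printed as two hex numbers: high 32 bits / low 32 bits.
    hi, lo = lsn.split("/")
    pos = (int(hi, 16) << 32) | int(lo, 16)

    # Segment number, then split into the "log" and "seg" parts of the name.
    segno = pos // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline,
                             segno // segs_per_xlogid,
                             segno % segs_per_xlogid)


# LSN taken from the "waiting for WAL to become available" log line.
print(wal_segment_name("0/5002000"))  # prints 000000010000000000000005
```

The result matches the segment named in the "has already been removed" error,
i.e. that is the file the standby is waiting for.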
IIUC, the invalidation of a slot implies that it is not guaranteed to hold
any resources like WAL and XMINs. Does it also imply that the slot must be
unusable?
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com