From: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Restrict copying of invalidated replication slots |
Date: | 2025-02-24 11:06:28 |
Message-ID: | CANhcyEUHp6cRfaKf0ZqHCppCqpqzmsf5swpbnYGyRU+N+ihi=Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, 22 Feb 2025 at 04:49, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Feb 21, 2025 at 4:30 AM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> >
> > On Fri, 21 Feb 2025 at 01:14, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Feb 19, 2025 at 3:46 AM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> > > >
> > > > On Tue, 18 Feb 2025 at 15:26, Zhijie Hou (Fujitsu)
> > > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > > >
> > > > > On Monday, February 17, 2025 7:31 PM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Thu, 13 Feb 2025 at 15:54, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > On Tue, 4 Feb 2025 at 15:27, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Currently, we can copy an invalidated slot using the function
> > > > > > > > 'pg_copy_logical_replication_slot'. As per the suggestion in the
> > > > > > > > thread [1], we should prohibit copying of such slots.
> > > > > > > >
> > > > > > > > I have created a patch to address the issue.
> > > > > > >
> > > > > > > This patch does not fix all the copy_replication_slot scenarios
> > > > > > > completely, there is a very corner concurrency case where an
> > > > > > > invalidated slot still gets copied:
> > > > > > > + /* We should not copy invalidated replication slots */
> > > > > > > + if (src_isinvalidated)
> > > > > > > + ereport(ERROR,
> > > > > > > +
> > > > > > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > > > > > + errmsg("cannot copy an invalidated
> > > > > > > replication slot")));
> > > > > > >
> > > > > > > Consider the following scenario:
> > > > > > > step 1) Set up streaming replication between the primary and standby nodes.
> > > > > > > step 2) Create a logical replication slot (test1) on the standby node.
> > > > > > > step 3) Have a breakpoint in InvalidatePossiblyObsoleteSlot if cause
> > > > > > > is RS_INVAL_WAL_LEVEL, no need to hold other invalidation causes or
> > > > > > > add a sleep in InvalidatePossiblyObsoleteSlot function like below:
> > > > > > > if (cause == RS_INVAL_WAL_LEVEL)
> > > > > > > {
> > > > > > > while (bsleep)
> > > > > > > sleep(1);
> > > > > > > }
> > > > > > > step 4) Reduce wal_level on the primary to replica and restart the primary
> > > > > > node.
> > > > > > > step 5) SELECT 'copy' FROM pg_copy_logical_replication_slot('test1',
> > > > > > > 'test2'); -- It will wait till the lock held by
> > > > > > > InvalidatePossiblyObsoleteSlot is released while trying to create a
> > > > > > > slot.
> > > > > > > step 6) Increase wal_level back to logical on the primary node and
> > > > > > > restart the primary.
> > > > > > > step 7) Now allow the invalidation to happen (continue the breakpoint
> > > > > > > held at step 3), the replication control lock will be released and the
> > > > > > > invalidated slot will be copied
> > > > > > >
> > > > > > > After this:
> > > > > > > postgres=# SELECT 'copy' FROM
> > > > > > > pg_copy_logical_replication_slot('test1', 'test2'); ?column?
> > > > > > > ----------
> > > > > > > copy
> > > > > > > (1 row)
> > > > > > >
> > > > > > > -- The invalidated slot (test1) is copied successfully:
> > > > > > > postgres=# select * from pg_replication_slots ;
> > > > > > > slot_name | plugin | slot_type | datoid | database | temporary
> > > > > > > | active | active_pid | xmin | catalog_xmin | restart_lsn |
> > > > > > > confirmed_flush_lsn | wal_status | safe_wal_size | two_phas
> > > > > > > e | inactive_since | conflicting |
> > > > > > > invalidation_reason | failover | synced
> > > > > > >
> > > > > > -----------+---------------+-----------+--------+----------+-----------+
> > > > > > --------+------------+------+--------------+-------------+---------------
> > > > > > ------+------------+---------------+---------
> > > > > > >
> > > > > > --+----------------------------------+-------------+----------------------
> > > > > > --+----------+--------
> > > > > > > test1 | test_decoding | logical | 5 | postgres | f
> > > > > > > | f | | | 745 | 0/4029060 | 0/4029098
> > > > > > > | lost | | f
> > > > > > > | 2025-02-13 15:26:54.666725+05:30 | t |
> > > > > > > wal_level_insufficient | f | f
> > > > > > > test2 | test_decoding | logical | 5 | postgres | f
> > > > > > > | f | | | 745 | 0/4029060 | 0/4029098
> > > > > > > | reserved | | f
> > > > > > > | 2025-02-13 15:30:30.477836+05:30 | f |
> > > > > > > | f | f
> > > > > > > (2 rows)
> > > > > > >
> > > > > > > -- A subsequent attempt to decode changes from the invalidated slot
> > > > > > > (test2) fails:
> > > > > > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > > > > > > NULL);
> > > > > > > WARNING: detected write past chunk end in TXN 0x5e77e6c6f300
> > > > > > > ERROR: logical decoding on standby requires "wal_level" >= "logical"
> > > > > > > on the primary
> > > > > > >
> > > > > > > -- Alternatively, the following error may occur:
> > > > > > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > > > > > > NULL);
> > > > > > > WARNING: detected write past chunk end in TXN 0x582d1b2d6ef0
> > > > > > > data
> > > > > > > ------------
> > > > > > > BEGIN 744
> > > > > > > COMMIT 744
> > > > > > > (2 rows)
> > > > > > >
> > > > > > > This is an edge case that can occur under specific conditions
> > > > > > > involving replication slot invalidation when there is a huge lag
> > > > > > > between primary and standby.
> > > > > > > There might be a similar concurrency case for wal_removed too.
> > > > > > >
> > > > > >
> > > > > > Hi Vignesh,
> > > > > >
> > > > > > Thanks for reviewing the patch.
> > > > >
> > > > > Thanks for updating the patch. I have a question related to it.
> > > > >
> > > > > >
> > > > > > I have tested the above scenario and was able to reproduce it. I have fixed it in
> > > > > > the v2 patch.
> > > > > > Currently we are taking a shared lock on ReplicationSlotControlLock.
> > > > > > This issue can be resolved if we take an exclusive lock instead.
> > > > > > Thoughts?
> > > > >
> > > > > It's not clear to me why increasing the lock level can solve it, could you
> > > > > elaborate a bit more on this ?
> > > > >
> > > > In HEAD, InvalidateObsoleteReplicationSlots acquires a SHARED lock on
> > > > 'ReplicationSlotControlLock'
> > > > Also in function 'copy_replication_slot' we take a SHARED lock on
> > > > 'ReplicationSlotControlLock' during fetching of source slot.
> > > >
> > > > So, for the case described by Vignesh in [1], first
> > > > InvalidateObsoleteReplicationSlot is called and we hold a SHARED lock
> > > > on 'ReplicationSlotControlLock'. We are now holding the function using
> > > > the sleep
> > > > if (cause == RS_INVAL_WAL_LEVEL)
> > > > {
> > > > while (bsleep)
> > > > sleep(1);
> > > > }
> > > >
> > > > Now we create a copy of the slot since 'copy_replication_slot' takes
> > > > a SHARED lock on 'ReplicationSlotControlLock'. It will take the lock
> > > > and fetch the info of the source slot (the slot is not invalidated
> > > > till now). and the function 'copy_replication_slot' calls function
> > > > 'create_logical_replication_slot' which takes a EXCLUSIVE lock on
> > > > ReplicationSlotControlLock and hence it will wait for function
> > > > InvalidateObsoleteReplicationSlot to release lock. Once the function
> > > > 'InvalidateObsoleteReplicationSlot' releases the lock, the execution
> > > > of 'create_logical_replication_slot' continues and creates a copy of
> > > > the source slot.
> > > >
> > > > Now with the patch, 'copy_replication_slot' will take an EXCLUSIVE
> > > > lock on 'ReplicationSlotControlLock'. to fetch the slot info. Hence,
> > > > it will wait for the 'InvalidateObsoleteReplicationSlot' to release
> > > > the lock and then fetch the source slot info and try to create the
> > > > copied slot (which will fail as source slot is invalidated before we
> > > > fetch its info)
> > > >
> > > > > Besides, do we need one more invalidated check in the following codes after
> > > > > creating the slot ?
> > > > >
> > > > > /*
> > > > > * Check if the source slot still exists and is valid. We regard it as
> > > > > * invalid if the type of replication slot or name has been changed,
> > > > > * or the restart_lsn either is invalid or has gone backward. (The
> > > > > ...
> > > > >
> > > >
> > > > This approach seems more feasible to me. It also resolves the issue
> > > > suggested by Vignesh in [1]. I have made changes for the same in v3
> > > > patch.
> > > >
> > >
> > > I agree to check if the source slot got invalidated during the copy.
> > > But why do we need to search the slot by the slot name again as
> > > follows?
> > >
> > > + /* Check if source slot was invalidated while copying of slot */
> > > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> > > +
> > > + for (int i = 0; i < max_replication_slots; i++)
> > > + {
> > > + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
> > > +
> > > + if (s->in_use && strcmp(NameStr(s->data.name),
> > > NameStr(*src_name)) == 0)
> > > + {
> > > + /* Copy the slot contents while holding spinlock */
> > > + SpinLockAcquire(&s->mutex);
> > > + first_slot_contents = *s;
> > > + SpinLockRelease(&s->mutex);
> > > + src = s;
> > > + break;
> > > + }
> > > + }
> > > +
> > > + LWLockRelease(ReplicationSlotControlLock);
> > >
> > > I think 'src' already points to the source slot.
> > >
> > Hi Sawada san,
> >
> > Thanks for reviewing the patch.
> > I have used the 'src' instead of iterating again. I have attached the
> > updated v4 patch.
>
> Thank you for updating the patch! I have one comment:
>
> + /* Check if source slot was invalidated while copying of slot */
> + SpinLockAcquire(&src->mutex);
> + first_slot_contents = *src;
> + SpinLockRelease(&src->mutex);
>
> We don't need to copy the source slot contents again since we already
> do as follows:
>
> /* Copy data of source slot again */
> SpinLockAcquire(&src->mutex);
> second_slot_contents = *src;
> SpinLockRelease(&src->mutex);
>
> I think we can use second_slot_contents for that check.
>
I agree. I have updated the v5 patch to use second_slot_contents
> I've investigated the slot invalidation and copying slots behaviors.
> We cannot copy a slot if it doesn't reserve WAL, but IIUC the slot's
> restart_lsn is not reset by slot invalidation due to other than
> RS_INVAL_WAL_REMOVED. Therefore, it's possible that we copy a slot
> invalidated by for example RS_INVAL_IDLE_TIMEOUT, and the copied
> slot's restart_lsn might have already been removed, which ultimately
> causes an assertion failure in ocpy_replication_slot():
>
> #ifdef USE_ASSERT_CHECKING
> /* Check that the restart_lsn is available */
> {
> XLogSegNo segno;
>
> XLByteToSeg(copy_restart_lsn, segno, wal_segment_size);
> Assert(XLogGetLastRemovedSegno() < segno);
> }
> #endif
>
> I think this issue exists from v16 or later, I've not tested yet
> though. If my understanding is right, this patch has to be
> backpatched.
>
I have tested the above in HEAD, PG 17 and PG 16 and found that we can
hit the above ASSERT condition in all three branches. With the
following steps:
1. create a physical replication setup
2. In standby create a logical replication slot.
3. change wal_level of primary to 'replica' and restart primary. The
slot is invalidated with 'wal_level_insufficient'
4. change wal_level of primary to 'logical' and restart primary.
5. In primary insert some records and run checkpoint. Also run a
checkpoint on standby. So, some initial wal files are removed.
6. Now copy the logical replication slot created in step 2. Then we
can hit the assert.
I agree that backpatching the patch can resolve this as it prevents
copying of invalidated slots.
I have attached the following patches:
v5-0001 : for HEAD
v5_PG_17_PG_16-0001 : for PG17 and PG16
Thanks and Regards,
Shlok Kyal
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Restrict-copying-of-invalidated-replication-slots.patch | application/octet-stream | 4.8 KB |
v5_PG_17_PG_16-0001-Restrict-copying-of-invalidated-repli.patch | application/octet-stream | 4.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Shlok Kyal | 2025-02-24 11:07:35 | Re: Restrict copying of invalidated replication slots |
Previous Message | Ilia Evdokimov | 2025-02-24 11:03:13 | Re: explain analyze rows=%.0f |