From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Forbid the use of invalidated physical slots in streaming replication. |
Date: | 2024-01-08 04:55:34 |
Message-ID: | CALDaNm0WqED9nJV4SSQ=qNV4nGN8R3fK3yxoqL9CHpYfh4A+uQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 8 Dec 2023 at 19:15, Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> > > > pg_replication_slot could be set back to null.
> > >
> > > In this case, since the basebackup was taken after the slot was invalidated, it
> > > does not require the WAL that was removed. But it seems that once the
> > > streaming starts, the slot sprints to life again and gets validated again. Here's
> > > pg_replication_slot output after the standby starts.
> >
> > Actually, It doesn't bring the invalidated slot back to life completely.
> > The slot's view data looks valid while the 'invalidated' flag of this slot is still
> > RS_INVAL_WAL_REMOVED (user are not aware of it.)
> >
>
> I was mislead by the code in pg_get_replication_slots(). I did not
> read it till the following
>
> --- code ----
> case WALAVAIL_REMOVED:
>
> /*
> * If we read the restart_lsn long enough ago, maybe that file
> * has been removed by now. However, the walsender could have
> * moved forward enough that it jumped to another file after
> * we looked. If checkpointer signalled the process to
> * termination, then it's definitely lost; but if a process is
> * still alive, then "unreserved" seems more appropriate.
> *
> * If we do change it, save the state for safe_wal_size below.
> */
> --- code ---
>
> I see now how an invalid slot's wal status can be reported as
> unreserved. So I think it's a bug.
>
> >
> > >
> > > >
> > > > So, I think it's a bug and one idea to fix is to check the validity of
> > > > the physical slot in
> > > > StartReplication() after acquiring the slot like what the attachment
> > > > does, what do you think ?
> > >
> > > I am not sure whether that's necessarily a bug. Of course, we don't expect
> > > invalid slots to be used but in this case I don't see any harm.
> > > The invalid slot has been revived and has all the properties set just like a valid
> > > slot. So it must be considered in
> > > ReplicationSlotsComputeRequiredXmin() as well. I haven't verified it myself
> > > though. In case the WAL is really lost and is requested by the standby it will
> > > throw an error "requested WAL segment [0-9A-F]+ has already been
> > > removed". So no harm there as well.
> >
> > Since the 'invalidated' field of the slot is still (RS_INVAL_WAL_REMOVED), even
> > if the walsender advances the restart_lsn, the slot will not be considered in the
> > ReplicationSlotsComputeRequiredXmin(), so the WALs and Rows are not safe
> > and that's why I think it's a bug.
> >
> > After looking closer, it seems this behavior started from 15f8203 which introduced the
> > ReplicationSlotInvalidationCause 'invalidated', after that we check the invalidated enum
> > in Xmin/Lsn computation function.
> >
> > If we want to go back to previous behavior, we need to revert/adjust the check
> > for invalidated in ReplicationSlotsComputeRequiredXmin(), but since the logical
> > decoding(walsender) disallow using invalidated slot, so I feel it's consistent
> > to do similar check for physical one. Besides, pg_replication_slot_advance()
> > also disallow passing invalidated slot to it as well. What do you think ?
>
> What happens if you run your script on a build prior to 15f8203?
> Purely from reading the code, it looks like the physical slot would
> sprint back to life since its restart_lsn would be updated. But I am
> not able to see what happens to invalidated_at. It probably remains a
> valid LSN and the slot would still not be considred in xmin
> calculation. It will be good to be compatible to pre-15f8203
> behaviour.
I have changed the patch status to "Waiting on Author", as some of the
queries requested by Ashutosh are not yet addressed.
Regards,
Vignesh
From | Date | Subject | |
---|---|---|---|
Next Message | vignesh C | 2024-01-08 05:08:00 | Re: proposal: psql: show current user in prompt |
Previous Message | Amul Sul | 2024-01-08 03:38:31 | Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression |