Re: Synchronizing slots from primary to standby

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-02-26 13:13:56
Message-ID: ZdyOlIIm3yiw5+GA@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, Feb 26, 2024 at 05:18:25PM +0530, Amit Kapila wrote:
> On Mon, Feb 26, 2024 at 12:59 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > > > 10 ===
> > > >
> > > > + if (slot->data.invalidated != RS_INVAL_NONE)
> > > > + {
> > > > + /*
> > > > + * Specified physical slot have been invalidated,
> > > > so no point
> > > > + * in waiting for it.
> > > >
> > > > We discovered in [1], that if the wal_status is "unreserved" then the slot is still
> > > > serving the standby. I think we should handle this case differently, thoughts?
> > >
> > > I think the 'invalidated' slot can still be used is a separate bug.
> > > Because
> > > once the slot is invalidated, it can neither protect WALs or ROWs from being
> > > removed even if the restart_lsn of the slot can be moved forward after being invalidated.
> > >
> > > If the standby can move restart_lsn forward for invalidated slots, then
> > > it should also set the 'invalidated' flag back to NONE, otherwise the slot
> > > cannot serve its purpose anymore. I also reported similar bug before[1].
> >
> ...
> >
> > My point is that I think we should behave like it's not a bug and then adapt the
> > code accordingly here (until the bug gets fixed).
> >
>
> oh, I think this doesn't sound like a good idea to me. We should fix
> that bug independently rather than adding code in new features to
> consider the bug as a valid behavior.

Agree, but it all depends if there is a consensus of the other thread being a
bug or not.

I also think it is but there is this part of the code in pg_get_replication_slots()
that makes me think ones could think it is not.

"
case WALAVAIL_REMOVED:

/*
* If we read the restart_lsn long enough ago, maybe that file
* has been removed by now. However, the walsender could have
* moved forward enough that it jumped to another file after
* we looked. If checkpointer signalled the process to
* termination, then it's definitely lost; but if a process is
* still alive, then "unreserved" seems more appropriate.
*
"

Anyway, I also think it is a bug so agree to keep the check as it is currenlty (
and keep an eye on the other thread outcome too).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2024-02-26 13:22:39 Re: Shared detoast Datum proposal
Previous Message Nazir Bilal Yavuz 2024-02-26 12:42:33 Re: Add missing error codes to PANIC/FATAL error reports in xlog.c and relcache.c