From: | px shi <spxlyy123(at)gmail(dot)com> |
---|---|
To: | Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [Bug Fix]standby may crash when switching-over in certain special cases |
Date: | 2024-09-30 07:14:54 |
Message-ID: | CAAccyYKXRVSmfC-YYdPbgsZfPiK_Tk4RLggxWs8UETxfKD7kRA@mail.gmail.com |
Lists: | pgsql-hackers |
Thanks for responding.
> It is odd that the standby server crashes when
> replication fails because the standby would keep retrying to get the
> next record even in such case.
As I mentioned earlier, when replication fails, the standby retries
establishing streaming replication. At this point, the value of
walrcv->flushedUpto is not necessarily the position of the data actually
flushed to disk. However, the startup process mistakenly believes that the
latest flushed LSN is walrcv->flushedUpto and attempts to open the
corresponding WAL file, which doesn't exist. The file open fails, and the
startup process PANICs.
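To make the failure mode concrete, here is a minimal sketch of the mismatch. It is a toy model and not PostgreSQL's actual code: segment_exists() is a hypothetical stand-in for checking pg_wal, the LSN values used with it are made up, and the default 16 MB WAL segment size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: default WAL segment size of 16 MB. */
#define WAL_SEGMENT_SIZE    ((uint64_t) 16 * 1024 * 1024)
#define SEGMENTS_PER_XLOGID (UINT64_C(0x100000000) / WAL_SEGMENT_SIZE)

/* Number of the WAL segment that contains the given LSN. */
uint64_t lsn_to_segno(uint64_t lsn)
{
    return lsn / WAL_SEGMENT_SIZE;
}

/* Build the 24-hex-digit segment file name: timeline, high part, low part. */
void wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t segno)
{
    assert(len >= 25);          /* 24 hex digits plus terminating NUL */
    snprintf(buf, len, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / SEGMENTS_PER_XLOGID),
             (unsigned int) (segno % SEGMENTS_PER_XLOGID));
}

/*
 * Hypothetical stand-in for "is this segment present in pg_wal?".
 * In this toy model only segments up to the one holding the LSN that
 * was really flushed to disk exist.
 */
bool segment_exists(uint64_t segno, uint64_t really_flushed_lsn)
{
    return segno <= lsn_to_segno(really_flushed_lsn);
}
```

For example, if the data really flushed ends at 0/3000060 but a stale flushedUpto says 0/5000000, the segment derived from flushedUpto is 000000010000000000000005 on timeline 1. In this model segment_exists() returns false for it, which corresponds to the failed file open described above.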
Regards,
Pixian Shi
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote on Mon, Sep 30, 2024 at 13:47:
> On Wed, 21 Aug 2024 09:11:03 +0800
> px shi <spxlyy123(at)gmail(dot)com> wrote:
>
> > Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote on Wed, Aug 21, 2024 at 00:49:
> >
> > >
> > >
> > > > Is s1 a cascading standby of s2? Otherwise, if s1 and s2 are each
> > > > standbys of the primary server, it is not surprising that s2 has
> > > > progressed farther than s1 when the primary fails. I believe this is
> > > > the case where you should use pg_rewind. Even if flushedUpto is reset
> > > > as proposed in your patch, s2 might already have applied a WAL record
> > > > that s1 has not processed yet, and there would be no guarantee that
> > > > subsequent applies succeed.
> > >
> > >
> > Thank you for your response. In my scenario, s1 and s2 are each standbys
> > of the primary server; s1 is a synchronous standby and s2 is an
> > asynchronous standby. You mentioned that if s2's replay progress is ahead
> > of s1, pg_rewind should be used. However, what I'm trying to address is an
> > issue where s2 crashes during replay after s1 has been promoted to primary,
> > even though s2's progress hasn't surpassed s1.
>
> I understood your point. It is odd that the standby server crashes when
> replication fails because the standby would keep retrying to get the
> next record even in such case.
>
> Regards,
> Yugo Nagata
>
> >
> > Regards,
> > Pixian Shi
>
>
> --
> Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
>