| From: | px shi <spxlyy123(at)gmail(dot)com> | 
|---|---|
| To: | Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> | 
| Cc: | pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: [Bug Fix]standby may crash when switching-over in certain special cases | 
| Date: | 2024-10-10 07:01:31 | 
| Message-ID: | CAAccyYKzMZiq5xOjUkjhvxth8V+yiNuyX=9xcnLM_yBM9oYWFQ@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
>
>  Although I've not tested it because I don't have good way to reproduce
> the problem
I use GDB to reproduce the issue by killing the primary node with kill -9
when the standby’s flushedlsn was at the begining of a WAL segment.
However, this causes to request the primary to stream data from the
> requested location again, even if the standby has already valid data. It
> could be wasteful, for example, when applying WAL record is delayed due to
> conflict with recovery or recovery_min_apply_delay.
I had overlooked this problem, thanks for pointing that out.
I think that comparing walrcv->receiveStart with recptr would be a better
approach. Specifically, if walrcv->receiveStart is greater than recptr,  then
reset walrcv->flushedUpto. This way, we can prevent fetching WAL data from
the primary when the standby already has valid data.A new patch has been
submitted.
Regards,
Pixian Shi
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> 于2024年10月9日周三 14:46写道:
> On Mon, 30 Sep 2024 15:14:54 +0800
> px shi <spxlyy123(at)gmail(dot)com> wrote:
>
> > Thanks for responding.
> >
> >
> > > It is odd that the standby server crashes when
> >
> > replication fails because the standby would keep retrying to get the
> >
> > next record even in such case.
> >
> >
> >  As I mentioned earlier, when replication fails, it retries to establish
> > streaming replication. At this point, the value of *walrcv->flushedUpto
> *is
> > not necessarily the data actually flushed to disk. However, the startup
> > process mistakenly believes that the latest flushed LSN is
> > *walrcv->flushedUpto* and attempts to open the corresponding WAL file,
> > which doesn't exist, leading to a file open failure and causing the
> startup
> > process to PANIC.
>
> Although I've not tested it because I don't have good way to reproduce the
> problem,
> the patch seems to fix the issue by rewinding walrcv->flushedUpto to the
> requested
> WAL location always when the streaming is requested to start. However,
> this causes
> to request the primary to stream data from the requested location again,
> even if
> the standby has already valid data. It could be wasteful, for example,
> when applying WAL
> record is delayed due to conflict with recovery or
> recovery_min_apply_delay.
>
> It might be better if if could notice that there is not requested record
> in the primary's
> timeline before requesting the streaming, but I don't have a good solution
> for now.
>
> Regards,
> Yugo Nagata
>
> > Regards,
> > Pixian Shi
> >
> > Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> 于2024年9月30日周一 13:47写道:
> >
> > > On Wed, 21 Aug 2024 09:11:03 +0800
> > > px shi <spxlyy123(at)gmail(dot)com> wrote:
> > >
> > > > Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> 于2024年8月21日周三 00:49写道:
> > > >
> > > > >
> > > > >
> > > > > > Is s1 a cascading standby of s2? If otherwise s1 and s2 is the
> > > standbys
> > > > > of
> > > > > > the primary server respectively, it is not surprising that s2 has
> > > > > progressed
> > > > > > far than s1 when the primary fails. I believe that this is the
> case
> > > you
> > > > > should
> > > > > > use pg_rewind. Even if flushedUpto is reset as proposed in your
> > > patch,
> > > > > s2 might
> > > > > > already have applied a WAL record that s1 has not processed yet,
> and
> > > > > there
> > > > > > would be no gurantee that subsecuent applys suceed.
> > > > >
> > > > >
> > > >  Thank you for your response. In my scenario, s1 and s2 is the
> standbys
> > > of
> > > > the primary server respectively, and s1 a synchronous standby and s2
> is
> > > an
> > > > asynchronous standby. You mentioned that if s2's replay progress is
> ahead
> > > > of s1, pg_rewind should be used. However, what I'm trying to address
> is
> > > an
> > > > issue where s2 crashes during replay after s1 has been promoted to
> > > primary,
> > > > even though s2's progress hasn't surpassed s1.
> > >
> > > I understood your point. It is odd that the standby server crashes when
> > > replication fails because the standby would keep retrying to get the
> > > next record even in such case.
> > >
> > > Regards,
> > > Yugo Nagata
> > >
> > > >
> > > > Regards,
> > > > Pixian Shi
> > >
> > >
> > > --
> > > Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
> > >
>
>
> --
> Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
>
| Attachment | Content-Type | Size | 
|---|---|---|
| v2-0001-Fix-walreceiver-set-incorrect-flushedUpto-when-sw.patch | application/octet-stream | 1.3 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Masahiko Sawada | 2024-10-10 07:05:10 | Re: Add contrib/pg_logicalsnapinspect | 
| Previous Message | Tatsuo Ishii | 2024-10-10 07:00:41 | Re: Doc: typo in config.sgml |