From: px shi <spxlyy123(at)gmail(dot)com>
To: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bug Fix]standby may crash when switching-over in certain special cases
Date: 2024-12-27 13:32:52
Message-ID: CAAccyYLbDnZgaXNeU3dz7nO0qtJRAT9p=ud4UEukpgiEqwT1wA@mail.gmail.com
Lists: pgsql-hackers
Hi,
I inserted the following code at walsender.c:2509 (v15.8) to reproduce the
issue:
{
    WalSnd     *walsnd = MyWalSnd;

    SpinLockAcquire(&walsnd->mutex);

    /*
     * Fire only when this walsender's standby has flushed everything that
     * was sent and its flush position sits exactly on a segment boundary.
     */
    if (walsnd->flush % wal_segment_size == 0 &&
        walsnd->sentPtr == walsnd->flush &&
        walsnd->flush > wal_segment_size)
    {
        for (int i = 0; i < max_wal_senders; i++)
        {
            WalSnd     *walsnder = &WalSndCtl->walsnds[i];

            if (walsnder->pid == walsnd->pid)
                continue;

            SpinLockAcquire(&walsnder->mutex);
            if (walsnder->pid == 0)
            {
                SpinLockRelease(&walsnder->mutex);
                continue;
            }

            /*
             * Another active walsender's standby is also at a segment
             * boundary: simulate the primary crashing at this moment.
             */
            if (walsnder->flush % wal_segment_size == 0 &&
                walsnder->flush > wal_segment_size)
                elog(PANIC, "simulate primary crashed, flushedlsn %X/%X",
                     LSN_FORMAT_ARGS(walsnder->flush));

            SpinLockRelease(&walsnder->mutex);
        }
    }
    SpinLockRelease(&walsnd->mutex);
}
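
For reference, here is a rough sketch of the fix idea proposed earlier in
this thread (quoted below): in RequestXLogStreaming(), rewind
walrcv->flushedUpto when streaming is requested from a location before
walrcv->receiveStart. The field names follow WalRcvData in walreceiver.h,
but this is only an illustration of the approach; the exact placement and
conditions in the submitted patch may differ.

    /*
     * Illustration only (not the exact submitted patch): reset flushedUpto
     * not only on the first startup of the walreceiver, but also when the
     * requested start point lies before receiveStart, so the startup
     * process cannot trust a flushedUpto value left over from the old
     * timeline.
     */
    SpinLockAcquire(&walrcv->mutex);
    if (walrcv->receiveStart == 0 ||
        walrcv->receivedTLI != tli ||
        walrcv->receiveStart > recptr)
    {
        walrcv->flushedUpto = recptr;
        walrcv->receivedTLI = tli;
        walrcv->latestChunkStart = recptr;
    }
    walrcv->receiveStart = recptr;
    walrcv->receiveStartTLI = tli;
    SpinLockRelease(&walrcv->mutex);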
Regards,
Pixian Shi
px shi <spxlyy123(at)gmail(dot)com> wrote on Thu, Oct 10, 2024 at 15:01:
>> Although I've not tested it because I don't have a good way to reproduce
>> the problem
>
>
>
> I used GDB to reproduce the issue by killing the primary node with kill -9
> when the standby's flushed LSN was at the beginning of a WAL segment.
>
>
>> However, this causes the standby to request the primary to stream data from
>> the requested location again, even if the standby already has valid data. It
>> could be wasteful, for example, when applying a WAL record is delayed due to
>> a conflict with recovery or recovery_min_apply_delay.
>
>
> I had overlooked this problem, thanks for pointing that out.
>
> I think that comparing walrcv->receiveStart with recptr would be a better
> approach. Specifically, if walrcv->receiveStart is greater than recptr, then
> reset walrcv->flushedUpto. This way, we can prevent fetching WAL data from
> the primary when the standby already has valid data. A new patch has been
> submitted.
>
>
> Regards,
>
> Pixian Shi
>
>
> Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote on Wed, Oct 9, 2024 at 14:46:
>
>> On Mon, 30 Sep 2024 15:14:54 +0800
>> px shi <spxlyy123(at)gmail(dot)com> wrote:
>>
>> > Thanks for responding.
>> >
>> >
>> > > It is odd that the standby server crashes when
>> > > replication fails because the standby would keep retrying to get the
>> > > next record even in such a case.
>> >
>> >
>> > As I mentioned earlier, when replication fails, it retries to establish
>> > streaming replication. At this point, the value of *walrcv->flushedUpto* is
>> > not necessarily the data actually flushed to disk. However, the startup
>> > process mistakenly believes that the latest flushed LSN is
>> > *walrcv->flushedUpto* and attempts to open the corresponding WAL file,
>> > which doesn't exist, leading to a file open failure and causing the startup
>> > process to PANIC.
>>
>> Although I've not tested it because I don't have a good way to reproduce
>> the problem, the patch seems to fix the issue by always rewinding
>> walrcv->flushedUpto to the requested WAL location when streaming is
>> requested to start. However, this causes the standby to request the
>> primary to stream data from the requested location again, even if the
>> standby already has valid data. It could be wasteful, for example, when
>> applying a WAL record is delayed due to a conflict with recovery or
>> recovery_min_apply_delay.
>>
>> It might be better if the standby could notice that the requested record is
>> not in the primary's timeline before requesting streaming, but I don't have
>> a good solution for now.
>>
>> Regards,
>> Yugo Nagata
>>
>> > Regards,
>> > Pixian Shi
>> >
>> > Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote on Mon, Sep 30, 2024 at 13:47:
>> >
>> > > On Wed, 21 Aug 2024 09:11:03 +0800
>> > > px shi <spxlyy123(at)gmail(dot)com> wrote:
>> > >
>> > > > Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote on Wed, Aug 21, 2024 at 00:49:
>> > > >
>> > > > >
>> > > > >
>> > > > > Is s1 a cascading standby of s2? If, otherwise, s1 and s2 are both
>> > > > > standbys of the primary server, it is not surprising that s2 has
>> > > > > progressed further than s1 when the primary fails. I believe this is
>> > > > > the case where you should use pg_rewind. Even if flushedUpto is reset
>> > > > > as proposed in your patch, s2 might already have applied a WAL record
>> > > > > that s1 has not processed yet, and there would be no guarantee that
>> > > > > subsequent applies succeed.
>> > > > >
>> > > > >
>> > > > Thank you for your response. In my scenario, s1 and s2 are both
>> > > > standbys of the primary server; s1 is a synchronous standby and s2 is
>> > > > an asynchronous standby. You mentioned that if s2's replay progress is
>> > > > ahead of s1's, pg_rewind should be used. However, what I'm trying to
>> > > > address is an issue where s2 crashes during replay after s1 has been
>> > > > promoted to primary, even though s2's progress hasn't surpassed s1's.
>> > >
>> > > I understood your point. It is odd that the standby server crashes when
>> > > replication fails because the standby would keep retrying to get the
>> > > next record even in such a case.
>> > >
>> > > Regards,
>> > > Yugo Nagata
>> > >
>> > > >
>> > > > Regards,
>> > > > Pixian Shi
>> > >
>> > >
>> > > --
>> > > Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
>> > >
>>
>>
>> --
>> Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
>>
>