Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.
Date: 2021-05-18 08:13:36
Message-ID: 28e612b8-4335-6498-55c2-dc9b03a0f56f@oss.nttdata.com
Lists: pgsql-hackers

On 2021/05/18 14:53, Dilip Kumar wrote:
> On Mon, May 17, 2021 at 7:59 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>
>> If a promotion is triggered while recovery is paused, the paused state ends
>> and promotion continues. But currently pg_get_wal_replay_pause_state()
>> returns 'paused' in that case. Isn't this a bug?
>>
>> Attached patch fixes this issue by resetting the recovery pause state to
>> 'not paused' when standby promotion is triggered.
>>
>> Thought?
>>
>
> I think, prior to commit 496ee647ecd2917369ffcf1eaa0b2cdca07c8730
> (Prefer standby promotion over recovery pause.) this behavior was fine
> because the pause was continued, but after this commit we are
> giving preference to promotion, so this is a bug and needs to be fixed.
>
> The fix looks fine but I think along with this we should also return
> immediately from the pause loop if promotion is requested. Because if
> we recheck the recovery pause then someone can pause again and we will
> be in loop so better to exit as soon as promotion is requested, see
> attached patch. Should be applied along with your patch.

But this change can cause recovery to continue with insufficient parameter
settings if a promotion is requested while the server is paused
because of such invalid settings. This behavior does not seem safe.
If my understanding is right, the recovery should abort immediately
(i.e., the FATAL error "recovery aborted because of insufficient parameter settings"
should be thrown) if a promotion is requested in that case, just as when
pg_wal_replay_resume() is executed in that case. Thoughts?
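
To make the proposed behavior concrete, here is a minimal standalone sketch of
the decision taken when the pause loop wakes up. It is not the actual recovery
code: StandbyModel, PauseDecision and decide_after_wakeup() are hypothetical
names used only for illustration, and the only thing taken from the discussion
above is the rule itself (promotion ends the pause, but a pause caused by
insufficient parameter settings must end in the FATAL error).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a standby sitting in the recovery pause loop.
 * None of these names come from the PostgreSQL source. */
typedef struct
{
    bool paused;                /* recovery is currently paused */
    bool promote_requested;     /* a promotion has been triggered */
    bool insufficient_settings; /* pause was caused by too-low parameter settings */
} StandbyModel;

typedef enum
{
    STAY_PAUSED,      /* keep waiting in the pause loop */
    CONTINUE_REPLAY,  /* pause ended by a resume request */
    CONTINUE_PROMOTE, /* pause ended by a promotion request */
    ABORT_FATAL       /* "recovery aborted because of insufficient parameter settings" */
} PauseDecision;

static PauseDecision
decide_after_wakeup(StandbyModel *s)
{
    assert(s != NULL);

    /* Still paused and nobody asked for promotion: keep waiting. */
    if (s->paused && !s->promote_requested)
        return STAY_PAUSED;

    /* The pause is over, so the reported state must no longer be 'paused'. */
    s->paused = false;

    /* Leaving the pause with invalid settings is unsafe, whether the pause
     * ended by a resume or by a promotion request. */
    if (s->insufficient_settings)
        return ABORT_FATAL;

    return s->promote_requested ? CONTINUE_PROMOTE : CONTINUE_REPLAY;
}
```

In this model a promotion request during an ordinary pause yields
CONTINUE_PROMOTE with the paused flag cleared, while the same request during a
pause caused by insufficient settings yields ABORT_FATAL, which is the same
outcome as a resume request in that situation.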

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
