Re: pgsql: Get rid of the dedicated latch for signaling the startup process

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Fujii Masao <fujii(at)postgresql(dot)org>, pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Get rid of the dedicated latch for signaling the startup process
Date: 2020-11-04 12:03:03
Message-ID: d1260696-30a2-f386-633d-dafd6bc46bd9@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

On 2020/11/04 20:20, Fujii Masao wrote:
>
>
> On 2020/11/04 19:27, Heikki Linnakangas wrote:
>> On 04/11/2020 09:44, Fujii Masao wrote:
>>> Get rid of the dedicated latch for signaling the startup process.
>>>
>>> This commit gets rid of the dedicated latch for signaling the startup
>>> process in favor of using its procLatch,  since that comports better
>>> with possible generic signal handlers using that latch.
>>>
>>> Commit 1e53fe0e70 changed background processes so that they use standard
>>> SIGHUP handler. Like that, this commit also makes the startup process use
>>> standard SIGHUP handler to simplify the code.
>>
>> This seems to have made buildfarm member 'elver' to segfault. I've got a hunch that setting recoveryWakeupLatch to NULL, when WakeupRecovery() doesn't check for NULL, is not OK. It's surprising that we're only seeing this on 'elver', though.
>
> Thanks for the report! The latch is reset to NULL after ShutdownWalRcv(). So I thought that there are no processes setting the latch (i.e., walreceiver has already exited) after it's reset to NULL. But this is not true. There seem a window between ShutdownWalRcv() and the actual exit of walreceiver. If a signal is sent during that window, the segmentation fault would happen. This would be the reason why segv happened in some platforms, but not in others.
>
> I'm thinking to remove the following code to fix this issue. Thought?
>
>     /*
>      * We don't need the latch anymore. It's not strictly necessary to reset
>      * it to NULL, but let's do it for the sake of tidiness.
>      */
>     if (ArchiveRecoveryRequested)
>         XLogCtl->recoveryWakeupLatch = NULL;

Or ISTM that WakeupRecovery() should set the latch only when the latch
has not been reset to NULL yet.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Fujii Masao 2020-11-04 12:51:53 pgsql: Fix segmentation fault that commit ac22929a26 caused.
Previous Message Peter Eisentraut 2020-11-04 11:50:18 pgsql: Enable hash partitioning of text arrays