From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Delay of standby shutdown |
Date: | 2020-11-12 03:24:48 |
Message-ID: | 87b9d893-101e-664f-270e-6029b1fe7a0c@oss.nttdata.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2020/11/04 9:35, Soumyadeep Chakraborty wrote:
> Hello Fujii,
>
> On Thu, Sep 17, 2020 at 6:49 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
> As far as I understand after debugging, the problem is as follows:
Yes.
>
> 1. After the primary is stopped, and the standby startup process is
> waiting inside:
>
> (void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
> WL_LATCH_SET | WL_TIMEOUT |
> WL_EXIT_ON_PM_DEATH,
> wait_time,
> WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL);
>
> it receives SIGTERM on account of stopping the standby and it leads to
> the WaitLatch call returning with WL_LATCH_SET.
>
> 2. WaitForWALToBecomeAvailable() then will return true after calling
> XLogFileReadAnyTLI() and eventually, XLogReadRecord() will return NULL
> since there is no new WAL to read, which means ReadRecord() will loop
> back and perform another XLogReadRecord().
>
> 3. The additional XLogReadRecord() will lead to a 5s wait inside
> WaitForWALToBecomeAvailable() - a different WaitLatch() call this time:
>
> /*
> * Wait for more WAL to arrive. Time out after 5 seconds
> * to react to a trigger file promptly and to check if the
> * WAL receiver is still active.
> */
> (void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
> WL_LATCH_SET | WL_TIMEOUT |
> WL_EXIT_ON_PM_DEATH,
> 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
>
> 4. And then eventually, the code will handle interrupts here inside
> WaitForWALToBecomeAvailable() after the 5s wait:
>
> /*
> * This possibly-long loop needs to handle interrupts of startup
> * process.
> */
> HandleStartupProcInterrupts();
>
>> To avoid this issue, I think that ReadRecord() should call
>> HandleStartupProcInterrupts() whenever looping back to retry.
>
> Alternatively, perhaps we can place it inside
> WaitForWALToBecomeAvailable() (which already has a similar call),
> since it is more semantically aligned to checking for interrupts, rather
> than ReadRecord()? Like this:
Yes. Your approach looks better to me.
Attached is the updated version of the patch implementing that.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment | Content-Type | Size |
---|---|---|
shutdown_in_standby_v2.patch | text/plain | 583 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Korotkov | 2020-11-12 03:48:17 | Re: Strange GiST logic leading to uninitialized memory access in pg_trgm gist code |
Previous Message | Amit Kapila | 2020-11-12 03:16:55 | Re: logical streaming of xacts via test_decoding is broken |