On Mon, Feb 9, 2015 at 8:29 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Sun, Feb 8, 2015 at 2:54 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Fri, Feb 6, 2015 at 4:58 PM, Fujii Masao wrote:
>>> - * Wait for more WAL to arrive. Time out after 5 seconds,
>>> - * like when polling the archive, to react to a trigger
>>> - * file promptly.
>>> + * Wait for more WAL to arrive. Time out after
>>> the amount of
>>> + * time specified by wal_retrieve_retry_interval, like
>>> + * when polling the archive, to react to a
>>> trigger file promptly.
>>> */
>>> WaitLatch(&XLogCtl->recoveryWakeupLatch,
>>> WL_LATCH_SET | WL_TIMEOUT,
>>> - 5000L);
>>> + wal_retrieve_retry_interval * 1000L);
>>>
>>> This change can prevent the startup process from reacting to
>>> a trigger file. Imagine the case where the large interval is set
>>> and the user want to promote the standby by using the trigger file
>>> instead of pg_ctl promote. I think that the sleep time should be 5s
>>> if the interval is set to more than 5s. Thought?
>>
>> I disagree here. It is interesting to accelerate the check of WAL
>> availability from a source in some cases for replication, but the
>> opposite is true as well as mentioned by Alexey at the beginning of
>> the thread to reduce the number of requests when requesting WAL
>> archives from an external storage type AWS. Hence a correct solution
>> would be to check periodically for the trigger file with a maximum
>> one-time wait of 5s to ensure backward-compatible behavior. We could
>> reduce it to 1s or something like that as well.
>
> You seem to have misunderstood the code in question. Or I'm missing something.
> The timeout of the WaitLatch is just the interval to check for the trigger file
> while waiting for more WAL to arrive from streaming replication. Not related to
> the retry time to restore WAL from the archive.
[Re-reading the code...]
Aah.. Yes you are right. Sorry for the noise. Yes let's wait for a
maximum of 5s then. I also noticed in previous patch that the wait was
maximized to 5s. To begin with, a loop should have been used if it was
a sleep, but as now patch uses a latch this limit does not make much
sense... Patch updated is attached.
Regards,
--
Michael