From: | "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Add Information during standby recovery conflicts |
Date: | 2020-12-14 15:49:16 |
Message-ID: | 8c1fca5b-6c5c-df6f-0b6d-4702eeb11c5c@amazon.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 12/14/20 4:20 PM, Fujii Masao wrote:
> CAUTION: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender
> and know the content is safe.
>
>
>
> On 2020/12/14 21:31, Fujii Masao wrote:
>>
>>
>> On 2020/12/05 12:38, Masahiko Sawada wrote:
>>> On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand
>>> <bdrouvot(at)amazon(dot)com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 12/4/20 2:21 AM, Fujii Masao wrote:
>>>>>
>>>>> On 2020/12/04 9:28, Masahiko Sawada wrote:
>>>>>> On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao
>>>>>> <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2020/12/01 17:29, Drouvot, Bertrand wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 12/1/20 12:35 AM, Masahiko Sawada wrote:
>>>>>>>>> CAUTION: This email originated from outside of the organization.
>>>>>>>>> Do not click links or open attachments unless you can confirm the
>>>>>>>>> sender and know the content is safe.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera
>>>>>>>>> <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>>>>>>>>>> On 2020-Dec-01, Fujii Masao wrote:
>>>>>>>>>>
>>>>>>>>>>> + if (proc)
>>>>>>>>>>> + {
>>>>>>>>>>> + if (nprocs == 0)
>>>>>>>>>>> + appendStringInfo(&buf, "%d", proc->pid);
>>>>>>>>>>> + else
>>>>>>>>>>> + appendStringInfo(&buf, ", %d", proc->pid);
>>>>>>>>>>> +
>>>>>>>>>>> + nprocs++;
>>>>>>>>>>>
>>>>>>>>>>> What happens if all the backends in wait_list have gone? In
>>>>>>>>>>> other words,
>>>>>>>>>>> how should we handle the case where nprocs == 0 (i.e., nprocs
>>>>>>>>>>> has not been
>>>>>>>>>>> incrmented at all)? This would very rarely happen, but can
>>>>>>>>>>> happen.
>>>>>>>>>>> In this case, since buf.data is empty, at least there seems no
>>>>>>>>>>> need to log
>>>>>>>>>>> the list of conflicting processes in detail message.
>>>>>>>>>> Yes, I noticed this too; this can be simplified by changing the
>>>>>>>>>> condition in the ereport() call to be "nprocs > 0" (rather than
>>>>>>>>>> wait_list being null), otherwise not print the errdetail. (You
>>>>>>>>>> could
>>>>>>>>>> test buf.data or buf.len instead, but that seems uglier to me.)
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Maybe we can also improve the comment of this function from:
>>>>>>>>>
>>>>>>>>> + * This function also reports the details about the conflicting
>>>>>>>>> + * process ids if *wait_list is not NULL.
>>>>>>>>>
>>>>>>>>> to " This function also reports the details about the conflicting
>>>>>>>>> process ids if exist" or something.
>>>>>>>>>
>>>>>>>> Thank you all for the review/remarks.
>>>>>>>>
>>>>>>>> They have been addressed in the new attached patch version.
>>>>>>>
>>>>>>> Thanks for updating the patch! I read through the patch again
>>>>>>> and applied the following chages to it. Attached is the updated
>>>>>>> version of the patch. Could you review this version? If there is
>>>>>>> no issue in it, I'm thinking to commit this version.
>>>>>>
>>>>>> Thank you for updating the patch! I have one question.
>>>>>>
>>>>>>>
>>>>>>> + timeouts[cnt].id = STANDBY_TIMEOUT;
>>>>>>> + timeouts[cnt].type = TMPARAM_AFTER;
>>>>>>> + timeouts[cnt].delay_ms = DeadlockTimeout;
>>>>>>>
>>>>>>> Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here?
>>>>>>> I changed the code that way.
>>>>>>
>>>>>> As the comment of ResolveRecoveryConflictWithLock() says the
>>>>>> following, a deadlock is detected by the ordinary backend process:
>>>>>>
>>>>>> * Deadlocks involving the Startup process and an ordinary backend
>>>>>> proces
>>>>>> * will be detected by the deadlock detector within the ordinary
>>>>>> backend.
>>>>>>
>>>>>> If we use STANDBY_DEADLOCK_TIMEOUT,
>>>>>> SendRecoveryConflictWithBufferPin() will be called after
>>>>>> DeadlockTimeout passed, but I think it's not necessary for the
>>>>>> startup
>>>>>> process in this case.
>>>>>
>>>>> Thanks for pointing this! You are right.
>>>>>
>>>>>
>>>>>> If we want to just wake up the startup process
>>>>>> maybe we can use STANDBY_TIMEOUT here?
>>>>>
>>>> Thanks for the patch updates! Except what we are still discussing
>>>> below,
>>>> it looks good to me.
>>>>
>>>>> When STANDBY_TIMEOUT happens, a request to release conflicting buffer
>>>>> pins is sent. Right? If so, we should not also use STANDBY_TIMEOUT
>>>>> there?
>>>>
>>>> Agree
>>>>
>>>>>
>>>>> Or, first of all, we don't need to enable the deadlock timer at all?
>>>>> Since what we'd like to do is to wake up after deadlock_timeout
>>>>> passes, we can do that by changing ProcWaitForSignal() so that it can
>>>>> accept the timeout and giving the deadlock_timeout to it. If we do
>>>>> this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from
>>>>> ResolveRecoveryConflictWithLock(). Thought?
>>>
>>> Where do we enable deadlock timeout in hot standby case? You meant to
>>> enable it in ProcWaitForSignal() or where we set a timer for not hot
>>> standby case, in ProcSleep()?
>>
>> No, what I tried to say is to change
>> ResolveRecoveryConflictWithLock() so that it does
>>
>> 1. calculate the "minimum" timeout from deadlock_timeout and
>> max_standby_xxx_delay
>> 2. give the calculated timeout value to ProcWaitForSignal()
>> 3. wait for signal and timeout on ProcWaitForSignal()
>>
>> Since ProcWaitForSignal() calls WaitLatch(), seems it's not so
>> difficult to make ProcWaitForSignal() handle the timeout. If we do
>> this, I was thinking that we can get rid of enable_timeouts() from
>> ResolveRecoveryConflictWithLock().
>>
>>
>>>
>>>>
>>>> Why not simply use (again) the STANDBY_LOCK_TIMEOUT one? (as it
>>>> triggers
>>>> a call to StandbyLockTimeoutHandler() which does nothing, except
>>>> waking
>>>> up. That's what we want, right?)
>>>
>>> Right, what I wanted to mean is STANDBY_LOCK_TIMEOUT. The startup
>>> process can wake up and do nothing. Thank you for pointing out.
>>
>> Okay, understood! Firstly I was thinking that enabling the same type
>> (i.e., STANDBY_LOCK_TIMEOUT) of lock twice doesn't work properly, but
>> as far as I read the code, it works. In that case, only the shorter
>> timeout would be activated in enable_timeouts(). So I agree to use
>> STANDBY_LOCK_TIMEOUT.
>
> So I renamed the argument "deadlock_timer" in
> ResolveRecoveryConflictWithLock()
> because it's not the timer for deadlock and is confusing. Attached is the
> updated version of the patch. Barring any objection, I will commit
> this version.
Thanks for the update!
Indeed the naming is more appropriate and less confusing that way, this
version looks all good to me.
Thanks!
Bertrand
From | Date | Subject | |
---|---|---|---|
Next Message | Bharath Rupireddy | 2020-12-14 16:17:58 | Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit |
Previous Message | Fujii Masao | 2020-12-14 15:20:32 | Re: Add Information during standby recovery conflicts |