| From: | "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com> |
|---|---|
| To: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Add Information during standby recovery conflicts |
| Date: | 2021-01-07 13:39:50 |
| Message-ID: | 5517bbfd-27ad-3e97-51e2-cb9ffb8f8a8d@amazon.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On 1/6/21 6:31 PM, Fujii Masao wrote:
> CAUTION: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender
> and know the content is safe.
>
>
>
> On 2020/12/15 0:20, Fujii Masao wrote:
>>
>>
>> On 2020/12/14 21:31, Fujii Masao wrote:
>>>
>>>
>>> On 2020/12/05 12:38, Masahiko Sawada wrote:
>>>> On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand
>>>> <bdrouvot(at)amazon(dot)com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 12/4/20 2:21 AM, Fujii Masao wrote:
>>>>>>
>>>>>> On 2020/12/04 9:28, Masahiko Sawada wrote:
>>>>>>> On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao
>>>>>>> <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020/12/01 17:29, Drouvot, Bertrand wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 12/1/20 12:35 AM, Masahiko Sawada wrote:
>>>>>>>>>> CAUTION: This email originated from outside of the organization.
>>>>>>>>>> Do not click links or open attachments unless you can confirm
>>>>>>>>>> the
>>>>>>>>>> sender and know the content is safe.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera
>>>>>>>>>> <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>>>>>>>>>>> On 2020-Dec-01, Fujii Masao wrote:
>>>>>>>>>>>
>>>>>>>>>>>> + if (proc)
>>>>>>>>>>>> + {
>>>>>>>>>>>> + if (nprocs == 0)
>>>>>>>>>>>> + appendStringInfo(&buf, "%d", proc->pid);
>>>>>>>>>>>> + else
>>>>>>>>>>>> + appendStringInfo(&buf, ", %d", proc->pid);
>>>>>>>>>>>> +
>>>>>>>>>>>> + nprocs++;
>>>>>>>>>>>>
>>>>>>>>>>>> What happens if all the backends in wait_list have gone? In
>>>>>>>>>>>> other words,
>>>>>>>>>>>> how should we handle the case where nprocs == 0 (i.e., nprocs
>>>>>>>>>>>> has not been
>>>>>>>>>>>> incrmented at all)? This would very rarely happen, but can
>>>>>>>>>>>> happen.
>>>>>>>>>>>> In this case, since buf.data is empty, at least there seems no
>>>>>>>>>>>> need to log
>>>>>>>>>>>> the list of conflicting processes in detail message.
>>>>>>>>>>> Yes, I noticed this too; this can be simplified by changing the
>>>>>>>>>>> condition in the ereport() call to be "nprocs > 0" (rather than
>>>>>>>>>>> wait_list being null), otherwise not print the errdetail. (You
>>>>>>>>>>> could
>>>>>>>>>>> test buf.data or buf.len instead, but that seems uglier to me.)
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Maybe we can also improve the comment of this function from:
>>>>>>>>>>
>>>>>>>>>> + * This function also reports the details about the conflicting
>>>>>>>>>> + * process ids if *wait_list is not NULL.
>>>>>>>>>>
>>>>>>>>>> to " This function also reports the details about the
>>>>>>>>>> conflicting
>>>>>>>>>> process ids if exist" or something.
>>>>>>>>>>
>>>>>>>>> Thank you all for the review/remarks.
>>>>>>>>>
>>>>>>>>> They have been addressed in the new attached patch version.
>>>>>>>>
>>>>>>>> Thanks for updating the patch! I read through the patch again
>>>>>>>> and applied the following chages to it. Attached is the updated
>>>>>>>> version of the patch. Could you review this version? If there is
>>>>>>>> no issue in it, I'm thinking to commit this version.
>>>>>>>
>>>>>>> Thank you for updating the patch! I have one question.
>>>>>>>
>>>>>>>>
>>>>>>>> + timeouts[cnt].id = STANDBY_TIMEOUT;
>>>>>>>> + timeouts[cnt].type = TMPARAM_AFTER;
>>>>>>>> + timeouts[cnt].delay_ms = DeadlockTimeout;
>>>>>>>>
>>>>>>>> Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here?
>>>>>>>> I changed the code that way.
>>>>>>>
>>>>>>> As the comment of ResolveRecoveryConflictWithLock() says the
>>>>>>> following, a deadlock is detected by the ordinary backend process:
>>>>>>>
>>>>>>> * Deadlocks involving the Startup process and an ordinary
>>>>>>> backend
>>>>>>> proces
>>>>>>> * will be detected by the deadlock detector within the ordinary
>>>>>>> backend.
>>>>>>>
>>>>>>> If we use STANDBY_DEADLOCK_TIMEOUT,
>>>>>>> SendRecoveryConflictWithBufferPin() will be called after
>>>>>>> DeadlockTimeout passed, but I think it's not necessary for the
>>>>>>> startup
>>>>>>> process in this case.
>>>>>>
>>>>>> Thanks for pointing this! You are right.
>>>>>>
>>>>>>
>>>>>>> If we want to just wake up the startup process
>>>>>>> maybe we can use STANDBY_TIMEOUT here?
>>>>>>
>>>>> Thanks for the patch updates! Except what we are still discussing
>>>>> below,
>>>>> it looks good to me.
>>>>>
>>>>>> When STANDBY_TIMEOUT happens, a request to release conflicting
>>>>>> buffer
>>>>>> pins is sent. Right? If so, we should not also use
>>>>>> STANDBY_TIMEOUT there?
>>>>>
>>>>> Agree
>>>>>
>>>>>>
>>>>>> Or, first of all, we don't need to enable the deadlock timer at all?
>>>>>> Since what we'd like to do is to wake up after deadlock_timeout
>>>>>> passes, we can do that by changing ProcWaitForSignal() so that it
>>>>>> can
>>>>>> accept the timeout and giving the deadlock_timeout to it. If we do
>>>>>> this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from
>>>>>> ResolveRecoveryConflictWithLock(). Thought?
>>>>
>>>> Where do we enable deadlock timeout in hot standby case? You meant to
>>>> enable it in ProcWaitForSignal() or where we set a timer for not hot
>>>> standby case, in ProcSleep()?
>>>
>>> No, what I tried to say is to change
>>> ResolveRecoveryConflictWithLock() so that it does
>>>
>>> 1. calculate the "minimum" timeout from deadlock_timeout and
>>> max_standby_xxx_delay
>>> 2. give the calculated timeout value to ProcWaitForSignal()
>>> 3. wait for signal and timeout on ProcWaitForSignal()
>>>
>>> Since ProcWaitForSignal() calls WaitLatch(), seems it's not so
>>> difficult to make ProcWaitForSignal() handle the timeout. If we do
>>> this, I was thinking that we can get rid of enable_timeouts() from
>>> ResolveRecoveryConflictWithLock().
>>>
>>>
>>>>
>>>>>
>>>>> Why not simply use (again) the STANDBY_LOCK_TIMEOUT one? (as it
>>>>> triggers
>>>>> a call to StandbyLockTimeoutHandler() which does nothing, except
>>>>> waking
>>>>> up. That's what we want, right?)
>>>>
>>>> Right, what I wanted to mean is STANDBY_LOCK_TIMEOUT. The startup
>>>> process can wake up and do nothing. Thank you for pointing out.
>>>
>>> Okay, understood! Firstly I was thinking that enabling the same type
>>> (i.e., STANDBY_LOCK_TIMEOUT) of lock twice doesn't work properly,
>>> but as far as I read the code, it works. In that case, only the
>>> shorter timeout would be activated in enable_timeouts(). So I agree
>>> to use STANDBY_LOCK_TIMEOUT.
>>
>> So I renamed the argument "deadlock_timer" in
>> ResolveRecoveryConflictWithLock()
>> because it's not the timer for deadlock and is confusing. Attached is
>> the
>> updated version of the patch. Barring any objection, I will commit
>> this version.
>
> Since the recent commit 8900b5a9d5 changed the recovery conflict code,
> I updated the patch. Attached is the updated version of the patch.
>
Thanks for those updates!
I had a look and the patch does look good to me.
As far the other threads regarding:
- "maybe_log_conflict" and "maybe_update_title" naming: I don’t have
strong opinions about it but I am more inclined to stay with the “maybe”
naming (as it is currently in this patch version) as it better reflects
that this may or not occur.
- the errdetail log message format in LogRecoveryConflict() (currently
looks like “Conflicting process: 25118.”) : I don’t have strong opinions
about it but I am more inclined to stay as it is, as it looks similar as
the format being used in ProcSleep() (even if we can find different
formats in other places though).
Bertrand
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Masahiko Sawada | 2021-01-07 13:48:09 | Re: pgbench stopped supporting large number of client connections on Windows |
| Previous Message | Josef Šimánek | 2021-01-07 13:32:12 | Re: [PATCH] Simple progress reporting for COPY command |