Re: Add Information during standby recovery conflicts

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add Information during standby recovery conflicts
Date: 2020-12-04 01:21:35
Message-ID: 627f702d-a04c-b72b-b94f-214f448e2935@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020/12/04 9:28, Masahiko Sawada wrote:
> On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>
>>
>>
>> On 2020/12/01 17:29, Drouvot, Bertrand wrote:
>>> Hi,
>>>
>>> On 12/1/20 12:35 AM, Masahiko Sawada wrote:
>>>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>>>>
>>>>
>>>>
>>>> On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>>>>> On 2020-Dec-01, Fujii Masao wrote:
>>>>>
>>>>>> + if (proc)
>>>>>> + {
>>>>>> + if (nprocs == 0)
>>>>>> + appendStringInfo(&buf, "%d", proc->pid);
>>>>>> + else
>>>>>> + appendStringInfo(&buf, ", %d", proc->pid);
>>>>>> +
>>>>>> + nprocs++;
>>>>>>
>>>>>> What happens if all the backends in wait_list have gone? In other words,
>>>>>> how should we handle the case where nprocs == 0 (i.e., nprocs has not been
>>>>>> incrmented at all)? This would very rarely happen, but can happen.
>>>>>> In this case, since buf.data is empty, at least there seems no need to log
>>>>>> the list of conflicting processes in detail message.
>>>>> Yes, I noticed this too; this can be simplified by changing the
>>>>> condition in the ereport() call to be "nprocs > 0" (rather than
>>>>> wait_list being null), otherwise not print the errdetail. (You could
>>>>> test buf.data or buf.len instead, but that seems uglier to me.)
>>>> +1
>>>>
>>>> Maybe we can also improve the comment of this function from:
>>>>
>>>> + * This function also reports the details about the conflicting
>>>> + * process ids if *wait_list is not NULL.
>>>>
>>>> to " This function also reports the details about the conflicting
>>>> process ids if exist" or something.
>>>>
>>> Thank you all for the review/remarks.
>>>
>>> They have been addressed in the new attached patch version.
>>
>> Thanks for updating the patch! I read through the patch again
>> and applied the following chages to it. Attached is the updated
>> version of the patch. Could you review this version? If there is
>> no issue in it, I'm thinking to commit this version.
>
> Thank you for updating the patch! I have one question.
>
>>
>> + timeouts[cnt].id = STANDBY_TIMEOUT;
>> + timeouts[cnt].type = TMPARAM_AFTER;
>> + timeouts[cnt].delay_ms = DeadlockTimeout;
>>
>> Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here?
>> I changed the code that way.
>
> As the comment of ResolveRecoveryConflictWithLock() says the
> following, a deadlock is detected by the ordinary backend process:
>
> * Deadlocks involving the Startup process and an ordinary backend proces
> * will be detected by the deadlock detector within the ordinary backend.
>
> If we use STANDBY_DEADLOCK_TIMEOUT,
> SendRecoveryConflictWithBufferPin() will be called after
> DeadlockTimeout passed, but I think it's not necessary for the startup
> process in this case.

Thanks for pointing this! You are right.

> If we want to just wake up the startup process
> maybe we can use STANDBY_TIMEOUT here?

When STANDBY_TIMEOUT happens, a request to release conflicting buffer pins is sent. Right? If so, we should not also use STANDBY_TIMEOUT there?

Or, first of all, we don't need to enable the deadlock timer at all? Since what we'd like to do is to wake up after deadlock_timeout passes, we can do that by changing ProcWaitForSignal() so that it can accept the timeout and giving the deadlock_timeout to it. If we do this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from ResolveRecoveryConflictWithLock(). Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-12-04 01:25:43 Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Previous Message Jacob Champion 2020-12-04 00:57:26 Re: Support for NSS as a libpq TLS backend