From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
Cc: | "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Add Information during standby recovery conflicts |
Date: | 2020-12-16 02:55:29 |
Message-ID: | CAD21AoA66PiAbVQ=i4X7z81yTxZc5R=G+Yi8Sny=KZBcCqJ7SA@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Dec 14, 2020 at 9:31 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
>
>
> On 2020/12/05 12:38, Masahiko Sawada wrote:
> > On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
> >>
> >> Hi,
> >>
> >> On 12/4/20 2:21 AM, Fujii Masao wrote:
> >>>
> >>> On 2020/12/04 9:28, Masahiko Sawada wrote:
> >>>> On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao
> >>>> <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 2020/12/01 17:29, Drouvot, Bertrand wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 12/1/20 12:35 AM, Masahiko Sawada wrote:
> >>>>>>>
> >>>>>>> On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera
> >>>>>>> <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >>>>>>>> On 2020-Dec-01, Fujii Masao wrote:
> >>>>>>>>
> >>>>>>>>> + if (proc)
> >>>>>>>>> + {
> >>>>>>>>> + if (nprocs == 0)
> >>>>>>>>> + appendStringInfo(&buf, "%d", proc->pid);
> >>>>>>>>> + else
> >>>>>>>>> + appendStringInfo(&buf, ", %d", proc->pid);
> >>>>>>>>> +
> >>>>>>>>> + nprocs++;
> >>>>>>>>>
> >>>>>>>>> What happens if all the backends in wait_list have gone? In other
> >>>>>>>>> words, how should we handle the case where nprocs == 0 (i.e., nprocs
> >>>>>>>>> has not been incremented at all)? This would very rarely happen, but
> >>>>>>>>> can happen. In this case, since buf.data is empty, at least there
> >>>>>>>>> seems no need to log the list of conflicting processes in detail
> >>>>>>>>> message.
> >>>>>>>> Yes, I noticed this too; this can be simplified by changing the
> >>>>>>>> condition in the ereport() call to be "nprocs > 0" (rather than
> >>>>>>>> wait_list being null), otherwise not print the errdetail. (You could
> >>>>>>>> test buf.data or buf.len instead, but that seems uglier to me.)
> >>>>>>> +1
> >>>>>>>
> >>>>>>> Maybe we can also improve the comment of this function from:
> >>>>>>>
> >>>>>>> + * This function also reports the details about the conflicting
> >>>>>>> + * process ids if *wait_list is not NULL.
> >>>>>>>
> >>>>>>> to " This function also reports the details about the conflicting
> >>>>>>> process ids if exist" or something.
> >>>>>>>
> >>>>>> Thank you all for the review/remarks.
> >>>>>>
> >>>>>> They have been addressed in the new attached patch version.
> >>>>>
> >>>>> Thanks for updating the patch! I read through the patch again
> >>>>> and applied the following changes to it. Attached is the updated
> >>>>> version of the patch. Could you review this version? If there is
> >>>>> no issue in it, I'm thinking to commit this version.
> >>>>
> >>>> Thank you for updating the patch! I have one question.
> >>>>
> >>>>>
> >>>>> + timeouts[cnt].id = STANDBY_TIMEOUT;
> >>>>> + timeouts[cnt].type = TMPARAM_AFTER;
> >>>>> + timeouts[cnt].delay_ms = DeadlockTimeout;
> >>>>>
> >>>>> Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here?
> >>>>> I changed the code that way.
> >>>>
> >>>> As the comment of ResolveRecoveryConflictWithLock() says the
> >>>> following, a deadlock is detected by the ordinary backend process:
> >>>>
> >>>> * Deadlocks involving the Startup process and an ordinary backend process
> >>>> * will be detected by the deadlock detector within the ordinary backend.
> >>>>
> >>>> If we use STANDBY_DEADLOCK_TIMEOUT,
> >>>> SendRecoveryConflictWithBufferPin() will be called after
> >>>> DeadlockTimeout passed, but I think it's not necessary for the startup
> >>>> process in this case.
> >>>
> >>> Thanks for pointing this! You are right.
> >>>
> >>>
> >>>> If we want to just wake up the startup process
> >>>> maybe we can use STANDBY_TIMEOUT here?
> >>>
> >> Thanks for the patch updates! Except what we are still discussing below,
> >> it looks good to me.
> >>
> >>> When STANDBY_TIMEOUT happens, a request to release conflicting buffer
> >>> pins is sent. Right? If so, we should not also use STANDBY_TIMEOUT there?
> >>
> >> Agree
> >>
> >>>
> >>> Or, first of all, we don't need to enable the deadlock timer at all?
> >>> Since what we'd like to do is to wake up after deadlock_timeout
> >>> passes, we can do that by changing ProcWaitForSignal() so that it can
> >>> accept the timeout and giving the deadlock_timeout to it. If we do
> >>> this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from
> >>> ResolveRecoveryConflictWithLock(). Thought?
> >
> > Where do we enable deadlock timeout in hot standby case? You meant to
> > enable it in ProcWaitForSignal() or where we set a timer for not hot
> > standby case, in ProcSleep()?
>
> No, what I tried to say is to change ResolveRecoveryConflictWithLock() so that it does
>
> 1. calculate the "minimum" timeout from deadlock_timeout and max_standby_xxx_delay
> 2. give the calculated timeout value to ProcWaitForSignal()
> 3. wait for signal and timeout on ProcWaitForSignal()
>
> Since ProcWaitForSignal() calls WaitLatch(), seems it's not so difficult to make ProcWaitForSignal() handle the timeout. If we do this, I was thinking that we can get rid of enable_timeouts() from ResolveRecoveryConflictWithLock().
Thank you for your explanation! That makes sense to me. Even if we
don't have ProcWaitForSignal() handle the timeout, perhaps we don't
need to set two timeouts. As you mentioned, we can calculate the
minimum timeout and set only that one (or nothing, if no timeout applies).
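
For illustration, the "calculate the minimum timeout and set it (or
nothing)" idea could look roughly like the sketch below. It is not taken
from the patch: min_wait_timeout_ms(), NO_TIMEOUT, the need_deadlock_wakeup
flag, and the convention that a negative remaining standby delay means
"no limit" are all hypothetical names and assumptions chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel: wait without any timeout. */
#define NO_TIMEOUT (-1L)

/*
 * Hypothetical helper (not a PostgreSQL function): choose a single wait
 * timeout in milliseconds from the two candidates discussed above.
 *
 * need_deadlock_wakeup  - true if we still want to wake up after
 *                         deadlock_timeout
 * deadlock_timeout_ms   - value of deadlock_timeout, must be positive
 * standby_remaining_ms  - time left until the max_standby_xxx_delay limit;
 *                         negative is assumed here to mean "no limit"
 *
 * Returns the smaller applicable timeout, or NO_TIMEOUT if neither applies.
 */
static long
min_wait_timeout_ms(bool need_deadlock_wakeup,
                    long deadlock_timeout_ms,
                    long standby_remaining_ms)
{
    long timeout = NO_TIMEOUT;

    assert(deadlock_timeout_ms > 0);

    if (need_deadlock_wakeup)
        timeout = deadlock_timeout_ms;

    /* Take the standby limit if it is set and is the earlier deadline. */
    if (standby_remaining_ms >= 0 &&
        (timeout == NO_TIMEOUT || standby_remaining_ms < timeout))
        timeout = standby_remaining_ms;

    return timeout;
}
```

With something like this, the caller would arm at most one timer (or pass
one value to the wait call), and skip it entirely when NO_TIMEOUT comes back.
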
Regards,
--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/