Re: Add Information during standby recovery conflicts

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add Information during standby recovery conflicts
Date: 2020-11-27 05:04:51
Message-ID: CAD21AoANgC3xiAbgw4emfORBoFR8U1uptkYshi8k8SbyzTK0NQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 26, 2020 at 12:49 AM Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
>
> Hi,
>
> On 11/25/20 2:20 PM, Masahiko Sawada wrote:
> > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> >
> >
> >
> > On Fri, Nov 20, 2020 at 6:18 PM Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
> >> Hi,
> >>
> >> On 11/17/20 4:44 PM, Fujii Masao wrote:
> >>> Thanks for updating the patch! Here are review comments.
> >>>
> >>> + Controls whether a log message is produced when the startup
> >>> process
> >>> + is waiting longer than <varname>deadlock_timeout</varname>
> >>> + for recovery conflicts.
> >>>
> >>> But a log message can be produced also when the backend is waiting
> >>> for recovery conflict. Right? If yes, this description needs to be
> >>> corrected.
> >> Thanks for looking at it!
> >>
> >> I don't think so, only the startup process should write those new log
> >> messages.
> >>
> >> What makes you think that would not be the case?
> >>
> >>>
> >>> + for recovery conflicts. This is useful in determining if
> >>> recovery
> >>> + conflicts prevents the recovery from applying WAL.
> >>>
> >>> "prevents" should be "prevent"?
> >> Indeed: fixed in the new attached patch.
> >>
> >>>
> >>> + TimestampDifference(waitStart, GetCurrentTimestamp(), &secs,
> >>> &usecs);
> >>> + msecs = secs * 1000 + usecs / 1000;
> >>>
> >>> GetCurrentTimestamp() is basically called before LogRecoveryConflict()
> >>> is called. So isn't it better to avoid calling GetCurrentTimestamp()
> >>> newly in
> >>> LogRecoveryConflict() and to reuse the timestamp that we got?
> >>> It's helpful to avoid the waste of cycles.
> >>>
> >> good catch! fixed in the new attached patch.
> >>
> >>> + while (VirtualTransactionIdIsValid(*vxids))
> >>> + {
> >>> + PGPROC *proc =
> >>> BackendIdGetProc(vxids->backendId);
> >>>
> >>> BackendIdGetProc() can return NULL if the backend is not active
> >>> at that moment. This case should be handled.
> >>>
> >> handled in the new attached patch.
> >>> + case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
> >>> + reasonDesc = gettext_noop("recovery is still
> >>> waiting recovery conflict on buffer pin");
> >>>
> >>> It's natural to use "waiting for recovery" rather than "waiting
> >>> recovery"?
> >>>
> >> I would be tempted to say so, the new patch makes use of "waiting for".
> >>> + /* Also, set deadlock timeout for logging purpose if
> >>> necessary */
> >>> + if (log_recovery_conflict_waits)
> >>> + {
> >>> + timeouts[cnt].id = STANDBY_TIMEOUT;
> >>> + timeouts[cnt].type = TMPARAM_AFTER;
> >>> + timeouts[cnt].delay_ms = DeadlockTimeout;
> >>> + cnt++;
> >>> + }
> >>>
> >>> This needs to be executed only when the message has not been logged yet.
> >>> Right?
> >>>
> >> good catch: fixed in the new attached patch.
> >>
> > Thank you for updating the patch! Here are review comments on the
> > latest version patch.
> >
> > + while (VirtualTransactionIdIsValid(*vxids))
> > + {
> > + PGPROC *proc = BackendIdGetProc(vxids->backendId);
> > +
> > + if (proc)
> > + {
> > + if (nprocs == 0)
> > + appendStringInfo(&buf, "%d", proc->pid);
> > + else
> > + appendStringInfo(&buf, ", %d", proc->pid);
> > +
> > + nprocs++;
> > + vxids++;
> > + }
> > + }
> >
> > We need to increment vxids even if *proc is null. Otherwise, the loop won't end.
>
> My bad, that's fixed.
>
> >
> > ---
> > + TimestampTz cur_ts = GetCurrentTimestamp();;
> Fixed
> >
> > There is an extra semi-colon.
> >
> > ---
> > int max_standby_streaming_delay = 30 * 1000;
> > +bool log_recovery_conflict_waits = false;
> > +bool logged_lock_conflict = false;
> >
> >
> > + if (log_recovery_conflict_waits && !logged_lock_conflict)
> > + {
> > + timeouts[cnt].id = STANDBY_TIMEOUT;
> > + timeouts[cnt].type = TMPARAM_AFTER;
> > + timeouts[cnt].delay_ms = DeadlockTimeout;
> > + cnt++;
> > + }
> >
> > Can we pass a bool indicating if a timeout may be needed for recovery
> > conflict logging from ProcSleep() to ResolveRecoveryConflictWithLock()
> > instead of using a static variable?
>
> Yeah that makes more sense, specially as we already have
> logged_recovery_conflict at our disposal.
>
> New patch version attached.
>

Thank you for updating the patch! The patch works fine and looks good
to me except for the following small comments:

+/*
+ * Log the recovery conflict.
+ *
+ * waitStart is the timestamp when the caller started to wait. This
function also
+ * reports the details about the conflicting process ids if *waitlist
is not NULL.
+ */
+void
+LogRecoveryConflict(ProcSignalReason reason, TimestampTz waitStart,
+ TimestampTz cur_ts,
VirtualTransactionId *waitlist)

I think it's better to explain cur_ts as well in the function comment.

Regarding the function arguments, 'waitStart' is camel case whereas
'cur_ts' is snake case and 'waitlist' is using only lower cases. I
think we should unify them.

---
-ResolveRecoveryConflictWithLock(LOCKTAG locktag)
+ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logged_recovery_conflict)

The function argument name 'logged_recovery_conflict' sounds a bit
redundant to me as this function is used only for recovery conflict
resolution. How about 'need_log' or something? Also it’s better to
explain it in the function comment.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tsunakawa.takay@fujitsu.com 2020-11-27 06:05:55 RE: POC: postgres_fdw insert batching
Previous Message Craig Ringer 2020-11-27 03:56:15 Re: POC: postgres_fdw insert batching