Re: when the startup process doesn't (logging startup delays)

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: when the startup process doesn't (logging startup delays)
Date: 2021-10-29 11:38:52
Message-ID: CAMm1aWZi5FV=m2hvkTj5bjQ03DYece3HMJGngdXbRiSuaY2CWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I was fooling around with a test setup today, working on an unrelated
> problem, and this happened:
>
> 2021-10-28 14:21:23.145 EDT [92010] LOG: resetting unlogged relations
> (init), elapsed time: 0.00 s, current path: base/13020

Nice catch and interesting case.

> That's not supposed to happen. I assume the problem is that the
> timeout for the previous phase fired just as we were beginning a new
> one, and the code got confused. I think we probably need to do
> something like this to make sure that the timeout from one operation
> can't trigger a log message for the next:
>
> diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
> index 28e68dd871..47ec737888 100644
> --- a/src/backend/postmaster/startup.c
> +++ b/src/backend/postmaster/startup.c
> @@ -320,6 +320,8 @@ begin_startup_progress_phase(void)
> if (log_startup_progress_interval == 0)
> return;
>
> + disable_timeout(STARTUP_PROGRESS_TIMEOUT, false);
> + startup_progress_timer_expired = false;
> startup_progress_phase_start_time = GetCurrentTimestamp();
> fin_time = TimestampTzPlusMilliseconds(startup_progress_phase_start_time,
> log_startup_progress_interval);
>
> Thoughts?

Yes. I agree that this is not an expected behaviour and I do agree
that, probably the timeout for the previous phase fired just as we
were beginning a new one. For each operation, we call
begin_startup_progress_phase() before starting the operation and
one/multiple calls to ereport_startup_progress() to report the
progress and intentionally we have not added any functionality to
disable the timer at the end of the operation. The timer remains
active and there may be multiple calls to
startup_progress_timeout_handler() which sets a flag to true. So
whenever we call a begin_startup_progress_phase() for the next
operation, we do enable the timer (In my understanding it will
internally disable the old timer and schedule a new one) but the flag
is already set to true by the previous timer. Hence the next call to
ereport_startup_progress() logs a message. So I feel just setting
'startup_progress_timer_expired' to false in
begin_startup_progress_phase() would work. Please correct me if I am
wrong.

Thanks & Regards,
Nitin Jadhav

On Thu, Oct 28, 2021 at 11:59 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Oct 25, 2021 at 11:56 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > This version looks fine, so I have committed it (and my
> > enable_timeout_every patch also, as a necessary prerequisite).
>
> I was fooling around with a test setup today, working on an unrelated
> problem, and this happened:
>
> 2021-10-28 14:21:23.145 EDT [92010] LOG: resetting unlogged relations
> (init), elapsed time: 0.00 s, current path: base/13020
>
> That's not supposed to happen. I assume the problem is that the
> timeout for the previous phase fired just as we were beginning a new
> one, and the code got confused. I think we probably need to do
> something like this to make sure that the timeout from one operation
> can't trigger a log message for the next:
>
> diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
> index 28e68dd871..47ec737888 100644
> --- a/src/backend/postmaster/startup.c
> +++ b/src/backend/postmaster/startup.c
> @@ -320,6 +320,8 @@ begin_startup_progress_phase(void)
> if (log_startup_progress_interval == 0)
> return;
>
> + disable_timeout(STARTUP_PROGRESS_TIMEOUT, false);
> + startup_progress_timer_expired = false;
> startup_progress_phase_start_time = GetCurrentTimestamp();
> fin_time = TimestampTzPlusMilliseconds(startup_progress_phase_start_time,
> log_startup_progress_interval);
>
> Thoughts?
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2021-10-29 11:54:29 Re: Support for NSS as a libpq TLS backend
Previous Message Michael Paquier 2021-10-29 11:38:33 Re: Teach pg_receivewal to use lz4 compression