Re: Why has postmaster shutdown gotten so slow?

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Why has postmaster shutdown gotten so slow?
Date: 2004-02-06 22:45:19
Message-ID: 402418FF.7090408@Yahoo.com
Lists: pgsql-hackers

Tom Lane wrote:
> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
>> I checked the background writer for this and I can not reproduce the
>> behaviour. If the bgwriter had zero blocks to write it does PG_USLEEP
>> for 10 seconds, which on Unix is done by select() and that is correctly
>> interrupted when the postmaster sends it the term signal on shutdown.
>
> This appears to be a platform-dependent behavior. The HPUX select(2) man
> page says
>
> [EINTR] The select() function was interrupted before any
> of the selected events occurred and before the
> timeout interval expired. If SA_RESTART has been
> set for the interrupting signal, it is
> implementation-dependent whether select() restarts
> or returns with EINTR.
>
> which text also appears verbatim in the Single Unix Spec. Since we set
> SA_RESTART for every signal except SIGALRM (see pqsignal.c), we are
> subject to the implementation dependency for SIGTERM.

That explains it.
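
For reference, the dependency comes down to one flag in the sigaction() call. A minimal sketch, not the actual pqsignal.c code; install_handler, its restart parameter and on_term are names made up for illustration:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>

typedef void (*sigfunc) (int);

/* Hypothetical handler that does nothing; it exists only to be installed. */
static void
on_term(int signo)
{
    (void) signo;
}

/*
 * Install func for signo. With restart != 0 the handler is installed
 * with SA_RESTART, and whether an interrupted select() returns EINTR is
 * then left to the implementation. With restart == 0 the flag is left
 * out. Returns the previous handler, or SIG_ERR on failure.
 */
static sigfunc
install_handler(int signo, sigfunc func, int restart)
{
    struct sigaction act;
    struct sigaction oact;

    act.sa_handler = func;
    sigemptyset(&act.sa_mask);
    act.sa_flags = restart ? SA_RESTART : 0;

    if (sigaction(signo, &act, &oact) < 0)
        return SIG_ERR;
    return oact.sa_handler;
}
```

Calling install_handler(SIGTERM, on_term, 1) corresponds to the setup described above, where select() may or may not be restarted depending on the platform.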

>
> Tracing the bgwriter process on my machine makes it real obvious that in
> fact the select delay is allowed to finish out when SIGTERM is received.
> In fact worse than that: it's restarted from the beginning. If 5
> seconds have already elapsed, another 10 still elapse before the select
> exits.
>
> This won't do :-(. We cannot afford to fritter away 10 seconds in the
> SIGTERM shutdown cycle --- on typical systems init isn't going to give
> us more than 20 seconds before a hard kill.
>
> I'd suggest reducing the delay to a second or two, or perhaps breaking
> it into several 1-second waits with interrupt flag checks between.
>
> In the longer run we might want to rethink what we are doing with
> SA_RESTART, but I am not sure about the implications of fooling with
> that.

I think at this point we should have some maximum value for PG_xSLEEP,
above which it falls back to a function call that either breaks the
sleep up into a loop that checks InterruptPending, or removes the
SA_RESTART flag while waiting for the timeout.
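
A rough sketch of the loop variant, only to illustrate the idea. The names chunked_sleep_ms, interrupt_pending and note_interrupt are invented here; interrupt_pending merely stands in for InterruptPending, and none of this is existing backend code:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical stand-in for the backend's InterruptPending flag. */
static volatile sig_atomic_t interrupt_pending = 0;

/* Hypothetical signal handler: it only records that a signal arrived. */
static void
note_interrupt(int signo)
{
    (void) signo;
    interrupt_pending = 1;
}

/*
 * Wait for up to total_ms milliseconds in slices of at most chunk_ms,
 * checking the flag before each slice. Returns the milliseconds of the
 * requested delay that were not started because the flag was set
 * (0 if every slice was issued). A slice cut short by EINTR is still
 * counted as done, so the total wait can only come out shorter.
 */
static long
chunked_sleep_ms(long total_ms, long chunk_ms)
{
    long remaining = total_ms;

    assert(chunk_ms > 0);

    while (remaining > 0 && !interrupt_pending)
    {
        long slice = remaining < chunk_ms ? remaining : chunk_ms;
        struct timeval tv;

        /* Rebuild the timeout each time; select() may modify it. */
        tv.tv_sec = slice / 1000;
        tv.tv_usec = (slice % 1000) * 1000;

        /* select() with no descriptors acts as a sub-second sleep. */
        (void) select(0, NULL, NULL, NULL, &tv);
        remaining -= slice;
    }
    return remaining;
}
```

With chunk_ms = 1000, a 10 second delay becomes ten 1-second waits, so even where select() is restarted after SIGTERM the flag set by the handler is seen within roughly one slice.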

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #
