From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Preventing hangups in bgworker start/stop during DB shutdown |
Date: | 2020-12-24 12:49:51 |
Message-ID: | CALj2ACWWiqC1Ci3dr4nTbpxH78z=fGO14uk6NEaT14zpbutKnA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Dec 23, 2020 at 3:10 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Here's an attempt at closing the race condition discussed in [1]
> (and in some earlier threads, though I'm too lazy to find them).
>
> The core problem is that the bgworker management APIs were designed
> without any thought for exception conditions, notably "we're not
> gonna launch any more workers because we're shutting down the database".
> A process waiting for a worker in WaitForBackgroundWorkerStartup or
> WaitForBackgroundWorkerShutdown will wait forever, so that the database
> fails to shut down without manual intervention.
>
> I'd supposed that we would need some incompatible changes in those APIs
> in order to fix this, but after further study it seems like we could
> hack things by just acting as though a request that won't be serviced
> has already run to completion. I'm not terribly satisfied with that
> as a long-term solution --- it seems to me that callers should be told
> that there was a failure. But this looks to be enough to solve the
> lockup condition for existing callers, and it seems like it'd be okay
> to backpatch.
>
> Thoughts?
>
> [1] https://www.postgresql.org/message-id/flat/16785-c0207d8c67fb5f25%40postgresql.org
1) Yeah, the postmaster will not be able to start the bg workers in
following cases, when bgworker_should_start_now returns false. So we
might encounter the hang issue.
static bool
bgworker_should_start_now(BgWorkerStartTime start_time)
{
switch (pmState)
{
case PM_NO_CHILDREN:
case PM_WAIT_DEAD_END:
case PM_SHUTDOWN_2:
case PM_SHUTDOWN:
case PM_WAIT_BACKENDS:
case PM_STOP_BACKENDS:
break;
2) What if postmaster enters pmState >= PM_STOP_BACKENDS state after
it calls BackgroundWorkerStateChange(pmState < PM_STOP_BACKENDS)?
First of all, is it possible? I think yes, because we are in
sigusr1_handler(), and I don't see we blocking the signal that sets
pmState >= PM_STOP_BACKENDS either in sigusr1_handler or in
BackgroundWorkerStateChange(). Though it's a small window, we might
get into the hangup issue? If yes, can we check the pmState in the for
loop in BackgroundWorkerStateChange()?
if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE))
{
- BackgroundWorkerStateChange();
+ /* Accept new dynamic worker requests only if not stopping. */
+ BackgroundWorkerStateChange(pmState < PM_STOP_BACKENDS);
StartWorkerNeeded = true;
}
3) Can we always say that if bgw_restart_time is BGW_NEVER_RESTART,
then it's a dynamic bg worker? I think we can also have normal
bgworkers with BGW_NEVER_RESTART flag(I see one in worker_spi.c
_PG_init()), will there be any problem? Or do we need some comment
tweaking?
+ /*
+ * If this is a dynamic worker request, and we aren't allowing new
+ * dynamic workers, then immediately mark it for termination; the next
+ * stanza will take care of cleaning it up.
+ */
+ if (slot->worker.bgw_restart_time == BGW_NEVER_RESTART &&
+ !allow_new_workers)
+ slot->terminate = true;
4) IIUC, in the patch we mark slot->terminate = true only for
BGW_NEVER_RESTART kind bg workers, what happens if a bg worker has
bgw_restart_time seconds and don't we hit the hanging issue(that we
are trying to solve here) for those bg workers?
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | k.jamison@fujitsu.com | 2020-12-24 13:29:53 | RE: [Patch] Optimize dropping of relation buffers using dlist |
Previous Message | Tomas Vondra | 2020-12-24 12:38:03 | Re: How is this possible "publication does not exist" |