BUGFIX: Dynamic bgworkers with BGW_NEVER_RESTART worker restarted on FatalError

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: BUGFIX: Dynamic bgworkers with BGW_NEVER_RESTART worker restarted on FatalError
Date: 2014-05-16 05:38:34
Message-ID: 5375A45A.20700@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all

There's a bug in the dynamic bgworkers code that I think needs fixing
before release. TL;DR: BGW_NO_RESTART workers are restarted after
postmaster crash, attached patch changes that.

The case that's triggering the issue is where a static bgworker is
registering a new dynamic bgworker to do some setup work each time it
starts. The static worker is relaunched on postmaster restart and
registers a new BGW_NO_RESTART dynamic bgworker. This dynamic bgworker
immediately hits an Assert and dies.

This *should* cause a postmaster restart loop; that's expected. What it
shouldn't do, but is doing, is restart multiple copies of the bgworker -
fail to purge the old BGW_NO_RESTART one and launch it as if it'd never
crashed.

The attached patch fixes the problem.

Detail:

If you set a worker as BGW_NO_RESTART it isn't restarted if it ERRORs
out. That's fine.

With Petr's applied patch it no longer restarts on exit 0 (normal exit)
either.

There's a third case, though: a bgworker crash causing postmaster
restart. In this case the bgworker is still restarted, which makes no
sense at all if it isn't for the other two cases.

The existing code looks like it tries to protect against this - in
maybe_start_bgworker, the invocation of do_start_bgworker is protected
by a prior test for rw->rw_crashed_at that, if
rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART, unregisters the
worker and skips to the next one.

However, in my testing a breakpoint inside the if (rw->rw_crashed_at !=
0) test in maybe_start_bgworker is never hit, even for a bgworker that
is known to have crashed. rw->rw_crashed_at is always zero.

The culprit is ResetBackgroundWorkerCrashTimes, which unconditionally
resets the crash time without considering that the worker might be
BGW_NO_RESTART.

The attached patch makes ResetBackgroundWorkerCrashTimes only reset the
crashed time for workers with a restart time set.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Don-t-restart-BGW_NO_RESTART-workers-after-postmaste.patch text/x-patch 1.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2014-05-16 05:55:50 Re: Trigger concurrent execution
Previous Message Fabrízio de Royes Mello 2014-05-16 05:03:29 psql \db+ lack of size column