Re: BUGFIX: Dynamic bgworkers with BGW_NEVER_RESTART worker restarted on FatalError

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: BUGFIX: Dynamic bgworkers with BGW_NEVER_RESTART worker restarted on FatalError
Date: 2014-05-21 12:01:28
Message-ID: CA+TgmobDQe0wstbs-jAgxQtKOhotvqu+LZxmJGNDjN78M_LYvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 16, 2014 at 1:38 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> There's a bug in the dynamic bgworkers code that I think needs fixing
> before release. TL;DR: BGW_NO_RESTART workers are restarted after
> postmaster crash, attached patch changes that.
>
> The case that's triggering the issue is where a static bgworker is
> registering a new dynamic bgworker to do some setup work each time it
> starts. The static worker is relaunched on postmaster restart and
> registers a new BGW_NO_RESTART dynamic bgworker. This dynamic bgworker
> immediately hits an Assert and dies.
>
> This *should* cause a postmaster restart loop; that's expected. What it
> shouldn't do, but is doing, is restart multiple copies of the bgworker -
> fail to purge the old BGW_NO_RESTART one and launch it as if it'd never
> crashed.
>
> The attached patch fixes the problem.

Hmm, I think Petr's original patch might have even done something like
this. It did have some sort of conditional around setting
rw_crashed_at; I can't remember if it's the same as what you've got
here. But I changed it, because I said to myself something along the
lines: "Surely it can't hurt to reset rw_crashed_at unconditionally."
Evidently, it can.

But let's think about this scenario. Suppose some backend (it doesn't
matter if it is a user session or another background worker) launches
a run-once (i.e. BGW_NEVER_RESTART) background worker. Before the
postmaster actually starts that worker, it seg faults or fails an
assertion, and the postmaster does a crash-and-restart. After
resetting, that background worker will still have rw_crashed_at = 0,
and will get started. Is that what we want? I'd argue probably not.

If you agree, perhaps we should treat all background workers as having
been started at least once and then crashed, but a long time ago so
that if they're supposed to be restarted that happens right away. In
other words, have ResetBackgroundWorkerCrashTimes() unconditionally
fill in a non-zero but very old value (say, 1) rather than zero.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2014-05-21 14:49:43 Re: pg_upgrade fails: Mismatch of relation OID in database 8.4 -> 9.3
Previous Message Heikki Linnakangas 2014-05-21 11:53:54 Re: jsonb failed assertions