On Wed, Dec 31, 2014 at 01:56:08PM -0500, Noah Misch wrote:
> On Wed, Dec 31, 2014 at 12:32:37AM -0500, Robert Haas wrote:
> > On Sun, Dec 28, 2014 at 4:58 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > I wondered whether to downgrade FATAL to LOG in back branches. Introducing a
> > > new reason to block startup is disruptive for a minor release, but having the
> > > postmaster deadlock at an unpredictable later time is even more disruptive. I
> > > am inclined to halt startup that way in all branches.
> >
> > Jeepers. I'd rather not do that. From your report, this problem has
> > been around for years. Yet, as far as I know, it's bothering very few
> > real users, some of whom might be far more bothered by the postmaster
> > suddenly failing to start. I'm fine with a FATAL in master, but I'd
> > vote against doing anything that might prevent startup in the
> > back-branches without more compelling justification.
>
> Clusters hosted on OS X fall into these categories:
>
> 1) Unaffected configuration. This includes everyone setting a valid messages
> locale via LANG, LC_ALL or LC_MESSAGES.
> 2) Affected configuration. Through luck and light use, the cluster would not
> experience the crashes/hangs.
> 3) Cluster would experience the crashes/hangs.
>
> DBAs in (3) want the FATAL at startup, but those in (2) want a LOG message
> instead. DBAs in (1) don't care. Since intermittent postmaster hangs are far
> worse than startup failure, if (2) and (3) have similar population, FATAL is
> the better bet. If (2) is sufficiently more populous than (3), then the many
> small pricks from startup failure do add up to hurt more than the occasional
> postmaster hang. Who knows how that calculation plays out.

The first attached patch, for all branches, adds LOG-level messages and an
assertion, so cassert builds will fail hard while other builds only log. The second
patch, for master only, changes the startup-time message to FATAL. If we
decide to use FATAL in all branches, I would just squash them into one.
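
To make the split concrete, here is a rough standalone sketch of the behavior
described above. It is not the patch text. Every name in it (Severity,
report_bad_startup_state, SKETCH_CASSERT) is hypothetical, the message strings
are placeholders, and fprintf/assert merely stand in for the server's real
error reporting and assertion machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the LOG and FATAL severity levels. */
typedef enum { SEV_LOG, SEV_FATAL } Severity;

/*
 * Pick the severity for the startup-time check.  fatal_at_startup models
 * the second (master-only) patch; false models the all-branches patch.
 */
static Severity
startup_severity(bool fatal_at_startup)
{
    return fatal_at_startup ? SEV_FATAL : SEV_LOG;
}

/*
 * Report the problematic state, if detected.  Returns true when startup
 * may continue and false when the caller should halt startup.
 */
static bool
report_bad_startup_state(bool condition_detected, bool fatal_at_startup)
{
    if (!condition_detected)
        return true;            /* nothing to report */

    if (startup_severity(fatal_at_startup) == SEV_FATAL)
    {
        /* Placeholder message, not the wording used in the patch. */
        fprintf(stderr, "FATAL: problematic startup state detected\n");
        return false;
    }

    fprintf(stderr, "LOG: problematic startup state detected\n");

#ifdef SKETCH_CASSERT
    /* Models an assert-enabled build: fail hard instead of continuing. */
    assert(!condition_detected);
#endif

    return true;
}
```

Compiled without SKETCH_CASSERT, report_bad_startup_state(true, false) logs and
lets startup continue, while report_bad_startup_state(true, true) reports FATAL
and tells the caller to stop. Defining SKETCH_CASSERT makes the LOG path abort,
which is the "fail hard" case for cassert builds.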