Re: Tracing down buildfarm "postmaster does not shut down" failures

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Tracing down buildfarm "postmaster does not shut down" failures
Date: 2016-02-08 20:41:23
Message-ID: 20160208204123.GA325488@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> Of late, by far the majority of the random-noise failures we see in the
> buildfarm have come from failure to shut down the postmaster in a
> reasonable timeframe.

I noticed that.

> An example is this current failure on hornet:
>
> http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet&dt=2016-02-08%2013%3A41%3A55

FWIW you seem to have edited this URL -- it returns a blank page.
The right one is
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet&dt=2016-02-08%2013%3A41%3A55&stg=stopdb-C-1

> What I'd like to do to investigate this is put in a temporary HEAD-only
> patch that makes ShutdownXLOG() and its subroutines much chattier about
> how far they've gotten and what time it is, and also makes pg_ctl print
> out the current time if it gives up waiting. A few failed runs with
> that in place will at least allow us to confirm or deny whether it's
> just that the shutdown checkpoint is sometimes really slow, or whether
> there's a bug lurking.
>
> Any objections? Anybody have another idea for data to collect?

Seems like a reasonable place to start. I assume you'll be installing
some debug-elog calls, enabled by default, and then once the problem is
fixed, we'll change the default to disabled but keep the actual calls?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2016-02-08 20:42:06 Re: [ADMIN] 9.5 new setting "cluster name" and logging
Previous Message Thomas Munro 2016-02-08 20:36:17 Re: [ADMIN] 9.5 new setting "cluster name" and logging