Re: Tracing down buildfarm "postmaster does not shut down" failures

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Tracing down buildfarm "postmaster does not shut down" failures
Date: 2016-02-09 03:55:24
Message-ID: 19177.1454990124@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Mon, Feb 08, 2016 at 02:15:48PM -0500, Tom Lane wrote:
>> We've seen variants
>> on this theme on half a dozen machines just in the past week --- and it
>> seems to mostly happen in 9.5 and HEAD, which is fishy.

> It has been affecting only the four AIX animals, which do share hardware.
> (Back in 2015 and once in 2016-01, it did affect axolotl and shearwater.)

Certainly your AIX critters have shown this a bunch, but here's another
current example:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2016-02-08%2014%3A49%3A23

> That's reasonable. If you would like higher-fidelity data, I can run loops of
> "pg_ctl -w start; make installcheck; pg_ctl -t900 -w stop", and I could run
> that for HEAD and 9.2 simultaneously. A day of logs from that should show
> clearly if HEAD is systematically worse than 9.2.

That sounds like a fine plan, please do it.

> So, I wish to raise the timeout for those animals. Using an environment
> variable was a good idea; it's one less thing for test authors to remember.
> Since the variable affects a performance-related fudge factor rather than
> change behavior per se, I'm less skittish than usual about unintended
> consequences of dynamic scope. (With said unintended consequences in mind, I
> made "pg_ctl register" ignore PGCTLTIMEOUT rather than embed its value into
> the service created.)

While this isn't necessarily a bad idea in isolation, the current
buildfarm scripts explicitly specify a -t value to pg_ctl stop, which
I would not expect an environment variable to override. So we need
to fix the buildfarm script to allow the timeout to be configurable.
I'm not sure if there would be any value-add in having pg_ctl answer
to an environment variable once we've done that.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2016-02-09 04:00:50 Re: WAL Re-Writes
Previous Message Amit Kapila 2016-02-09 03:42:55 Re: WAL Re-Writes