
Re: Tracing down buildfarm "postmaster does not shut down" failures

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Tracing down buildfarm "postmaster does not shut down" failures
Date:	2016-02-09 22:53:59
Message-ID:	24722.1455058439@sss.pgh.pa.us
Lists:	pgsql-hackers

I wrote:
> I'm not sure whether there's anything to be gained by leaving the tracing
> code in there till we see actual buildfarm fails. There might be another
> slowdown mechanism somewhere, but I rather doubt it. Thoughts?

Hmmm ... I take that back. AFAICT, the failures on Noah's AIX zoo are
sufficiently explained by the "mdpostckpt takes a long time after the
regression tests" theory. However, there is something else happening
on axolotl. Looking at the HEAD and 9.5 branches, there are three very
similar failures in the ECPG step within the past 60 days:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2016-02-08%2014%3A49%3A23
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2015-12-15%2018%3A49%3A31
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2015-12-12%2001%3A44%3A39

In all three, we got "pg_ctl: server does not shut down", but the
postmaster log claims that it shut down, and pretty speedily too.
For example, in the 2015-12-12 failure,

LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: autovacuum launcher shutting down
LOG: shutting down
LOG: checkpoint starting: shutdown immediate
LOG: checkpoint complete: wrote 176 buffers (1.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.039 s, sync=0.000 s, total=0.059 s; sync files=0, longest=0.000 s, average=0.000 s; distance=978 kB, estimate=978 kB
LOG: database system is shut down

We have no theory that would account for postmaster shutdown stalling
after the end of ShutdownXLOG, but that seems to be what happened.
How come? Why does only the ECPG test seem to be affected?

It's also pretty fishy that we have three failures in 60 days on HEAD+9.5
but none before that, and none in the older branches. That smells like
a recently-introduced bug, though I have no idea what.

Andrew, I wonder if I could prevail on you to make axolotl run "make
check" on HEAD in src/interfaces/ecpg/ until it fails, so that we can
see if the logging I added tells anything useful about this.
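The "run until it fails" request can be scripted with a small driver loop. What follows is a minimal sketch under stated assumptions, not the buildfarm client's actual mechanism. The Python function `run_until_failure` and its `max_runs` safety cap are hypothetical. The loop runs a command repeatedly and reports which run first exits with a nonzero status.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_until_failure(cmd: Sequence[str], cwd: Optional[str] = None,
                      max_runs: int = 1000) -> Optional[int]:
    """Run cmd repeatedly until it exits with a nonzero status.

    Returns the 1-based number of the first failing run, or None if
    all max_runs runs succeeded. max_runs is a hypothetical cap so the
    loop cannot spin forever on a machine that never reproduces the bug.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(list(cmd), cwd=cwd)
        if result.returncode != 0:
            # Report to stderr so stdout stays free for the command's own output.
            print(f"run {attempt} failed with exit code {result.returncode}",
                  file=sys.stderr)
            return attempt
    return None
```

For example, from the top of a PostgreSQL source tree, `run_until_failure(["make", "check"], cwd="src/interfaces/ecpg")` would stop at the first failing ECPG run and leave that run's logs in place for inspection.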

regards, tom lane

In response to

Re: Tracing down buildfarm "postmaster does not shut down" failures at 2016-02-09 19:10:50 from Tom Lane

Responses

Re: Tracing down buildfarm "postmaster does not shut down" failures at 2016-02-09 23:30:02 from Tom Lane
Re: Tracing down buildfarm "postmaster does not shut down" failures at 2016-02-09 23:46:53 from Andrew Dunstan
