From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Tracing down buildfarm "postmaster does not shut down" failures |
Date: | 2016-02-09 22:53:59 |
Message-ID: | 24722.1455058439@sss.pgh.pa.us |
Lists: | pgsql-hackers |
I wrote:
> I'm not sure whether there's anything to be gained by leaving the tracing
> code in there till we see actual buildfarm fails. There might be another
> slowdown mechanism somewhere, but I rather doubt it. Thoughts?
Hmmm ... I take that back. AFAICT, the failures on Noah's AIX zoo are
sufficiently explained by the "mdpostckpt takes a long time after the
regression tests" theory. However, there is something else happening
on axolotl. Looking at the HEAD and 9.5 branches, there are three very
similar failures in the ECPG step within the past 60 days:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2016-02-08%2014%3A49%3A23
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2015-12-15%2018%3A49%3A31
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2015-12-12%2001%3A44%3A39
In all three, we got "pg_ctl: server does not shut down", but the
postmaster log claims that it shut down, and pretty speedily too.
For example, in the 2015-12-12 failure,
LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: autovacuum launcher shutting down
LOG: shutting down
LOG: checkpoint starting: shutdown immediate
LOG: checkpoint complete: wrote 176 buffers (1.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.039 s, sync=0.000 s, total=0.059 s; sync files=0, longest=0.000 s, average=0.000 s; distance=978 kB, estimate=978 kB
LOG: database system is shut down
We have no theory that would account for postmaster shutdown stalling
after the end of ShutdownXLOG, but that seems to be what happened.
How come? Why does only the ECPG test seem to be affected?
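One way to check the same thing across many postmaster logs is sketched below. It assumes only the message texts quoted above; the function name and the returned dictionary keys are made up for illustration and are not part of PostgreSQL or the buildfarm client.

```python
import re

# Matches the timing fields of a "checkpoint complete" line as quoted above.
CHECKPOINT_RE = re.compile(
    r"checkpoint complete: .*?write=(?P<write>[\d.]+) s, "
    r"sync=(?P<sync>[\d.]+) s, total=(?P<total>[\d.]+) s"
)


def summarize_shutdown(log_lines):
    """Summarize what the postmaster log says about a shutdown.

    Hypothetical helper: reports whether a fast shutdown was requested,
    how long the shutdown checkpoint took, and whether the log reached
    "database system is shut down".
    """
    summary = {
        "fast_shutdown_requested": False,
        "checkpoint_total_s": None,
        "shut_down": False,
    }
    for line in log_lines:
        if "received fast shutdown request" in line:
            summary["fast_shutdown_requested"] = True
        match = CHECKPOINT_RE.search(line)
        if match:
            summary["checkpoint_total_s"] = float(match.group("total"))
        if "database system is shut down" in line:
            summary["shut_down"] = True
    return summary
```

A log for which this reports `shut_down` as true with a small `checkpoint_total_s`, while pg_ctl still printed "server does not shut down", would match the pattern seen in the three axolotl failures.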
It's also pretty fishy that we have three failures in 60 days on HEAD+9.5
but none before that, and none in the older branches. That smells like
a recently-introduced bug, though I have no idea what.
Andrew, I wonder if I could prevail on you to make axolotl run "make
check" on HEAD in src/interfaces/ecpg/ until it fails, so that we can
see if the logging I added tells anything useful about this.
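A minimal sketch of such a run-until-failure loop follows. The helper name, the `max_runs` cap, and the example invocation in the comment are assumptions for illustration; the buildfarm client may well drive this differently.

```python
import subprocess


def run_until_failure(cmd, cwd=None, max_runs=1000):
    """Run cmd repeatedly until it exits nonzero.

    Returns the 1-based number of the first failing run, or None if all
    max_runs runs succeeded. The cap is an assumption so that the loop
    cannot run forever on a machine where the failure never reproduces.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd, cwd=cwd)
        if result.returncode != 0:
            return attempt
    return None


# Hypothetical usage from the top of a PostgreSQL source tree:
#   failed_on = run_until_failure(["make", "check"], cwd="src/interfaces/ecpg")
#   print("first failure on run", failed_on)
```

Stopping at the first failure leaves the test directory and its logs from the failing run in place for inspection.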
regards, tom lane