From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Tracing down buildfarm "postmaster does not shut down" failures |
Date: | 2016-02-10 04:21:40 |
Message-ID: | 56BABAD4.7040006@dunslane.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 02/09/2016 10:27 PM, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
>> On Tue, Feb 09, 2016 at 10:02:17PM -0500, Tom Lane wrote:
>>> I wonder if it's worth sticking some instrumentation into stats
>>> collector shutdown?
>> I wouldn't be surprised if the collector got backlogged during the main phase
>> of testing and took awhile to chew through its message queue before even
>> starting the write of the final stats.
> But why would the ecpg tests show such an effect when the main regression
> tests don't? AFAIK the ecpg tests don't exactly stress the server ---
> note the trivial amount of data written by the shutdown checkpoint,
> for instance.
The main regression tests run with the stats file on the ramdisk.
>
> The other weird thing is that it's only sometimes slow. If you look at
> the last buildfarm result from axolotl, for instance, the tail end of
> the ecpg log is
>
> LOG: ShutdownSUBTRANS() complete at 2016-02-09 16:31:14.784 EST
> LOG: database system is shut down at 2016-02-09 16:31:14.784 EST
> LOG: lock files all released at 2016-02-09 16:31:14.817 EST
>
> so we only spent ~50ms on stats write that time.
That part is puzzling.
> The idea I was toying with is that previous filesystem activity (making
> the temp install, the server's never-fsync'd writes, etc) has built up a
> bunch of dirty kernel buffers, and at some point the kernel goes nuts
> writing all that data. So the issues we're seeing would come and go
> depending on the timing of that I/O spike. I'm not sure how to prove
> such a theory from here.
Yeah. It's faintly possible that a kernel upgrade will help.
cheers
andrew
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2016-02-10 04:53:35 | Re: GinPageIs* don't actually return a boolean |
Previous Message | Amit Kapila | 2016-02-10 04:14:53 | Re: Speed up Clog Access by increasing CLOG buffers |