From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
Cc: | Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers |
Date: | 2013-09-13 21:10:04 |
Message-ID: | CAHyXU0yvKy2jgqPWO1ZdYSdVuH6YS_H==RDZ0jxuGCEOk-q1cw@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Sep 13, 2013 at 4:04 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>
>> Absolutely not claiming the contrary. I think it sucks that we
>> couldn't fully figure out what's happening in detail. I'd love to
>> get my hands on a setup where it can be reliably reproduced.
>
> I have seen two completely different causes for symptoms like this,
> and I suspect that these aren't the only two.
>
> (1) The dirty page avalanche: PostgreSQL hangs on to a large
> number of dirty buffers and then dumps a lot of them at once. The
> OS does the same. When PostgreSQL dumps its buffers to the OS it
> pushes the OS over a "tipping point" where it is writing dirty
> buffers too fast for the controller's BBU cache to absorb them.
> Everything freezes until the controller writes and accepts OS
> writes for a lot of data. This can take several minutes, during
> which time the database seems "frozen". Cure is some combination
> of these: reduce shared_buffers, make the background writer more
> aggressive, checkpoint more often, make the OS dirty page writing
> more aggressive, add more BBU RAM to the controller.

Yeah -- I've seen this too, and it's a well-understood problem.
Getting the OS to flush dirty pages out faster is the name of the game, I
think. Storage is getting so fast that it's (mostly) moot anyway.
Also, this falls under the umbrella of 'high I/O' -- the stuff I've been
seeing is low- or no-I/O.
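
To tell the dirty page avalanche apart from the low-I/O stalls, it can help to watch how much dirty data the kernel is holding around an episode. The following is a minimal sketch, assuming a Linux host: it reads the `Dirty` and `Writeback` fields from `/proc/meminfo` and the `vm.dirty_background_ratio` and `vm.dirty_ratio` sysctls. The function names are made up for illustration and are not part of any PostgreSQL tooling.

```python
from pathlib import Path


def parse_meminfo(text):
    """Return {field: kB} for the /proc/meminfo fields tied to dirty pages."""
    wanted = {"Dirty", "Writeback"}
    values = {}
    for line in text.splitlines():
        # Lines look like "Dirty:              1234 kB".
        name, _, rest = line.partition(":")
        if name in wanted:
            values[name] = int(rest.split()[0])
    return values


def read_vm_setting(name):
    """Read an integer sysctl from /proc/sys/vm (Linux only)."""
    return int(Path("/proc/sys/vm", name).read_text())


if __name__ == "__main__":
    # Hypothetical one-shot report; run it repeatedly to see the trend.
    print(parse_meminfo(Path("/proc/meminfo").read_text()))
    for knob in ("dirty_background_ratio", "dirty_ratio"):
        print(knob, "=", read_vm_setting(knob))
```

A `Dirty` figure that climbs steadily and then collapses while the database appears frozen would point at cause (1); a stall with little dirty data and little I/O points elsewhere.
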
> (2) Transparent huge page support goes haywire on its defrag work.
> Clues on this include very high "system" CPU time during an
> episode, and `perf top` shows more time in kernel spinlock
> functions than anywhere else. The database doesn't completely lock
> up like with the dirty page avalanche, but it is slow enough that
> users often describe it that way. So far I have only seen this
> cured by disabling THP support (in spite of some people urging that
> just the defrag be disabled). It does make me wonder whether there
> is something we could do in PostgreSQL to interact better with
> THPs.

Ah, that's a useful tip; need to research that, thanks. Maybe Josh
might be able to give it a whirl...
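
For anyone checking whether THP is in play on a given box, here is a small sketch, assuming a Linux kernel that exposes the settings under `/sys/kernel/mm/transparent_hugepage` (some distribution kernels may use a different directory). The kernel marks the active choice with square brackets, for example `always [madvise] never`. The helper name `active_choice` is hypothetical.

```python
from pathlib import Path

# Assumed location of the THP knobs; adjust if your kernel differs.
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def active_choice(text):
    """Return the bracketed option from a sysfs choice string, or None."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__":
    for knob in ("enabled", "defrag"):
        path = THP_DIR / knob
        if path.exists():
            print(knob, "=", active_choice(path.read_text()))
        else:
            print(knob, "not found at", path)
```

Seeing `enabled = always` together with high system CPU time and spinlock-heavy `perf top` output would match the symptoms described above. Changing the setting means writing one of the listed options to the same file as root.
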
merlin