From: | Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info> |
---|---|
To: | Vick Khera <vivek(at)khera(dot)org> |
Cc: | Postgres General Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: checkpoints/bgwriter tuning verification |
Date: | 2009-10-29 19:58:57 |
Message-ID: | 1256846337.11161.1417.camel@bnicholson-desktop |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Thu, 2009-10-29 at 14:46 -0400, Vick Khera wrote:
> On my primary DB I'm observing random slowness which just doesn't make
> sense to me. The I/O system can easily do 40MB/sec writes, but I'm
> only seeing a sustained 5MB/sec, even as the application is stalling
> waiting on the DB.
>
> My only guess is that I'm getting hit by checkpoints too often, and
> this is causing some delays. I did a lot of reading and came across
> this most excelelent article by Greg Smith
> (http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm)
> Reading that and the Pg manual and looking at the statistics, here is
> what I think I need to do. (Greg, do you have a performance tuning
> book? If so, I really want to buy it! Your articles are awesome.)
>
> current settings for checkpoints:
> checkpoint_segments = 32
> checkpoint_timeout = 900
> checkpoint_completion_target = 0.9
> log_checkpoints = on
>
> In my logs I see this pretty consistently during times of slowness:
>
> Oct 29 14:17:38 d04 postgres[54902]: [10990-1] LOG: checkpoint starting: xlog
> Oct 29 14:22:04 d04 postgres[54902]: [10991-1] LOG: checkpoint
> complete: wrote 52828 buffers (24.3%); 0 transaction log file(s)
> added, 0 removed, 32 recycled; write=265.881
> Oct 29 14:22:57 d04 postgres[54902]: [10992-1] LOG: checkpoint starting: xlog
> Oct 29 14:26:56 d04 postgres[54902]: [10993-1] LOG: checkpoint
> complete: wrote 52773 buffers (24.3%); 0 transaction log file(s)
> added, 0 removed, 32 recycled; write=234.846
> Oct 29 14:28:32 d04 postgres[54902]: [10994-1] LOG: checkpoint starting: xlog
> Oct 29 14:33:32 d04 postgres[54902]: [10995-1] LOG: checkpoint
> complete: wrote 53807 buffers (24.7%); 0 transaction log file(s)
> added, 0 removed, 32 recycled; write=299.170
>
>
> basically, the next checkpoint starts within a few seconds of the
> prior one completing.
That's most likely due to having checkpoint_completion target set near
the maximum. You are telling it to stretch the checkpoint out as long
as possible.
>
> The stats show this:
>
> # select * from pg_stat_bgwriter;
> -[ RECORD 1 ]------+-----------
> checkpoints_timed | 3236
> checkpoints_req | 83044
> buffers_checkpoint | 1376460896
> buffers_clean | 59124159
> maxwritten_clean | 304410
> buffers_backend | 285595787
> buffers_alloc | 6643047623
>
> Based on Greg's article and the above number showing that most
> checkpoints are triggered by running out of WAL segments, I should
> increase my checkpoint_buffers. Also, based on the article, I should
> increase the bgwriter_lru_maxpages (currently at default 100).
>
> My plan is to bump checkpoint_segments to 48 and reduce
> checkpoint_completion_target to 0.7,
That's what I would do. I think you getting hit by checkpointing too
frequently or stretching the checkpoints out too long (or a combo of
both).
I'd also keep bumping checkpoint_segements up until you are
checkpointing on the timeout. This will give you a longer window
between checkpoints - which gives more time to smooth the the checkpoint
out.
> and bump the
> bgwriter_lru_maxpages to 500.
I would probably start by increasing the number of WAL segments and
decreasing the checkpoint_completion_timeout and see if that solves the
problem. I wouldn't change the bgwriter_lru_maxpages at the same time,
and wouldn't necessarily increase it by a factor of 5 when doing so.
It does look like you need to increase it though.
> Can the checkpoint operation actually cause the DB to stop responding
> for a few seconds at a time? That seems to be what I observe.
> Sometimes for 5 or more seconds one transaction will just stall.
Absolutely. The smoothing is supposed to alleviate that, but I'm not
sure how well it works with aggressive activity driven checkpoints like
you have. Keep up posted, I'm curious.
> Thanks for any ideas.
>
--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
From | Date | Subject | |
---|---|---|---|
Next Message | Blake Starkenburg | 2009-10-29 20:34:10 | Column Type Suggestions |
Previous Message | Vick Khera | 2009-10-29 18:46:13 | checkpoints/bgwriter tuning verification |