checkpoints/bgwriter tuning verification

From: Vick Khera <vivek(at)khera(dot)org>
To: Postgres General Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: checkpoints/bgwriter tuning verification
Date: 2009-10-29 18:46:13
Message-ID: 2968dfd60910291146o6dfa5992i3582ef9a48234be7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On my primary DB I'm observing random slowness which just doesn't make
sense to me. The I/O system can easily do 40MB/sec writes, but I'm
only seeing a sustained 5MB/sec, even as the application is stalling
waiting on the DB.

My only guess is that I'm getting hit by checkpoints too often, and
this is causing some delays. I did a lot of reading and came across
this most excelelent article by Greg Smith
(http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm)
Reading that and the Pg manual and looking at the statistics, here is
what I think I need to do. (Greg, do you have a performance tuning
book? If so, I really want to buy it! Your articles are awesome.)

current settings for checkpoints:
checkpoint_segments = 32
checkpoint_timeout = 900
checkpoint_completion_target = 0.9
log_checkpoints = on

In my logs I see this pretty consistently during times of slowness:

Oct 29 14:17:38 d04 postgres[54902]: [10990-1] LOG: checkpoint starting: xlog
Oct 29 14:22:04 d04 postgres[54902]: [10991-1] LOG: checkpoint
complete: wrote 52828 buffers (24.3%); 0 transaction log file(s)
added, 0 removed, 32 recycled; write=265.881
Oct 29 14:22:57 d04 postgres[54902]: [10992-1] LOG: checkpoint starting: xlog
Oct 29 14:26:56 d04 postgres[54902]: [10993-1] LOG: checkpoint
complete: wrote 52773 buffers (24.3%); 0 transaction log file(s)
added, 0 removed, 32 recycled; write=234.846
Oct 29 14:28:32 d04 postgres[54902]: [10994-1] LOG: checkpoint starting: xlog
Oct 29 14:33:32 d04 postgres[54902]: [10995-1] LOG: checkpoint
complete: wrote 53807 buffers (24.7%); 0 transaction log file(s)
added, 0 removed, 32 recycled; write=299.170

basically, the next checkpoint starts within a few seconds of the
prior one completing.

The stats show this:

# select * from pg_stat_bgwriter;
-[ RECORD 1 ]------+-----------
checkpoints_timed | 3236
checkpoints_req | 83044
buffers_checkpoint | 1376460896
buffers_clean | 59124159
maxwritten_clean | 304410
buffers_backend | 285595787
buffers_alloc | 6643047623

Based on Greg's article and the above number showing that most
checkpoints are triggered by running out of WAL segments, I should
increase my checkpoint_buffers. Also, based on the article, I should
increase the bgwriter_lru_maxpages (currently at default 100).

My plan is to bump checkpoint_segments to 48 and reduce
checkpoint_completion_target to 0.7, and bump the
bgwriter_lru_maxpages to 500.

Can the checkpoint operation actually cause the DB to stop responding
for a few seconds at a time? That seems to be what I observe.
Sometimes for 5 or more seconds one transaction will just stall.

Thanks for any ideas.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Brad Nicholson 2009-10-29 19:58:57 Re: checkpoints/bgwriter tuning verification
Previous Message Richard Huxton 2009-10-29 18:12:39 Re: Can't connect