Quick Links

checkpoints/bgwriter tuning verification

From:	Vick Khera <vivek(at)khera(dot)org>
To:	Postgres General Postgres General <pgsql-general(at)postgresql(dot)org>
Subject:	checkpoints/bgwriter tuning verification
Date:	2009-10-29 18:46:13
Message-ID:	2968dfd60910291146o6dfa5992i3582ef9a48234be7@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On my primary DB I'm observing random slowness which just doesn't make
sense to me. The I/O system can easily do 40MB/sec writes, but I'm
only seeing a sustained 5MB/sec, even as the application is stalling
waiting on the DB.

My only guess is that I'm getting hit by checkpoints too often, and
this is causing some delays. I did a lot of reading and came across
this most excelelent article by Greg Smith
(http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm)
Reading that and the Pg manual and looking at the statistics, here is
what I think I need to do. (Greg, do you have a performance tuning
book? If so, I really want to buy it! Your articles are awesome.)

current settings for checkpoints:
checkpoint_segments = 32
checkpoint_timeout = 900
checkpoint_completion_target = 0.9
log_checkpoints = on

In my logs I see this pretty consistently during times of slowness:

Oct 29 14:17:38 d04 postgres[54902]: [10990-1] LOG: checkpoint starting: xlog
Oct 29 14:22:04 d04 postgres[54902]: [10991-1] LOG: checkpoint
complete: wrote 52828 buffers (24.3%); 0 transaction log file(s)
added, 0 removed, 32 recycled; write=265.881
Oct 29 14:22:57 d04 postgres[54902]: [10992-1] LOG: checkpoint starting: xlog
Oct 29 14:26:56 d04 postgres[54902]: [10993-1] LOG: checkpoint
complete: wrote 52773 buffers (24.3%); 0 transaction log file(s)
added, 0 removed, 32 recycled; write=234.846
Oct 29 14:28:32 d04 postgres[54902]: [10994-1] LOG: checkpoint starting: xlog
Oct 29 14:33:32 d04 postgres[54902]: [10995-1] LOG: checkpoint
complete: wrote 53807 buffers (24.7%); 0 transaction log file(s)
added, 0 removed, 32 recycled; write=299.170

basically, the next checkpoint starts within a few seconds of the
prior one completing.

The stats show this:

Based on Greg's article and the above number showing that most
checkpoints are triggered by running out of WAL segments, I should
increase my checkpoint_buffers. Also, based on the article, I should
increase the bgwriter_lru_maxpages (currently at default 100).

My plan is to bump checkpoint_segments to 48 and reduce
checkpoint_completion_target to 0.7, and bump the
bgwriter_lru_maxpages to 500.

Can the checkpoint operation actually cause the DB to stop responding
for a few seconds at a time? That seems to be what I observe.
Sometimes for 5 or more seconds one transaction will just stall.

Thanks for any ideas.

Responses

Re: checkpoints/bgwriter tuning verification at 2009-10-29 19:58:57 from Brad Nicholson
Re: checkpoints/bgwriter tuning verification at 2009-10-29 20:42:25 from Scott Marlowe
Re: checkpoints/bgwriter tuning verification at 2009-10-30 02:24:05 from Greg Smith

Browse pgsql-general by date

	From	Date	Subject
Next Message	Brad Nicholson	2009-10-29 19:58:57	Re: checkpoints/bgwriter tuning verification
Previous Message	Richard Huxton	2009-10-29 18:12:39	Re: Can't connect