From: | Vick Khera <vivek(at)khera(dot)org> |
---|---|
To: | Postgres General Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | checkpoints/bgwriter tuning verification |
Date: | 2009-10-29 18:46:13 |
Message-ID: | 2968dfd60910291146o6dfa5992i3582ef9a48234be7@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On my primary DB I'm observing random slowness which just doesn't make
sense to me. The I/O system can easily do 40MB/sec writes, but I'm
only seeing a sustained 5MB/sec, even as the application is stalling
waiting on the DB.
My only guess is that I'm getting hit by checkpoints too often, and
this is causing some delays. I did a lot of reading and came across
this most excelelent article by Greg Smith
(http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm)
Reading that and the Pg manual and looking at the statistics, here is
what I think I need to do. (Greg, do you have a performance tuning
book? If so, I really want to buy it! Your articles are awesome.)
current settings for checkpoints:
checkpoint_segments = 32
checkpoint_timeout = 900
checkpoint_completion_target = 0.9
log_checkpoints = on
In my logs I see this pretty consistently during times of slowness:
Oct 29 14:17:38 d04 postgres[54902]: [10990-1] LOG: checkpoint starting: xlog
Oct 29 14:22:04 d04 postgres[54902]: [10991-1] LOG: checkpoint
complete: wrote 52828 buffers (24.3%); 0 transaction log file(s)
added, 0 removed, 32 recycled; write=265.881
Oct 29 14:22:57 d04 postgres[54902]: [10992-1] LOG: checkpoint starting: xlog
Oct 29 14:26:56 d04 postgres[54902]: [10993-1] LOG: checkpoint
complete: wrote 52773 buffers (24.3%); 0 transaction log file(s)
added, 0 removed, 32 recycled; write=234.846
Oct 29 14:28:32 d04 postgres[54902]: [10994-1] LOG: checkpoint starting: xlog
Oct 29 14:33:32 d04 postgres[54902]: [10995-1] LOG: checkpoint
complete: wrote 53807 buffers (24.7%); 0 transaction log file(s)
added, 0 removed, 32 recycled; write=299.170
basically, the next checkpoint starts within a few seconds of the
prior one completing.
The stats show this:
# select * from pg_stat_bgwriter;
-[ RECORD 1 ]------+-----------
checkpoints_timed | 3236
checkpoints_req | 83044
buffers_checkpoint | 1376460896
buffers_clean | 59124159
maxwritten_clean | 304410
buffers_backend | 285595787
buffers_alloc | 6643047623
Based on Greg's article and the above number showing that most
checkpoints are triggered by running out of WAL segments, I should
increase my checkpoint_buffers. Also, based on the article, I should
increase the bgwriter_lru_maxpages (currently at default 100).
My plan is to bump checkpoint_segments to 48 and reduce
checkpoint_completion_target to 0.7, and bump the
bgwriter_lru_maxpages to 500.
Can the checkpoint operation actually cause the DB to stop responding
for a few seconds at a time? That seems to be what I observe.
Sometimes for 5 or more seconds one transaction will just stall.
Thanks for any ideas.
From | Date | Subject | |
---|---|---|---|
Next Message | Brad Nicholson | 2009-10-29 19:58:57 | Re: checkpoints/bgwriter tuning verification |
Previous Message | Richard Huxton | 2009-10-29 18:12:39 | Re: Can't connect |