Re: (Relatively) Oversized Checkpoint

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Nathaniel Talbott <nathaniel(at)spreedly(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: (Relatively) Oversized Checkpoint
Date: 2014-06-16 18:49:42
Message-ID: CAMkU=1zjniC4XL4fscjDqA9uTZ8OPQu=cSKBB4WK3-8Cx+raFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Jun 16, 2014 at 11:01 AM, Nathaniel Talbott
<nathaniel(at)spreedly(dot)com> wrote:
> We're still learning to tune PostgreSQL for our production setup, and over
> the weekend we saw a huge spike in INSERT/UPDATE completion time for a short
> period which turned out to correlate to an oversized checkpoint. I've
> included the preceding and following checkpoints for reference points, as
> they're typical of the usual checkpoints we see:
>
> Jun 16 06:01:19 sql-2 postgres[19015]: [10706-1] 2014-06-16 06:01:19 UTC
> LOG: checkpoint starting: time
> Jun 16 06:04:40 sql-2 postgres[19015]: [10707-1] 2014-06-16 06:04:40 UTC
> LOG: checkpoint complete: wrote 1962 buffers (0.7%); 0 transaction log
> file(s) added, 0 removed, 1 recycled; write=197.906 s, sync=2.411 s,
> total=200.697 s; sync files=42, longest=1.110 s, average=0.057 s
>
> Jun 16 06:06:19 sql-2 postgres[19015]: [10708-1] 2014-06-16 06:06:19 UTC
> LOG: checkpoint starting: time
> Jun 16 06:12:03 sql-2 postgres[19015]: [10709-1] 2014-06-16 06:12:03 UTC
> LOG: checkpoint complete: wrote 25958 buffers (9.9%); 0 transaction log
> file(s) added, 0 removed, 12 recycled; write=269.832 s, sync=74.105 s,
> total=344.294 s; sync files=49, longest=64.421 s, average=1.512 s

So the fsync (by the kernel) of a single file took 64 seconds. This
single event explains almost all of the overrun and the performance
problems.

This is a classic problem, but the solutions are fuzzy.

Lowering /proc/sys/vm/dirty_background_ratio and
/proc/sys/vm/dirty_background_bytes and not using ext3 for the file
system are the clearest I know of, but people have also reported that
lowering shared_buffers can help.

This is fundamentally a kernel/FS/IO issue. What are the current
settings for those, and what kernel and file system, and IO subsystem
are you using?

Cheers,

Jeff

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Rich Shepard 2014-06-16 20:51:13 Creating Table Copy
Previous Message Severn, Chris 2014-06-16 18:16:58