Re: CPU spikes and transactions

From: Tony Kay <tony(at)teamunify(dot)com>
To: Julien Cigar <jcigar(at)ulb(dot)ac(dot)be>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: CPU spikes and transactions
Date: 2013-10-15 19:07:38
Message-ID: CAB=fRco-3pfakkZ=j0stBe8Ds22Bh9pp06nvgiqHM2ieeXVn=g@mail.gmail.com
Lists: pgsql-performance

On Tue, Oct 15, 2013 at 10:26 AM, Julien Cigar <jcigar(at)ulb(dot)ac(dot)be> wrote:

>
> for two reasons:
>
> - some of the overhead of bgwriter and checkpoints is more or less linear
> in the size of shared_buffers; for example, a large quantity of data
> could be dirty when a checkpoint occurs.
>
> - the OS cache is also being used for reads and writes, the larger
> shared_buffers is, the more you risk double buffering (same blocks
> in the OS cache and in the database buffer cache).
>
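
The first quoted point can be made concrete with a rough worst-case sketch (all numbers below are illustrative, not values from this thread): the checkpointer must flush whatever fraction of shared_buffers is dirty within its completion window, so the required flush rate scales directly with shared_buffers.

```python
# Worst-case checkpoint flush rate estimate (illustrative numbers only).
def checkpoint_flush_mb_s(shared_buffers_gb: float, dirty_fraction: float,
                          checkpoint_timeout_s: float,
                          completion_target: float) -> float:
    """MB/s the checkpointer must sustain to flush the dirty portion
    of shared_buffers within its allotted completion window."""
    dirty_mb = shared_buffers_gb * 1024 * dirty_fraction
    window_s = checkpoint_timeout_s * completion_target
    return dirty_mb / window_s

# e.g. 8 GB shared_buffers, 50% dirty, 5-minute timeout, target 0.5
print(round(checkpoint_flush_mb_s(8, 0.5, 300, 0.5), 1))  # ~27.3 MB/s
```

Doubling shared_buffers doubles that worst-case flush rate, which is the "more or less linear" overhead being described.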

Excellent. Thank you for the information. My suspicion has always been that
shared_buffers acts as a "level 1 cache", so it seemed to me that you'd want
it large enough to hold your entire database if you could. However, I'm now
realizing that I was _also_ assuming the IPC shared memory was locked via
mlock to prevent it from being swapped out, and I'm now getting the feeling
that this isn't true, which means the double buffering could lead to swap
space use under buffer cache pressure...which I do occasionally see in ways
I had not expected.

We do streaming replication and also archive the WAL files for snapshot PITR
backups, so I am intimately familiar with our write load, and I can say it
is pretty low (we ship a 16MB WAL file about every 10-15 minutes during our
busiest times).
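
That shipping rate implies a very light sustained write load; a quick back-of-the-envelope check (the 16MB segment size and 10-15 minute cadence come from the paragraph above, the rest is simple arithmetic):

```python
# Average write rate implied by shipping one 16 MB WAL segment
# every 10-15 minutes.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size

def avg_wal_rate_kb_s(minutes_per_segment: float) -> float:
    """Average sustained WAL write rate in KB/s."""
    return WAL_SEGMENT_BYTES / 1024 / (minutes_per_segment * 60)

print(round(avg_wal_rate_kb_s(10), 1))  # busy end: ~27.3 KB/s
print(round(avg_wal_rate_kb_s(15), 1))  # quiet end: ~18.2 KB/s
```

Even the busy-end figure is tiny, which supports the "pretty low write load" claim.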

That said, I can see how an import that is doing a bunch of writes could
spread those writes over a large area that would then consume a lot of CPU
in the background writer and checkpointer; however, I do not see how either
of those would cause 40-60 different postgres backend processes (all running
a normally "light" query) to spin off into oblivion unless the write
workload is somehow threaded into those backends (which I'm sure it isn't).
So, I think we're still dealing with a spinlock issue.

We're going to upgrade to 9.1.10 (with debug symbols) Thursday night and
add another 64GB of RAM. I'll tune shared_buffers down to 2GB at that time
and bump effective_cache_size up at the same time. My large history of sar
data will make it apparent pretty quickly if that is a win/lose/tie.
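
For reference, the plan above would correspond roughly to a postgresql.conf fragment like this (the effective_cache_size figure is my own assumption based on the mentioned RAM upgrade, not a value stated in the thread):

```
# Illustrative sketch only - exact values depend on how much RAM the
# OS page cache actually gets after the 64GB upgrade.
shared_buffers = 2GB           # deliberately small; lean on the OS cache
effective_cache_size = 96GB    # assumed: shared_buffers + expected OS cache
```

effective_cache_size is only a planner hint and allocates no memory, so it is safe to set it optimistically high.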

If we have another spike in production, we'll be ready to measure it more
accurately.

Thanks,

Tony
