Quick Links

Re: CPU spikes and transactions

From:	Tony Kay <tony(at)teamunify(dot)com>
To:	Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: CPU spikes and transactions
Date:	2013-10-15 15:59:08
Message-ID:	CAB=fRcqLZWmEW8omtc8+xzarAVPGrhnQmY1Lst0Edw1F92_HvQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Mon, Oct 14, 2013 at 4:42 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:

> On 15.10.2013 01:00, Tony Kay wrote:
> > Hi,
> >
> > I'm running 9.1.6 w/22GB shared buffers, and 32GB overall RAM on a
> > 16 Opteron 6276 CPU box. We limit connections to roughly 120, but
> > our webapp is configured to allocate a thread-local connection, so
> > those connections are rarely doing anything more than half the time.
>
> Lower your shared buffers to about 20% of your RAM, unless you've tested
> it's actually helping in your particular case. It's unlikely you'll get
> better performance by using more than that, especially on older
> versions, so it's wiser to leave the rest for page cache.
>
> It might even be one of the causes of the performance issue you're
> seeing, as shared buffers are not exactly overhead-free.
>
> See this for more details on tuning:
>
> http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

I had followed the general directions from several sources years ago, which
indicate up to 40% of RAM. We've been running very large shared buffers for
4 years now, but it is difficult to generate a good real load without
testing against users, so we have not felt the need to move it around. In
general, I don't tend to tinker with a setting that has been fine for this
long without good reason. I've been wanting to upgrade to the newer
mmap-based versions of pgsql, but was waiting to re-tune this when I did so.

Why do you suspect that shared_buffers would cause the behavior I'm seeing?

>
>
> You're on a rather old 9.1.x version, BTW. The last version in this
> branch is 9.1.10 and there are some important security fixes (e.g. in
> 9.1.9). Not sure if there are any fixes relevant to the performance
> issue, though.
>
> An upgrade to 9.1.10 is planned.

> A few initial questions:
>
> * What OS are we dealing with?
>

CentOS el6

>
> * So how many active connections are there on average (see
> pg_stat_activity for connections running queries)?
>

about 40-60

>
> * How much data are we talking about? In total and in the imports?
>

80GB database. The imports are maybe 1-3 MB...often much smaller. 10k rows
would be a probably average.

>
> > We have been running smoothly for over a year on this configuration,
> > and recently started having huge CPU spikes that bring the system to
> > its knees. Given that it is a multiuser system, it has been quite
> > hard to pinpoint the exact cause, but I think we've narrowed it down
> > to two data import jobs that were running in semi-long transactions
> > (clusters of row inserts).
> >
> > The tables affected by these inserts are used in common queries.
> >
> > The imports will bring in a row count of perhaps 10k on average
> > covering 4 tables.
> >
> > The insert transactions are at isolation level read committed (the
> > default for the JDBC driver).
> >
> > When the import would run (again, theory...we have not been able to
> > reproduce), we would end up maxed out on CPU, with a load average of
> > 50 for 16 CPUs (our normal busy usage is a load average of 5 out of
> > 16 CPUs).
> >
> > When looking at the active queries, most of them are against the
> > tables that are affected by these imports.
>
> Which processes consume most CPU time? Are those backends executing the
> queries, or some background processes (checkpointer, autovacuum, ...)?
>
>
The backends executing the queries...most of the queries that seem hung
usually run in a few ms.

> Can you post a "top -c" output collected at the time of the CPU peak?
>
>
Don't have process accounting, so I cannot regenerate that; however, I can
tell you what queries were active at one of them.

There were 36 of the queries agains table ind_event (which is one affected
by the import). Those queries usually take 5-10ms, and we never see more
than 2 active during normal operation. These had been active for
_minutes_....a sample of the running queries:

time_active | datname | procpid | query

The insert listed there is coming from that import...the others are
querying a view that includes that table in a join.

Also, try to collect a few snapshots of pg_stat_bgwriter catalog before
> and during the loads. Don't forget to include the timestamp:
>
> select now(), * from pg_stat_bgwriter;
>
>
This is a live production system, and it will take me some doing to
generate a load on a test server that triggers the condition. I'll be
certain to gather this and the other stats if I can trigger it.

> and when you're at it, pg_stat_database snapshots might be handy too
> (not sure if you're running a single database or multiple ones), so use
> either
>
> select now(), * from pg_stat_database;
>
> or
>
> select now(), * from pg_stat_database where datname = '..dbname..';
>
> That should give us at least some insight into what's happening.
>
> > Our workaround (that is holding at present) was to drop the
> > transactions on those imports (which is not optimal, but fortunately
> > is acceptable for this particular data). This workaround has
> > prevented any further incidents, but is of course inconclusive.
> >
> > Does this sound familiar to anyone, and if so, please advise.
>
> I'm wondering how this could be related to the transactions, and IIRC
> the stats (e.g. # of inserted rows) are sent at commit time. That might
> trigger the autovacuum. But without the transactions the autovacuum
> would be triggered sooner ...
>
> regards
> Tomas
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Re: CPU spikes and transactions at 2013-10-14 23:42:17 from Tomas Vondra

Responses

Re: CPU spikes and transactions at 2013-10-15 17:26:23 from Julien Cigar

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Tony Kay	2013-10-15 16:09:36	Re: CPU spikes and transactions
Previous Message	Merlin Moncure	2013-10-15 13:00:44	Re: CPU spikes and transactions