Re: Intermittent hangs with 9.2

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: David Whittaker <dave(at)iradix(dot)com>
Cc: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Intermittent hangs with 9.2
Date: 2013-09-10 17:44:37
Message-ID: CAMkU=1wqG7kMd4_EtGYLyf59MKF+OyHvmUdC1HKMp682rv8R0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Sep 10, 2013 at 8:04 AM, David Whittaker <dave(at)iradix(dot)com> wrote:

> Hi All,
>
> I've been seeing a strange issue with our Postgres install for about a
> year now, and I was hoping someone might be able to help point me at the
> cause. At what seem like fairly random intervals Postgres will become
> unresponsive to the 3 application nodes it services. These periods tend to
> last for 10 - 15 minutes before everything rights itself and the system
> goes back to normal.
>
> During these periods the server will report a spike in the outbound
> bandwidth (from about 1mbs to about 5mbs most recently), a huge spike in
> context switches / interrupts (normal peaks are around 2k/8k respectively,
> and during these periods they‘ve gone to 15k/22k), and a load average of
> 100+.
>

I'm curious about the spike it outbound network usage. If the database is
hung and no longer responding to queries, what is it getting sent over the
network? Can you snoop on that traffic?

> CPU usage stays relatively low, but it’s all system time reported, user
> time goes to zero. It doesn‘t seem to be disk related since we’re running
> with a shared_buffers setting of 24G, which will fit just about our entire
> database into memory, and the IO transactions reported by the server, as
> well as the disk reads reported by Postgres stay consistently low.
>
There have been reports that using very large shared_buffers can cause a
lot of contention issues in the kernel, for some kernels. The usual advice
is not to set shared_buffers above 8GB. The operating system can use the
rest of the memory to cache for you.

Also, using a connection pooler and lowering the number of connections to
the real database has solved problems like this before.

> We‘ve recently started tracking how long statements take to execute, and
> we’re seeing some really odd numbers. A simple delete by primary key, for
> example, from a table that contains about 280,000 rows, reportedly took
> 18h59m46.900s. An update by primary key in that same table was reported as
> 7d 17h 58m 30.415s. That table is frequently accessed, but obviously those
> numbers don't seem reasonable at all.
>
How are your tracking those? Is it log_min_duration_statement or something
else?

Cheers,

Jeff

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message David Whittaker 2013-09-10 18:04:57 Intermittent hangs with 9.2
Previous Message ktm@rice.edu 2013-09-10 15:33:48 Re: Intermittent hangs with 9.2