Intermittent hangs with 9.2

From: David Whittaker <dave(at)iradix(dot)com>
To: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Intermittent hangs with 9.2
Date: 2013-09-10 18:06:26
Message-ID: CABXnLXQsX0Kp6qz=RHt+_-JYnx9yCuFs95ZkS1boHhW2aBHV=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Jeff

On Tue, Sep 10, 2013 at 1:44 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Tue, Sep 10, 2013 at 8:04 AM, David Whittaker <dave(at)iradix(dot)com> wrote:
>
>> Hi All,
>>
>> I've been seeing a strange issue with our Postgres install for about a
>> year now, and I was hoping someone might be able to help point me at the
>> cause. At what seem like fairly random intervals Postgres will become
>> unresponsive to the 3 application nodes it services. These periods tend to
>> last for 10 - 15 minutes before everything rights itself and the system
>> goes back to normal.
>>
>> During these periods the server will report a spike in the outbound
>> bandwidth (from about 1mbs to about 5mbs most recently), a huge spike in
>> context switches / interrupts (normal peaks are around 2k/8k respectively,
>> and during these periods they‘ve gone to 15k/22k), and a load average of
>> 100+.
>>
>
> I'm curious about the spike it outbound network usage. If the database is
> hung and no longer responding to queries, what is it getting sent over the
> network? Can you snoop on that traffic?
>

It seems curious to me as well. I don't know if one, or a few of the pg
connections are streaming out data and somehow blocking the others in the
process, or the data could be unrelated to pg. If you can suggest a tool I
could use to monitor the data transfer continuously and get some type of
summary of what happened after the issue reoccurs, I'd appreciate it.
Otherwise, I'll try to get in and catch some specifics the next time it
happens.

>
>
>> CPU usage stays relatively low, but it’s all system time reported, user
>> time goes to zero. It doesn‘t seem to be disk related since we’re running
>> with a shared_buffers setting of 24G, which will fit just about our entire
>> database into memory, and the IO transactions reported by the server, as
>> well as the disk reads reported by Postgres stay consistently low.
>>
> There have been reports that using very large shared_buffers can cause a
> lot of contention issues in the kernel, for some kernels. The usual advice
> is not to set shared_buffers above 8GB. The operating system can use the
> rest of the memory to cache for you.
>
> Also, using a connection pooler and lowering the number of connections to
> the real database has solved problems like this before.
>

We're going to implement both of these changes tonight. I was going to go
with 12G for shared_buffers based on Andrew's suggestion, but maybe I'll go
down to 8 if that seems to be the magic number. We're also going to
decrease the max connections from 500 to 100 and decrease the pooled
connections per server.

>
>
>> We‘ve recently started tracking how long statements take to execute,
>> and we’re seeing some really odd numbers. A simple delete by primary key,
>> for example, from a table that contains about 280,000 rows, reportedly took
>> 18h59m46.900s. An update by primary key in that same table was reported as
>> 7d 17h 58m 30.415s. That table is frequently accessed, but obviously those
>> numbers don't seem reasonable at all.
>>
> How are your tracking those? Is it log_min_duration_statement or
> something else?
>

We're using log_min_duration_statement = 1000, sending the log messages to
syslog, then analyzing with pg_badger.

>
> Cheers,
>
> Jeff
>

Browse pgsql-performance by date

  From Date Subject
Next Message Torsten Förtsch 2013-09-11 07:26:31 Re: Intermittent hangs with 9.2
Previous Message David Whittaker 2013-09-10 18:05:41 Intermittent hangs with 9.2