Re: max_connections reached in postgres 9.3.3

From: Borislav Ivanov <bivanov(at)atlassian(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, "Vasudevan, Ramya" <ramya(dot)vasudevan(at)classmates(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: max_connections reached in postgres 9.3.3
Date: 2014-06-20 00:26:37
Message-ID: CAEqRsYUkrCUO6rMHCJuDPM-S5tAz4EEsqyHeXv3Xm3c9mWJ7Qg@mail.gmail.com
Lists: pgsql-general

We do record perf data. For each incident we've had, the data looks about
the same. Unfortunately, I can't read much into it beyond the fact that
it's getting stuck on a spinlock. But why, and on what?

### from perf report

 53.28%  postmaster  postgres           [.] s_lock
  6.22%  postmaster  postgres           [.] 0x00000000001b4306
  2.30%  postmaster  [kernel.kallsyms]  [k] _spin_lock
  2.23%  postmaster  postgres           [.] LWLockAcquire
  1.79%  postmaster  postgres           [.] LWLockRelease
  1.39%  postmaster  postgres           [.] hash_search_with_hash_value
  1.20%  postmaster  postgres           [.] SearchCatCache
  0.71%  postmaster  [kernel.kallsyms]  [k] hrtimer_interrupt
  0.56%  postmaster  [kernel.kallsyms]  [k] tick_program_event
  0.54%  postmaster  [kernel.kallsyms]  [k] schedule
  0.44%  postmaster  [kernel.kallsyms]  [k] _spin_lock_irq

### Then zooming on s_lock

# Annotate s_lock

...
 99.04 │ test   %al,%al
...

# Zoom into postmaster(81487) thread

 55.84%  postmaster  postgres           [.] s_lock
  6.52%  postmaster  postgres           [.] 0x00000000001b4306
  2.42%  postmaster  [kernel.kallsyms]  [k] _spin_lock
  2.34%  postmaster  postgres           [.] LWLockAcquire
  1.87%  postmaster  postgres           [.] LWLockRelease
  1.46%  postmaster  postgres           [.] hash_search_with_hash_value
  1.26%  postmaster  postgres           [.] SearchCatCache
  0.75%  postmaster  [kernel.kallsyms]  [k] hrtimer_interrupt
  0.59%  postmaster  [kernel.kallsyms]  [k] tick_program_event
  0.57%  postmaster  [kernel.kallsyms]  [k] schedule

# Zoom into postgres DSO

 65.75%  postmaster  [.] s_lock
  7.68%  postmaster  [.] 0x00000000001b4306
  2.75%  postmaster  [.] LWLockAcquire
  2.20%  postmaster  [.] LWLockRelease
  1.72%  postmaster  [.] hash_search_with_hash_value
  1.49%  postmaster  [.] SearchCatCache
  0.54%  postmaster  [.] _bt_compare
  0.51%  postmaster  [.] _bt_checkkeys
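
For reference, a sketch of the kind of perf invocations that produce output like the above. The sampling frequency, duration, and PID here are illustrative placeholders, not values from our incident:

```shell
# Sample on-CPU call stacks system-wide for 30 seconds (run as root);
# -g records call graphs, -F 99 sets the sampling frequency in Hz.
perf record -a -g -F 99 -- sleep 30

# Interactive report over the recorded data; from here you can
# annotate a symbol or zoom into a single thread or DSO.
perf report

# Non-interactive annotation of one symbol, e.g. s_lock:
perf annotate s_lock

# Live view, useful while an incident is ongoing (placeholder PID):
perf top -p <postgres_backend_pid>
```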

On 19 June 2014 15:57, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Thu, Jun 19, 2014 at 5:12 PM, Borislav Ivanov <bivanov(at)atlassian(dot)com>
> wrote:
> > However, most people on our team think that the number of connections is
> > purely a symptom of the actual problem. We would love to be wrong about
> > this. But for now we feel the high number of connections helps preserve
> > the problem rather than actually triggering it.
>
> This is entirely correct. pgbouncer is not about preventing database
> load but about limiting damage when it occurs. This is generally
> necessary in environments where application servers keep piling on
> connections when the database is not clearing queries fast enough.
>
> In your case user% is dominating system load. Along with the high cs
> (context switches), this strongly suggests spinlock contention. A
> 'perf top' is essential for identifying the culprit. It's very
> possible that 9.4 will fix your problem...see:
>
> http://postgresql.1045698.n5.nabble.com/Cpu-usage-100-on-slave-s-lock-problem-td5768655.html
>
> There was some poorly optimized code in the WAL replay.
>
> merlin
>
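
For context on the damage-limiting role Merlin describes, here is a minimal pgbouncer configuration sketch. The database name, host, port, and pool sizes are illustrative assumptions, not values from this thread:

```ini
; pgbouncer.ini -- cap what the database ever sees, regardless of
; how many connections the application servers open.
[databases]
; illustrative database name and backend address
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction      ; reuse server connections per transaction
default_pool_size = 20       ; max server connections per user/db pair
max_client_conn = 1000       ; excess clients queue here instead of
                             ; piling onto postgres
```

With a setup like this, application servers can keep opening client connections during an incident, but the number of actual PostgreSQL backends stays bounded by the pool size.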
