From: Borislav Ivanov <bivanov(at)atlassian(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, "Vasudevan, Ramya" <ramya(dot)vasudevan(at)classmates(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: max_connections reached in postgres 9.3.3
Date: 2014-06-20 00:26:37
Message-ID: CAEqRsYUkrCUO6rMHCJuDPM-S5tAz4EEsqyHeXv3Xm3c9mWJ7Qg@mail.gmail.com
Lists: pgsql-general
We do record perf data. For each incident we've had, the data looks about
the same. Unfortunately, I can't read much into it beyond the fact that
we're stuck on a spinlock. But why, and on which lock?
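For context, a typical capture looks something like this (a sketch assuming
stock Linux perf run as root; exact flags vary by distro and kernel):

# Sample all CPUs with call graphs for 30 seconds, then browse the result
perf record -a -g -- sleep 30
perf report

# Or watch a single busy backend live (<pid> is a placeholder)
perf top -p <pid>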
### from perf report

 53.28%  postmaster  postgres           [.] s_lock
  6.22%  postmaster  postgres           [.] 0x00000000001b4306
  2.30%  postmaster  [kernel.kallsyms]  [k] _spin_lock
  2.23%  postmaster  postgres           [.] LWLockAcquire
  1.79%  postmaster  postgres           [.] LWLockRelease
  1.39%  postmaster  postgres           [.] hash_search_with_hash_value
  1.20%  postmaster  postgres           [.] SearchCatCache
  0.71%  postmaster  [kernel.kallsyms]  [k] hrtimer_interrupt
  0.56%  postmaster  [kernel.kallsyms]  [k] tick_program_event
  0.54%  postmaster  [kernel.kallsyms]  [k] schedule
  0.44%  postmaster  [kernel.kallsyms]  [k] _spin_lock_irq
### Then zooming in on s_lock

# Annotate s_lock
...
 99.04 │ test   %al,%al
...
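Nearly all of the annotated samples land on the read-only test of the lock
byte rather than on the atomic exchange itself, which is consistent with a
spin loop that repeatedly tests the lock before retrying the exchange. The
view above comes from something like the following (a sketch; the symbol
can also be annotated from the perf report TUI via its context menu, which
likewise offers the thread and DSO zooms shown below):

# Annotate the hottest symbol directly from the recorded perf.data
perf annotate -s s_lock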
# Zoom into postmaster(81487) thread
 55.84%  postmaster  postgres           [.] s_lock
  6.52%  postmaster  postgres           [.] 0x00000000001b4306
  2.42%  postmaster  [kernel.kallsyms]  [k] _spin_lock
  2.34%  postmaster  postgres           [.] LWLockAcquire
  1.87%  postmaster  postgres           [.] LWLockRelease
  1.46%  postmaster  postgres           [.] hash_search_with_hash_value
  1.26%  postmaster  postgres           [.] SearchCatCache
  0.75%  postmaster  [kernel.kallsyms]  [k] hrtimer_interrupt
  0.59%  postmaster  [kernel.kallsyms]  [k] tick_program_event
  0.57%  postmaster  [kernel.kallsyms]  [k] schedule
# Zoom into postgres DSO
 65.75%  postmaster  [.] s_lock
  7.68%  postmaster  [.] 0x00000000001b4306
  2.75%  postmaster  [.] LWLockAcquire
  2.20%  postmaster  [.] LWLockRelease
  1.72%  postmaster  [.] hash_search_with_hash_value
  1.49%  postmaster  [.] SearchCatCache
  0.54%  postmaster  [.] _bt_compare
  0.51%  postmaster  [.] _bt_checkkeys
On 19 June 2014 15:57, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Thu, Jun 19, 2014 at 5:12 PM, Borislav Ivanov <bivanov(at)atlassian(dot)com>
> wrote:
> > However, most people on our team think that the number of connections is
> > purely a symptom of the actual problem. We would love to be wrong about
> > this. But for now we feel the high number of connections helps sustain
> > the problem rather than actually triggering it.
>
> This is entirely correct. pgbouncer is not about preventing database load
> but about limiting damage when it occurs. This is generally necessary in
> environments where application servers keep piling on connections when
> the database is not clearing queries fast enough.
>
> In your case user% is dominating the load. Along with the high
> context-switch rate, this really suggests spinlock contention. A 'perf
> top' is essential for identifying the culprit. It's very possible that
> 9.4 will fix your problem... see:
>
> http://postgresql.1045698.n5.nabble.com/Cpu-usage-100-on-slave-s-lock-problem-td5768655.html
>
> There was some poorly optimized code in the WAL replay.
>
> merlin
>
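For reference, a minimal sketch of the damage-limiting pgbouncer setup
Merlin describes (hypothetical file paths, database name, and sizes; pool
sizing has to be tuned to the workload):

# Hypothetical pgbouncer config: clients can pile up, the database cannot
cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
; "appdb" is a placeholder for the real database
appdb = host=127.0.0.1 port=5432

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
; thousands of app connections are accepted here...
max_client_conn = 2000
; ...but at most this many per database/user pair ever reach postgres
default_pool_size = 20
EOF

Pointing the applications at port 6432 instead of 5432 then lets the pool
absorb the connection pile-up instead of the backends.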