Re: Problems with pg_locks explosion

From: Armand du Plessis <adp(at)bank(dot)io>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Problems with pg_locks explosion
Date: 2013-04-02 02:53:42
Message-ID: CANf99sVcffphectj+pQ4w-cczbB5Yi+aR_c1hpmbh_G+Bfuo0Q@mail.gmail.com
Lists: pgsql-performance

Hi Steven,

Sounds very familiar. Painfully familiar :(

But I really don't know. All I can see is that in this particular
configuration the instance has 2 x Intel Xeon E5-2670 eight-core
processors. I can't find any info on whether it's flex or round robin. AWS
typically doesn't make the underlying hardware known. The exception is the
chip type on the higher-end instance types, which is where I got the info
above from.

Below is an excerpt from atop when the problem occurs. The CPUs jump to high
sys usage. Not sure if that was similar to what you saw?

How did you get it resolved in the end?

ATOP - ip-10-155-231-112    2013/04/02 01:25:40    ------    2s elapsed
PRC | sys 70.15s | user 8.19s | #proc 1015 | #zombie 0 | clones 0 | #exit 2 |
CPU | sys 3182% | user 30% | irq 1% | idle 0% | wait 0% | steal 0% | guest 0% |
cpu | sys  98% | user  1% | irq 1% | idle 0% | cpu000 w 0% | steal 0% | guest 0% |
cpu | sys  96% | user  4% | irq 0% | idle 0% | cpu001 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu002 w 0% | steal 0% | guest 0% |
cpu | sys  99% | user  1% | irq 0% | idle 0% | cpu003 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu004 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu005 w 0% | steal 0% | guest 0% |
cpu | sys  98% | user  2% | irq 0% | idle 0% | cpu006 w 0% | steal 0% | guest 0% |
cpu | sys  99% | user  1% | irq 0% | idle 0% | cpu007 w 0% | steal 0% | guest 0% |
cpu | sys  99% | user  1% | irq 0% | idle 0% | cpu008 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu009 w 0% | steal 0% | guest 0% |
cpu | sys  99% | user  1% | irq 0% | idle 0% | cpu010 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu011 w 0% | steal 0% | guest 0% |
cpu | sys  99% | user  1% | irq 0% | idle 0% | cpu012 w 0% | steal 0% | guest 0% |
cpu | sys  97% | user  3% | irq 0% | idle 0% | cpu013 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu014 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu015 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu016 w 0% | steal 0% | guest 0% |
cpu | sys  82% | user 18% | irq 0% | idle 0% | cpu017 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu018 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu019 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu020 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu021 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu022 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu023 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu024 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu025 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu026 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu027 w 0% | steal 0% | guest 0% |
cpu | sys  99% | user  1% | irq 0% | idle 0% | cpu028 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu029 w 0% | steal 0% | guest 0% |
cpu | sys 100% | user  0% | irq 0% | idle 0% | cpu030 w 0% | steal 0% | guest 0% |
cpu | sys  99% | user  1% | irq 0% | idle 0% | cpu031 w 0% | steal 0% | guest 0% |
CPL | avg1 90.60 | avg5 60.80 | avg15 39.77 | csw 1011 | intr 17568 | numcpu 32 |
MEM | tot 58.5G | free 418.4M | cache 45.0G | dirty 0.6M | buff 5.8M | slab 501.2M |
SWP | tot 0.0M | free 0.0M | vmcom 49.8G | vmlim 29.3G |
PAG | scan 1858 | stall 0 | swin 0 | swout 0 |
NET | transport | tcpi 318 | tcpo 392 | udpi 34 | udpo 39 | tcpao 0 | tcppo 2 | tcprs 0 | tcpie 0 | tcpor 0 | udpnp 0 | udpip 0 |
NET | network | ipi 357 | ipo 397 | ipfrw 0 | deliv 357 | icmpi 0 | icmpo 0 |
NET | eth0 ---- | pcki 318 | pcko 358 | si 200 Kbps | so 947 Kbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
NET | lo   ---- | pcki 39 | pcko 39 | si 79 Kbps | so 79 Kbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |

On Tue, Apr 2, 2013 at 3:07 AM, Steven Crandell
<steven(dot)crandell(at)gmail(dot)com>wrote:

> Armand,
>
> All of the symptoms you describe line up perfectly with a problem I had
> recently when upgrading DB hardware.
> Everything ran fine until we hit some threshold somewhere at which point
> the locks would pile up in the thousands just as you describe, all while we
> were not I/O bound.
>
> I was moving from a DELL 810 that used a flex memory bridge to a DELL 820
> that used round robin on their quad core intels.
> (Interestingly we also found out that DELL is planning on rolling back to
> the flex memory bridge later this year.)
>
> Any chance you could find out if your old processors might have been using
> flex while your new processors might be using round robin?
>
> -s
>
>
>
