Re: Problems with pg_locks explosion

From: Steven Crandell <steven(dot)crandell(at)gmail(dot)com>
To: Armand du Plessis <adp(at)bank(dot)io>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Problems with pg_locks explosion
Date: 2013-04-02 01:07:47
Message-ID: CALvesgn_oG06K4+bQZ+FkM4P0QES3jBz8dmkwxuq4dD15zLy2A@mail.gmail.com
Lists: pgsql-performance

Armand,

All of the symptoms you describe line up perfectly with a problem I had
recently when upgrading DB hardware.
Everything ran fine until we hit some threshold, at which point the locks
would pile up in the thousands just as you describe, all while we were not
I/O bound.

I was moving from a DELL 810 that used a flex memory bridge to a DELL 820
that uses round robin on its quad-core Intels.
(Interestingly, we also found out that DELL is planning to roll back to the
flex memory bridge later this year.)

Any chance you could find out whether your old processors might have been
using flex while your new processors might be using round robin?
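
In case it helps while you dig into this (a rough sketch, not something I've
run against your setup): a quick way to see whether those thousands of
pg_locks entries are actually stuck waiting, rather than simply held by busy
backends, is something like:

SELECT granted, mode, count(*)
FROM pg_locks
GROUP BY granted, mode
ORDER BY count(*) DESC;

If nearly everything comes back granted, the big number is mostly a symptom
of how many backends are active at once rather than of lock contention
itself.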

-s

On Mon, Apr 1, 2013 at 5:59 PM, Armand du Plessis <adp(at)bank(dot)io> wrote:

> I've run an EXPLAIN ANALYZE on one of the queries that appeared in
> pg_locks (although, like you say, that might be a red herring), both during
> normal response times (2) and after the locks backlog materialized (1).
>
> The output is below; I've just blanked out some columns. The I/O timings
> do seem an order of magnitude slower, but not excessive unless I'm reading
> it wrong.
>
> "Limit (cost=2364.19..2365.44 rows=500 width=177) (actual
> time=6501.103..6507.196 rows=500 loops=1)"
> " Output:
> " Buffers: shared hit=7163 read=137"
> " I/O Timings: read=107.771"
> " -> Sort (cost=2364.19..2365.56 rows=549 width=177) (actual
> time=6501.095..6503.216 rows=500 loops=1)"
> " Output:
> " Sort Key: messages.created_at"
> " Sort Method: quicksort Memory: 294kB"
> " Buffers: shared hit=7163 read=137"
> " I/O Timings: read=107.771"
> " -> Nested Loop (cost=181.19..2339.21 rows=549 width=177)
> (actual time=6344.410..6495.377 rows=783 loops=1)"
> " Output:
> " Buffers: shared hit=7160 read=137"
> " I/O Timings: read=107.771"
> " -> Nested Loop (cost=181.19..1568.99 rows=549 width=177)
> (actual time=6344.389..6470.549 rows=783 loops=1)"
> " Output:
> " Buffers: shared hit=3931 read=137"
> " I/O Timings: read=107.771"
> " -> Bitmap Heap Scan on public.messages
> (cost=181.19..798.78 rows=549 width=177) (actual time=6344.342..6436.117
> rows=783 loops=1)"
> " Output:
> " Recheck Cond:
> " Buffers: shared hit=707 read=137"
> " I/O Timings: read=107.771"
> " -> BitmapOr (cost=181.19..181.19 rows=549
> width=0) (actual time=6344.226..6344.226 rows=0 loops=1)"
> " Buffers: shared hit=120 read=20"
> " I/O Timings: read=37.085"
> " -> Bitmap Index Scan on
> messages_sender_type_created_at_idx (cost=0.00..23.41 rows=309 width=0)
> (actual time=6343.358..6343.358 rows=366 loops=1)"
> " Index Cond:
> " Buffers: shared hit=26 read=15"
> " I/O Timings: read=36.977"
> " -> Bitmap Index Scan on
> messages_type_sender_recipient_created_at
> " Buffers: shared hit=94 read=5"
> " I/O Timings: read=0.108"
> " -> Index Only Scan using profiles_pkey on
> public.profiles (cost=0.00..1.39 rows=1 width=4) (actual time=0.018..0.024
> rows=1 loops=783)"
> " Output: profiles.id"
> " Index Cond: (profiles.id = messages.sender)"
> " Heap Fetches: 661"
> " Buffers: shared hit=3224"
> " -> Index Only Scan using profiles_pkey on public.profiles
> recipient_profiles_messages (cost=0.00..1.39 rows=1 width=4) (actual
> time=0.014..0.018 rows=1 loops=783)"
> " Output: recipient_profiles_messages.id"
> " Index Cond: (recipient_profiles_messages.id =
> messages.recipient)"
> " Heap Fetches: 667"
> " Buffers: shared hit=3229"
> "Total runtime: 6509.328 ms"
>
>
>
> "Limit (cost=2366.57..2367.82 rows=500 width=177) (actual time=
> 73.284..76.296 rows=500 loops=1)"
> " Output: various columns"
> " Buffers: shared hit=6738 read=562"
> " I/O Timings: read=19.212"
> " -> Sort (cost=2366.57..2367.94 rows=549 width=177) (actual time=
> 73.276..74.300 rows=500 loops=1)"
> " Output: various columns"
> " Sort Key: messages.created_at"
> " Sort Method: quicksort Memory: 294kB"
> " Buffers: shared hit=6738 read=562"
> " I/O Timings: read=19.212"
> " -> Nested Loop (cost=181.19..2341.59 rows=549 width=177)
> (actual time=3.556..69.866 rows=783 loops=1)"
> " Output: various columns
> " Buffers: shared hit=6735 read=562"
> " I/O Timings: read=19.212"
> " -> Nested Loop (cost=181.19..1570.19 rows=549 width=177)
> (actual time=3.497..53.820 rows=783 loops=1)"
> " Output: various columns
> " Buffers: shared hit=3506 read=562"
> " I/O Timings: read=19.212"
> " -> Bitmap Heap Scan on public.messages
> (cost=181.19..798.78 rows=549 width=177) (actual time=3.408..32.906
> rows=783 loops=1)"
> " Output: various columns
> " Recheck Cond: ()
> " Buffers: shared hit=282 read=562"
> " I/O Timings: read=19.212"
> " -> BitmapOr (cost=181.19..181.19 rows=549
> width=0) (actual time=3.279..3.279 rows=0 loops=1)"
> " Buffers: shared hit=114 read=26"
> " I/O Timings: read=1.755"
> " -> Bitmap Index Scan on
> messages_sender_type_created_at_idx (cost=0.00..23.41 rows=309 width=0)
> (actual time=1.882..1.882 rows=366 loops=1)"
> " Index Cond:
> " Buffers: shared hit=25 read=16"
> " I/O Timings: read=1.085"
> " -> Bitmap Index Scan on
> " Buffers: shared hit=89 read=10"
> " I/O Timings: read=0.670"
> " -> Index Only Scan using profiles_pkey on
> public.profiles (cost=0.00..1.40 rows=1 width=4) (actual time=0.012..0.015
> rows=1 loops=783)"
> " Output: profiles.id"
> " Index Cond: (profiles.id = messages.sender)"
> " Heap Fetches: 654"
> " Buffers: shared hit=3224"
> " -> Index Only Scan using profiles_pkey on public.profiles
> recipient_profiles_messages (cost=0.00..1.40 rows=1 width=4) (actual
> time=0.007..0.009 rows=1 loops=783)"
> " Output: recipient_profiles_messages.id"
> " Index Cond: (recipient_profiles_messages.id =
> messages.recipient)"
> " Heap Fetches: 647"
> " Buffers: shared hit=3229"
> "Total runtime: 77.528 ms"
>
>
>
> On Tue, Apr 2, 2013 at 2:40 AM, Armand du Plessis <adp(at)bank(dot)io> wrote:
>
>> Hi Jeff,
>>
>> Sorry, I should've mentioned the new instance is Postgres 9.2.3. The old
>> instance maxed out the disk I/O available on a single EBS volume on AWS.
>> It had 2000 PIOPS but was constantly bottlenecked. I assumed that striping
>> six 1000 IOPS volumes in RAID-0 would give me some breathing space on that
>> front, and looking at the iostat (just included in the previous email) it
>> seems to be doing OK.
>>
>> I actually had pg_pool running as a test, but to avoid having too many
>> moving parts in the change I removed it from the equation. I need to look
>> into the proper configuration so it doesn't saturate my cluster worse than
>> I'm doing myself.
>>
>> I've commented inline.
>>
>> Regards,
>>
>> Armand
>>
>> PS. This is probably the most helpful mailing list I've ever come across.
>> Starting to feel a little more hopeful that it can be solved.
>>
>>
>> On Tue, Apr 2, 2013 at 2:21 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>
>>> On Mon, Apr 1, 2013 at 3:35 PM, Armand du Plessis <adp(at)bank(dot)io> wrote:
>>>
>>>> [Apologies, I first sent this to the incorrect list, postgres-admin, in
>>>> the event you receive it twice]
>>>>
>>>> Hi there,
>>>>
>>>> I'm hoping someone on the list can shed some light on an issue I'm
>>>> having with our Postgresql cluster. I'm literally tearing out my hair and
>>>> don't have a deep enough understanding of Postgres to find the problem.
>>>>
>>>> What's happening is I had severe disk/io issues on our original
>>>> Postgres cluster (9.0.8)
>>>>
>>> and switched to a new instance with a RAID-0 volume array.
>>>>
>>>
>>> What was the old instance IO? Did you do IO benchmarking on both?
>>>
>>>
>>>> The machine's CPU usage would hover around 30% and our database would
>>>> run lightning fast with pg_locks hovering between 100-200.
>>>>
>>>> Within a few seconds something would trigger a massive increase in
>>>> pg_locks so that it suddenly shoots up to 4000-8000. At this point
>>>> everything dies. Queries that usually take a few milliseconds takes minutes
>>>> and everything is unresponsive until I restart postgres.
>>>>
>>>
>>> I think that pg_locks is pretty much a red herring. All it means is
>>> that you have a lot more active connections than you used to. All active
>>> connections are going to hold various locks, while most idle connections
>>> (other than 'idle in transaction') will not hold any.
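>>>
>>> (A rough way to see that, assuming 9.2 where pg_stat_activity has a state
>>> column, is to count backends by state when things go bad:
>>>
>>> SELECT state, count(*)
>>> FROM pg_stat_activity
>>> GROUP BY state
>>> ORDER BY count(*) DESC;
>>>
>>> If the 'active' count jumps from a handful to hundreds at the same moment
>>> pg_locks explodes, the lock count is just along for the ride.)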
>>>
>>> Although I doubt it will solve this particular problem, you should
>>> probably use a connection pooler.
>>>
>>>
>>>
>>>> shared_buffers = 32GB
>>>>
>>>
>>> That seems very high. There are reports that using >8 GB leads to
>>> precisely the type of problem you are seeing (checkpoint-associated
>>> freezes), although I've never seen those reports when fsync=off.
>>>
>>> I thought you might be suffering from the problem solved in release 9.1
>>> by item "Merge duplicate fsync requests (Robert Haas, Greg Smith)", but
>>> then I realized that with fsync=off it could not be that.
>>>
>>>
>>>
>>>>
>>>> max_connections = 800
>>>>
>>>
>>> That also is very high.
>>>
>>>
>>>> The problems seem to overlap with checkpoints.
>>>>
>>>> 2013-04-01 21:31:35.592 UTC,,,26877,,5159fa5f.68fd,1,,2013-04-01
>>>> 21:21:35 UTC,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,""
>>>> 2013-04-01 21:40:35.033 UTC,,,26877,,5159fa5f.68fd,2,,2013-04-01
>>>> 21:21:35 UTC,,0,LOG,00000,"checkpoint complete: wrote 100635 buffers
>>>> (2.4%); 0 transaction log file(s) added, 0 removed, 1 recycled;
>>>> write=539.439 s, sync=0.000 s, total=539.441 s; sync files=0, longest=0.000
>>>> s, average=0.000 s",,,,,,,,,""
>>>> 2013-04-01 21:41:35.093 UTC,,,26877,,5159fa5f.68fd,3,,2013-04-01
>>>> 21:21:35 UTC,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,""
>>>>
>>>
>>>
>>> I think you changed checkpoint_timeout from the default (5 min) to 10
>>> minutes, without telling us. Anyway, this is where it would be nice to
>>> know how much of the 539.439 s in the write phase was spent blocking on
>>> writes, and how much was spent napping. But that info is not collected by
>>> pgsql.
>>>
>>
>> I did actually change it, to 25 minutes. Apologies, it was probably lost
>> in the text of a previous email. Here are the changed settings:
>>
>> # - Background Writer -
>>
>> bgwriter_delay = 200ms # 10-10000ms between rounds
>> bgwriter_lru_maxpages = 400 # 0-1000 max buffers written/round
>> bgwriter_lru_multiplier = 2.0 # 0-10.0 multiplier on buffers
>> scanned/round
>>
>> checkpoint_segments = 128
>> checkpoint_timeout = 25min
>>
>> It seems to be lasting longer with these settings.
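>>
>> (To keep an eye on how checkpoints behave under these settings, I'm
>> planning to snapshot pg_stat_bgwriter before and after a stall; a rough
>> sketch, since 9.2 exposes the checkpoint write/sync times there:
>>
>> SELECT checkpoints_timed, checkpoints_req,
>>        checkpoint_write_time, checkpoint_sync_time,
>>        buffers_checkpoint, buffers_backend
>> FROM pg_stat_bgwriter;
>>
>> Comparing two snapshots should show whether the checkpoint writes line up
>> with the lock pile-ups.)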
>>
>>
>>>
>>> Your top output looked like it was from a time at which there were no
>>> problems, and it didn't include the top processes, so it wasn't very
>>> informative.
>>>
>>> If you could upgrade to 9.2 and capture some data with track_io_timing,
>>> that could be useful.
>>>
>>
>> I'm looking into track_io_timing.
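>>
>> (From the docs it looks like I can just turn it on for a session and rerun
>> the slow query with the buffers option; a rough sketch:
>>
>> SET track_io_timing = on;
>> EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;
>>
>> which should add the "I/O Timings" lines to the plan output.)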
>>
>>
>>>
>>> Cheers,
>>>
>>> Jeff
>>>
>>
>>
>
