Postgres Query Plan Live Lock

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: "Pweaver (Paul Weaver)" <pweaver(at)panjiva(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Michael Vezza <michael(at)panjiva(dot)com>
Subject: Postgres Query Plan Live Lock
Date: 2014-02-05 14:52:20
Message-ID: CAMkU=1zWEg9ejNR0a0tsHkaBO0UHTmS5VghCYrFNZGyNBcX0dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Monday, February 3, 2014, Pweaver (Paul Weaver) <pweaver(at)panjiva(dot)com>
wrote:

> We have been running into a (live lock?) issue on our production Postgres
> instance causing queries referencing a particular table to become extremely
> slow and our application to lock up.
>
> This tends to occur on a particular table that gets a lot of queries
> against it after a large number of deletes. When this happens, the
> following symptoms occur when queries referencing that table are run (even
> it we stop the deleting):
>

What do you mean by "stop the deleting"? Are you pausing the delete but
without either committing or rolling back the transaction, but just holding
it open? Are you stopping it cleanly, between transactions?

Also, how many queries are happening concurrently? Perhaps you need a
connection pooler.

Is the CPU time user time or system time? What kernel version do you have?

> SELECT * FROM table_name LIMIT 10; -- takes ~45 seconds to complete
> EXPLAIN SELECT * FROM table_name LIMIT 10; -- takes ~45 seconds to
> complete the explain query, the query plan looks reasonable
>

This sounds like the problem we heard quite a bit about recently, where
processes spend a lot of time fighting over the proclock while they try to
check the commit status of open transactions while. But I don't see how
deletes could trigger that behavior. If the delete has not committed, the
tuples are still visible and the LIMIT 10 is quickly satisfied. If the
delete has committed, the tuples quickly get hinted, and so the next query
along should be faster.

I also don't see why the explain would be slow. A similar problem was
tracked down to digging through in-doubt tuples while trying to use an
index to find the true the min or max during estimating the cost of a merge
join. But I don't think a simple table query should lead to that, unless
table_name is a view. And I don't see how deletes, rather than uncommitted
inserts, could trigger it either.

max_connections | 600 |
> configuration file
>

That is quite extreme. If a temporary load spike (like from the deletes
and the hinting needed after them) slows down the select queries and you
start more and more of them, soon you could tip the system over into kernel
scheduler insanity with high system time. Once in this mode, it will stay
there until the incoming stream of queries stops and the existing ones
clear out. But, if that is what is occurring, I don't know why queries on
other tables would still be fast.

Cheers,

Jeff

>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Claudio Freire 2014-02-05 15:28:39 Re: increasing query time after analyze
Previous Message Pavel Stehule 2014-02-05 12:15:17 Re: increasing query time after analyze