Re: Simple queries slowdown, maybe related to 3-minute long "<IDLE> in transaction"?

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: dmitry(at)koterov(dot)ru
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Simple queries slowdown, maybe related to 3-minute long "<IDLE> in transaction"?
Date: 2012-11-20 20:33:54
Message-ID: CAHyXU0w4tD4_J0ZpeyPdb+V+6+BGkh8bLLTruWYgw4MsgfiMdw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Nov 20, 2012 at 1:25 PM, Dmitry Koterov <dmitry(at)koterov(dot)ru> wrote:
> Hello.
>
> Sometimes I see a strange slowdown on my PG 9.1 server: it looks like the
> simplest queries which typically take 1ms or less (e.g. selection of a row
> by its primary key) take 300ms or even more. It is related to all queries
> within the connection, not the single one: once upon a time all fast queries
> start to work 300 times slower. In a couple of minutes the effect
> disappears. The machine is not overloaded, number of connections is not too
> high, mtr shows no packet loss.
>
> E.g. see the following log from a particular website page I've been
> refreshing (times are in milliseconds, you may see that typically all 22
> queries take near 55ms, but within 3 minutes - from 20:21 till 20:23 approx
> - even simplest queries took more than 300ms instead of 1ms, so the total
> time is many seconds).
>
> web-01: [2012-11-20 20:20:57] web07725] total=250 cpu=210
> php=203 dbq=22 db=55ms
> web-01: [2012-11-20 20:21:06] web07752] total=6254 cpu=220
> php=219 dbq=22 db=6035ms <--
> web-01: [2012-11-20 20:21:48] web07718] total=7506 cpu=220
> php=222 dbq=22 db=7284ms <--
> web-01: [2012-11-20 20:22:15] web07717] total=11143 cpu=224
> php=225 dbq=22 db=10918ms <--
> web-01: [2012-11-20 20:23:44] web07725] total=4402 cpu=224
> php=220 dbq=22 db=4182ms <--
> web-01: [2012-11-20 20:24:03] web07725] total=268 cpu=212
> php=211 dbq=22 db=57ms
>
> The machine is not overloaded during all this period (I've been logging
> "top" and "pstree" each 10 seconds, results are quite typical):
>
> Nov 20 20:23:07 pg-m03 mon: [Tue Nov 20 20:23:07 MSK 2012]
> Nov 20 20:23:07 pg-m03 mon: init-+-crond
> Nov 20 20:23:07 pg-m03 mon: |-master-+-pickup
> Nov 20 20:23:07 pg-m03 mon: | `-qmgr
> Nov 20 20:23:07 pg-m03 mon: |-postmaster---32*[postmaster]
> Nov 20 20:23:07 pg-m03 mon:
> |-screen---sh---su---bash---monitoring.sh-+-logger
> Nov 20 20:23:07 pg-m03 mon: |
> `-monitoring.sh---pstree
> Nov 20 20:23:07 pg-m03 mon: |-sshd---sshd---perl
> Nov 20 20:23:07 pg-m03 mon: |-syslog-ng
> Nov 20 20:23:07 pg-m03 mon: |-udevd
> Nov 20 20:23:07 pg-m03 mon: |-vzctl---bash---mc---bash
> Nov 20 20:23:07 pg-m03 mon: `-xinetd
> Nov 20 20:23:07 pg-m03 mon:
> Nov 20 20:23:08 pg-m03 mon: top - 20:23:08 up 113 days, 3:29, 0 users,
> load average: 2.16, 2.10, 2.00
> Nov 20 20:23:08 pg-m03 mon: Tasks: 1 total, 0 running, 1 sleeping, 0
> stopped, 0 zombie
> Nov 20 20:23:08 pg-m03 mon: Cpu(s): 23.8%us, 2.8%sy, 0.0%ni, 72.7%id,
> 0.8%wa, 0.0%hi, 0.0%si, 0.0%st
> Nov 20 20:23:08 pg-m03 mon: Mem: 1463616k total, 1097776k used, 365840k
> free, 0k buffers
>
> Also I've been logging all queries which are active in the database each 10
> seconds. During the slowdown the database was not overloaded, no strange
> queries, except one thing: during the whole slowdown period two pg processes
> was in "<IDLE> in transaction" state, and in 3 minutes (see 3rd column -
> it's a "time from start of the query") these processes disappeared, and the
> slowdown disappeared too:
>
> | duration | query
> Nov 20 20:22:05 pg-m03 mon: 21586 | 00:00:37.571057 | <IDLE> in
> transaction
> Nov 20 20:22:05 pg-m03 mon: 21567 | 00:00:37.557606 | <IDLE> in
> transaction
> Nov 20 20:22:36 pg-m03 mon: 21586 | 00:01:09.119639 | <IDLE> in
> transaction
> Nov 20 20:22:36 pg-m03 mon: 21567 | 00:01:09.106188 | <IDLE> in
> transaction
> Nov 20 20:22:47 pg-m03 mon: 21586 | 00:01:19.637888 | <IDLE> in
> transaction
> Nov 20 20:22:47 pg-m03 mon: 21567 | 00:01:19.624437 | <IDLE> in
> transaction
> Nov 20 20:22:57 pg-m03 mon: 21586 | 00:01:30.155492 | <IDLE> in
> transaction
> Nov 20 20:22:57 pg-m03 mon: 21567 | 00:01:30.142041 | <IDLE> in
> transaction
> Nov 20 20:24:21 pg-m03 mon: 21586 | 00:02:54.294799 | <IDLE> in
> transaction
> Nov 20 20:24:21 pg-m03 mon: 21567 | 00:02:54.281348 | <IDLE> in
> transaction
> Nov 20 20:24:31 - no more "<IDLE> in transaction" (seems they disappeared
> after 3 minutes timeout?)
>
> Maybe these are correlated? But how could it be that two "idle" processes
> cause slowdown of other connections (note: not "timeout", but exactly -
> slowdown from 1ms to 300ms)?
>
> What could be the problem of all these things? Such slowdown typically
> happens 2-3 times per day, no direct correlation with checkpoints, disks
> loads or something else. It just starts to work slow even on simplest
> queries, and then it restores normal operations.

Hm. Maybe application is leaking transaction and locks along with it?
Using any connection pooling -- in particular, one managed on client
side? One of the first things to check in slow down period is
pg_locks view: both in terms of raw # of granted locks and any
ungranted ones.

merlin

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2012-11-20 20:35:25 Re: get column name passed to a function
Previous Message Merlin Moncure 2012-11-20 20:00:59 Re: High SYS CPU - need advise