On 2013/8/20 12:34, Jeff Janes wrote:
> On Monday, August 19, 2013, Rural Hunter wrote:
>
> I think that this should generally not happen at the server if you are
> using pgbouncer, as you should configure it so that pgbouncer has a
> lower limit than postgresql itself does. What pooling method (session,
> transaction, statement) are you using?
Statement. Currently, I set the pgbouncer connection limit to the same
value as the db connection limit. But I also have a few connections
connecting to the db server directly.
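
For reference, the kind of pgbouncer.ini setup being suggested would look
roughly like this (the numbers are only placeholders, not my actual
settings):

    pool_mode = statement
    max_client_conn = 1000    ; extra clients queue inside pgbouncer instead of failing
    default_pool_size = 80    ; server connections pgbouncer opens per database/user
    ; with max_connections = 100 on the PostgreSQL side, the pool plus the
    ; few direct connections stays safely under the server limit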
> Can you provide some example numbers for the io load?
Here is some output from when the connection limit is reached (the
database-related storage is on sdb/sdd/sde/sdf):
root@ubtserver:~# iostat -xm 3
Linux 3.5.0-22-generic (ubuntu)   2013年08月19日   _x86_64_   (32 CPU)

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          14.71   0.00    2.86    0.48    0.00  81.96

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    0.26    0.04    0.36    0.00    0.00    24.71     0.00    0.55    3.01    0.30   0.29   0.01
sdb        0.00    0.26    0.18    2.32    0.02    0.38   329.50     0.01    5.36    1.26    5.69   0.21   0.05
sdc        0.01    4.59   10.13   45.75    0.30    0.92    44.65     0.05    5.14    7.49    4.62   0.63   3.50
dm-0       0.00    0.00    0.00    0.01    0.00    0.00     8.00     0.00    6.37    6.38    6.36   3.62   0.00
sdd        0.00    0.42    0.02   42.87    0.00    0.46    22.12     0.03    0.78   14.09    0.77   0.49   2.10
sde        0.00    3.68   10.23  156.41    0.19    1.45    20.06     0.03    1.59   21.34    0.29   0.51   8.55
sdf        0.00    2.56    6.29   66.00    0.29    0.71    28.42     0.04    0.56    4.52    0.19   0.37   2.71

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          13.99   0.00    1.91    1.04    0.00  83.06

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    0.00    0.33    0.00    0.00    0.00    16.00     0.00    4.00    4.00    0.00   4.00   0.13
sdb        0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc        0.00   15.33    5.33   14.33    0.13    0.21    34.98     0.03    1.63    6.00    0.00   1.02   2.00
dm-0       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.00    0.00   31.33    0.00    0.26    17.19     0.01    0.34    0.00    0.34   0.34   1.07
sde        0.00    0.00   43.00  163.67    0.59    1.29    18.55     2.56   21.34   72.06    8.01   1.69  34.93
sdf        0.00    0.00    6.00   62.00    0.17    0.55    21.88     0.49    7.16    5.56    7.31   0.27   1.87

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          15.84   0.00    2.63    1.70    0.00  79.83

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    1.67    0.00    2.00    0.00    0.01    14.67     0.07   33.33    0.00   33.33  25.33   5.07
sdb        0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc        0.00    0.00    4.67    0.00    0.06    0.00    26.29     0.13    6.29    6.29    0.00  25.14  11.73
dm-0       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.33    0.00   49.00    0.00    0.39    16.49     0.02    0.35    0.00    0.35   0.35   1.73
sde        0.00   11.00   30.67   81.33    0.38    0.71    19.98    36.46  143.19   43.91  180.62   2.69  30.13
sdf        0.00    9.33    3.00  326.00    0.09    2.75    17.69     3.51   10.66    5.33   10.71   0.11   3.60

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          14.99   0.00    2.39    4.89    0.00  77.74

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb        0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc        0.00   19.67    7.33   29.00    0.09    0.60    38.61     1.18   35.41  175.45    0.00  15.93  57.87
dm-0       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.33    0.00   39.33    0.00    0.31    15.93     0.01    0.37    0.00    0.37   0.37   1.47
sde        0.00   11.33   29.67  312.67    0.39    2.51    17.34    87.15  314.23  108.13  333.78   2.84  97.20
sdf        0.00    0.00    8.33    0.00    0.17    0.00    42.24     0.05    6.56    6.56    0.00   2.40   2.00

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          14.98   0.00    2.23    5.45    0.00  77.34

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    0.00    0.00    0.67    0.00    0.01    20.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb        0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc        0.00    9.67   10.00    6.00    0.12    0.10    27.83     0.08    5.08    8.13    0.00   1.42   2.27
dm-0       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.00    0.00   44.33    0.00    0.35    16.00     0.03    0.72    0.00    0.72   0.72   3.20
sde        0.00    0.00   47.33    0.00    0.58    0.00    25.18     5.26  111.04  111.04    0.00  19.10  90.40
sdf        0.00   11.00    3.33  683.33    0.12    7.38    22.37    12.05   17.54  244.00   16.44   0.49  33.33

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          15.21   0.00    2.54    0.56    0.00  81.69

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    2.00    0.00    1.00    0.00    0.01    24.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb        0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc        0.00    0.00   14.33    2.00    0.20    0.39    73.80     0.07    4.08    4.65    0.00   2.37   3.87
dm-0       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.33    0.00   62.00    0.00    0.52    17.08     0.02    0.34    0.00    0.34   0.34   2.13
sde        0.00    9.67   30.67  157.33    0.43    1.27    18.54     1.75    9.33   15.91    8.04   1.09  20.53
sdf        0.00    9.67    6.67    0.67    0.13    0.04    46.91     0.04    5.09    5.60    0.00   2.36   1.73

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          14.72   0.00    1.95    0.58    0.00  82.76

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    0.00    0.00    2.00    0.00    0.01     8.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb        0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc        0.00   13.67    5.33   32.33    0.07    0.31    20.46     0.04    1.03    7.25    0.00   0.46   1.73
dm-0       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.00    0.00   42.00    0.00    0.35    17.27     0.03    0.79    0.00    0.79   0.79   3.33
sde        0.00    0.33   48.00  804.00    0.61    6.34    16.71     8.38    9.82   14.11    9.57   0.23  19.20
sdf        0.00    0.00    8.00  463.00    0.09    4.12    18.30     5.00   10.62    7.17   10.68   0.11   5.20
> Could you post the complete log message and a few lines of context
> around it?
There is no context from the same connection around that message.
> How long had they been hanging there? It makes a big difference
> whether there are several hanging there at one moment, but a few
> milliseconds later there are several different ones, versus the same
> few that hang around for many seconds or minutes at a time.
The hanging connections never disappear until I restart pgbouncer. It
goes like this: at minute 1, 3 connections are left hanging. At minute 2,
another 3 are left, for a total of 6. Another minute, another 3, total
9... and so on until the limit is reached.
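
In case it helps, something like this could be run the next time it
happens to see what the stuck backends are actually doing (mydb is a
placeholder database name; pid/state/waiting/query are the 9.2+ column
names, older versions have procpid and current_query instead):

    psql -d mydb -c "
      SELECT pid, state, waiting,
             now() - query_start AS query_age,
             left(query, 60) AS query
      FROM pg_stat_activity
      ORDER BY query_start;
    "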
> If the identities of the "hung" processes are rapidly changing, it
> could just be that you are hitting a throughput limit. When you do a
> lot of inserts into indexed tables, the performance can drop
> precipitously once the size of the actively updated part of the indexes
> exceeds shared_buffers. This would usually show up in the io stats, but
> if you always have a lot of io going on, it might not be obvious.
>
> If it is the same few processes hung for long periods, I would strace
> them, or gdb them and get a backtrace.
Is there any detailed guide on how to use strace/gdb on a pg process?
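
For context, the basic recipe being referred to is roughly the following,
where <pid> is the backend process id taken from pg_stat_activity or ps
(nothing here is PostgreSQL-specific):

    # watch the system calls a running backend is making
    strace -p <pid> -tt -T

    # or attach gdb and grab a C-level backtrace (attaching pauses the process briefly)
    gdb -p <pid>
    (gdb) bt
    (gdb) detach
    (gdb) quit

    # one-shot, non-interactive variant of the same thing
    gdb --batch -ex bt -p <pid>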
> Sorry, I don't know what a connection snapshot in db2 looks like.
http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.admin.cmd.doc%2Fdoc%2Fr0001945.html
Search for "get snapshot for application". Note: some items in the sample
are marked as "Not Collected" because some monitor flags are turned off.
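
For completeness, the command that produces that kind of snapshot on the
db2 side is the CLP GET SNAPSHOT command documented at that link, along
the lines of (where <appl-handle> is the application handle):

    db2 get snapshot for application agentid <appl-handle>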
> Cheers,
>
> Jeff