On 2013/8/20 12:34, Jeff Janes wrote:
> On Monday, August 19, 2013, Rural Hunter wrote:
>
> I think that this should generally not happen at the server if you are
> using pgbouncer, as you should configure it so that pgbouncer has a
> lower limit than postgresql itself does. What pooling method (session,
> transaction, statement) are you using?
Statement. Currently, I set the pgbouncer connection limit to the same
value as the db connection limit. But I also have a few connections
connecting to the db server directly.
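
For reference, the kind of pgbouncer.ini setup being suggested would look
roughly like this (the numbers are only placeholders, not my actual
settings):

    pool_mode = statement
    max_client_conn = 1000    ; extra clients queue inside pgbouncer instead of failing
    default_pool_size = 80    ; server connections pgbouncer opens per database/user
    ; with max_connections = 100 on the PostgreSQL side, the pool plus the
    ; few direct connections stays safely under the server limit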
> Can you provide some example numbers for the io load?
Here is some output from when the connection limit is reached (the
database-related storage is on sdb/sdd/sde/sdf):
root@ubtserver:~# iostat -xm 3
Linux 3.5.0-22-generic (ubuntu)   2013年08月19日   _x86_64_   (32 CPU)

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          14.71   0.00    2.86    0.48    0.00  81.96

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    0.26    0.04    0.36    0.00    0.00    24.71     0.00    0.55    3.01    0.30   0.29   0.01
sdb        0.00    0.26    0.18    2.32    0.02    0.38   329.50     0.01    5.36    1.26    5.69   0.21   0.05
sdc        0.01    4.59   10.13   45.75    0.30    0.92    44.65     0.05    5.14    7.49    4.62   0.63   3.50
dm-0       0.00    0.00    0.00    0.01    0.00    0.00     8.00     0.00    6.37    6.38    6.36   3.62   0.00
sdd        0.00    0.42    0.02   42.87    0.00    0.46    22.12     0.03    0.78   14.09    0.77   0.49   2.10
sde        0.00    3.68   10.23  156.41    0.19    1.45    20.06     0.03    1.59   21.34    0.29   0.51   8.55
sdf        0.00    2.56    6.29   66.00    0.29    0.71    28.42     0.04    0.56    4.52    0.19   0.37   2.71

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          13.99   0.00    1.91    1.04    0.00  83.06

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    0.00    0.33    0.00    0.00    0.00    16.00     0.00    4.00    4.00    0.00   4.00   0.13
sdb        0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc        0.00   15.33    5.33   14.33    0.13    0.21    34.98     0.03    1.63    6.00    0.00   1.02   2.00
dm-0       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.00    0.00   31.33    0.00    0.26    17.19     0.01    0.34    0.00    0.34   0.34   1.07
sde        0.00    0.00   43.00  163.67    0.59    1.29    18.55     2.56   21.34   72.06    8.01   1.69  34.93
sdf        0.00    0.00    6.00   62.00    0.17    0.55    21.88     0.49    7.16    5.56    7.31   0.27   1.87

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          15.84   0.00    2.63    1.70    0.00  79.83

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    1.67    0.00    2.00    0.00    0.01    14.67     0.07   33.33    0.00   33.33  25.33   5.07
sdb        0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc        0.00    0.00    4.67    0.00    0.06    0.00    26.29     0.13    6.29    6.29    0.00  25.14  11.73
dm-0       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.33    0.00   49.00    0.00    0.39    16.49     0.02    0.35    0.00    0.35   0.35   1.73
sde        0.00   11.00   30.67   81.33    0.38    0.71    19.98    36.46  143.19   43.91  180.62   2.69  30.13
sdf        0.00    9.33    3.00  326.00    0.09    2.75    17.69     3.51   10.66    5.33   10.71   0.11   3.60

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          14.99   0.00    2.39    4.89    0.00  77.74

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb        0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc        0.00   19.67    7.33   29.00    0.09    0.60    38.61     1.18   35.41  175.45    0.00  15.93  57.87
dm-0       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.33    0.00   39.33    0.00    0.31    15.93     0.01    0.37    0.00    0.37   0.37   1.47
sde        0.00   11.33   29.67  312.67    0.39    2.51    17.34    87.15  314.23  108.13  333.78   2.84  97.20
sdf        0.00    0.00    8.33    0.00    0.17    0.00    42.24     0.05    6.56    6.56    0.00   2.40   2.00

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          14.98   0.00    2.23    5.45    0.00  77.34

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    0.00    0.00    0.67    0.00    0.01    20.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb        0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc        0.00    9.67   10.00    6.00    0.12    0.10    27.83     0.08    5.08    8.13    0.00   1.42   2.27
dm-0       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.00    0.00   44.33    0.00    0.35    16.00     0.03    0.72    0.00    0.72   0.72   3.20
sde        0.00    0.00   47.33    0.00    0.58    0.00    25.18     5.26  111.04  111.04    0.00  19.10  90.40
sdf        0.00   11.00    3.33  683.33    0.12    7.38    22.37    12.05   17.54  244.00   16.44   0.49  33.33

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          15.21   0.00    2.54    0.56    0.00  81.69

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    2.00    0.00    1.00    0.00    0.01    24.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb        0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc        0.00    0.00   14.33    2.00    0.20    0.39    73.80     0.07    4.08    4.65    0.00   2.37   3.87
dm-0       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.33    0.00   62.00    0.00    0.52    17.08     0.02    0.34    0.00    0.34   0.34   2.13
sde        0.00    9.67   30.67  157.33    0.43    1.27    18.54     1.75    9.33   15.91    8.04   1.09  20.53
sdf        0.00    9.67    6.67    0.67    0.13    0.04    46.91     0.04    5.09    5.60    0.00   2.36   1.73

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          14.72   0.00    1.95    0.58    0.00  82.76

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    0.00    0.00    2.00    0.00    0.01     8.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb        0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc        0.00   13.67    5.33   32.33    0.07    0.31    20.46     0.04    1.03    7.25    0.00   0.46   1.73
dm-0       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.00    0.00   42.00    0.00    0.35    17.27     0.03    0.79    0.00    0.79   0.79   3.33
sde        0.00    0.33   48.00  804.00    0.61    6.34    16.71     8.38    9.82   14.11    9.57   0.23  19.20
sdf        0.00    0.00    8.00  463.00    0.09    4.12    18.30     5.00   10.62    7.17   10.68   0.11   5.20
> Could you post the complete log message and a few lines of context
> around it?
There is no context from the same connection around that message.
> How long had they been hanging there? It makes a big difference
> whether there are several hanging there at one moment, but a few
> milliseconds later there are several different ones, versus the same
> few that hang around for many seconds or minutes at a time.
The hanging connections never disappear until I restart pgbouncer. It
goes like this: at minute 1, 3 connections are left hanging. At minute 2,
another 3 are left, for a total of 6. Another minute, another 3, total
9... and so on until the limit is reached.
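
In case it helps, something like this could be run the next time it
happens to see what the stuck backends are actually doing (mydb is a
placeholder database name; pid/state/waiting/query are the 9.2+ column
names, older versions have procpid and current_query instead):

    psql -d mydb -c "
      SELECT pid, state, waiting,
             now() - query_start AS query_age,
             left(query, 60) AS query
      FROM pg_stat_activity
      ORDER BY query_start;
    "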
> If the identities of the "hung" processes are rapidly changing, it
> could just be that you are hitting a throughput limit. When you do a
> lot of inserts into indexed tables, the performance can drop
> precipitously once the size of the actively updated part of the indexes
> exceeds shared_buffers. This would usually show up in the io stats, but
> if you always have a lot of io going on, it might not be obvious.
>
> If it is the same few processes hung for long periods, I would strace
> them, or gdb them and get a backtrace.
Is there any detailed guide on how to use strace/gdb on a pg process?
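
For context, the basic recipe being referred to is roughly the following,
where <pid> is the backend process id taken from pg_stat_activity or ps
(nothing here is PostgreSQL-specific):

    # watch the system calls a running backend is making
    strace -p <pid> -tt -T

    # or attach gdb and grab a C-level backtrace (attaching pauses the process briefly)
    gdb -p <pid>
    (gdb) bt
    (gdb) detach
    (gdb) quit

    # one-shot, non-interactive variant of the same thing
    gdb --batch -ex bt -p <pid>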
> Sorry, I don't know what a connection snapshot in db2 looks like.
http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.admin.cmd.doc%2Fdoc%2Fr0001945.html
Search for "get snapshot for application". Note: some items in the sample
are marked as "Not Collected" because some monitor flags are turned off.
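
For completeness, the command that produces that kind of snapshot on the
db2 side is the CLP GET SNAPSHOT command documented at that link, along
the lines of (where <appl-handle> is the application handle):

    db2 get snapshot for application agentid <appl-handle>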
> Cheers,
>
> Jeff