How to investiage slow insert problem

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Rural Hunter <ruralhunter(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: How to investiage slow insert problem
Date: 2013-08-21 00:24:08
Message-ID: CAMkU=1xThJQiqJ3Ue=x=Sx4Vknh0Uu45Y85-jt4kutELFiqzTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Aug 19, 2013 at 10:30 PM, Rural Hunter
<ruralhunter(at)gmail(dot)com<javascript:;>>
wrote:
> 于 2013/8/20 12:34, Jeff Janes 写道:
>

> > How long had they been hanging there? It makes a big difference whether
> > there are several hanging there at one moment, but a few milliseconds
later
> > there are several different ones, versus the same few that hang around
of
> > many seconds or minutes at a time.
>
> The hanging connections never disappear until I restart pgbouncer. It's
like
> this, At minute 1, 3 connections left. At minute 2, another 3 left, total
6.
> Another minute, another 3 left, total 9....till the limit reaches.

OK, that certainly does sound like network problems and not disk
contention. But what I don't see is why it would be listed as "active" in
pg_stat_activity. If it is blocked on a network connection, I would think
it would show 'idle'.

>
> > If it is the same few processes hung for long periods, I would strace
them,
> > or gdb them and get a backtrace.
>
> any detail guide to use strace/gdb on pg process?

strace and gdb are aggressive forms of monitoring, so you should talk to
your sys admins before using them (if your organization has that type of
segregation of duties) or read their documentation.

It is best to run them on a test system if you can get the problem to occur
there. If you can't, then I would be willing to run them against a hung
backend on a production system, but I might be more adventurous than most.

I have occasionally seen strace cause the straced program to seg-fault, but
I've only seen this with GUI programs and never with postgresql.

In theory attaching gdb could make a backend pause while holding a
spinlock, and if left that way for long enough could cause a
postgresql-wide panic over a stuck spinlock. But I've never seen this
happen without intentionally causing. If the backend you attach is truly
hung and hasn't already causes a panic, then it almost surely can't be
happen, but just to be sure you should quit gdb as soon as possible so that
the backend can continue unimpeded.

Anyway, both are fairly easy to run once you find the pid of one of a stuck
backend (either using ps, or using pg_stat_activity). Then you give the
pid to the debugging program with -p option.

You probably have to run as the postgres user, or you won't have
permissions to attach to the backend.

With strace, once you attach you will see a stream of system calls go to
your screen, until you hit ctrl-C to detach. But if the backend is hung on
a network connection, there should really only be one system call, and then
it just wait until you detach or the network connection times out, like
this:

$ strace -p 21116
Process 21116 attached - interrupt to quit
recvfrom(9,

So it is waiting on a recvfrom call, and that call never returns until I
got sick of waiting and hit ctrl-C. Not very interesting, but it does show
it is indeed stuck on the network

for gdb, it is similar to invoke:

$ gdb -p 21116

and it then produces several screenfuls of diagnostic gibberish and gives
you an interactive command-line environment. Once attached, you want to
get a backtrace ("bt", return), and then quit promptly ("q", return, "y").

That produces something like this:

Loaded symbols for /lib64/libnss_files.so.2
0x00000032a80e9672 in __libc_recv (fd=<value optimized out>, buf=0xb33f60,
n=8192, flags=0) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:30
30 return INLINE_SYSCALL (recvfrom, 6, fd, buf, n, flags, NULL,
NULL);
(gdb) bt
#0 0x00000032a80e9672 in __libc_recv (fd=<value optimized out>,
buf=0xb33f60, n=8192, flags=0) at
../sysdeps/unix/sysv/linux/x86_64/recv.c:30
#1 0x00000000005a4846 in secure_read (port=0x22f1190, ptr=0xb33f60,
len=8192) at be-secure.c:304
#2 0x00000000005ae33b in pq_recvbuf () at pqcomm.c:824
#3 0x00000000005ae73b in pq_getbyte () at pqcomm.c:865
#4 0x0000000000651c11 in SocketBackend (argc=<value optimized out>,
argv=<value optimized out>, dbname=0x22d2a28 "jjanes", username=<value
optimized out>)
at postgres.c:342
#5 ReadCommand (argc=<value optimized out>, argv=<value optimized out>,
dbname=0x22d2a28 "jjanes", username=<value optimized out>) at postgres.c:490
#6 PostgresMain (argc=<value optimized out>, argv=<value optimized out>,
dbname=0x22d2a28 "jjanes", username=<value optimized out>) at
postgres.c:3923
#7 0x000000000060e861 in BackendRun () at postmaster.c:3614
#8 BackendStartup () at postmaster.c:3304
#9 ServerLoop () at postmaster.c:1367
#10 0x00000000006113b1 in PostmasterMain (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:1127
#11 0x00000000005b0440 in main (argc=3, argv=0x22d1cb0) at main.c:199
(gdb) q
A debugging session is active.

Inferior 1 [process 21116] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/local/pgsql9_2/bin/postgres, process 21116

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rural Hunter 2013-08-21 01:33:14 Re: How to investiage slow insert problem
Previous Message Tom Lane 2013-08-20 23:32:09 Re: queries with DISTINCT / GROUP BY giving different plans