Re: swarm of processes in BIND state?

From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: swarm of processes in BIND state?
Date: 2016-05-28 18:32:02
Message-ID: 20160528183202.GC14097@depesz.com
Lists: pgsql-general

On Sat, May 28, 2016 at 10:32:15AM -0700, Jeff Janes wrote:
> > any clues on where to start diagnosing it?
>
> I'd start by using strace (with -y -ttt -T) on one of the processes
> and see what it is doing. A lot of IO, and on what file? A lot of
> semop's?

So, I did:

sudo strace -o bad.log -y -ttt -T -p $( ps uwwf -u postgres | grep BIND | awk '{print $2}' | head -n1 )
and killed it after 10 seconds, more or less. Results:

$ wc -l bad.log
6075 bad.log
$ grep -c semop bad.log
6018
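
A rough sketch of how the same log could be summarized per syscall, including the time spent in each one. It assumes the log was written by strace with -ttt -T for a single process, so that the second field starts with the syscall name and the last field is the duration in angle brackets. The function name syscall_summary is made up for this example.

```shell
# Summarize an strace log (written with -ttt -T) as: count, total seconds, syscall.
# Usage: syscall_summary bad.log
syscall_summary() {
    awk '
    # Only consider lines whose second field looks like "name(".
    $2 ~ /^[a-z_0-9]+\(/ {
        name = $2
        sub(/\(.*/, "", name)          # keep only the syscall name
        count[name]++
        # -T appends the elapsed time as <seconds>; skip lines without it.
        if ($NF ~ /^<[0-9.]+>$/) {
            d = $NF
            gsub(/[<>]/, "", d)
            total[name] += d
        }
    }
    END {
        for (n in count) printf "%d %.6f %s\n", count[n], total[n], n
    }' "$1" | sort -k1,1nr -k2,2
}
```

If semop dominates both the count and the total time, the process is spending its time on semaphore waits rather than on file IO.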

The rest were reads, seeks, and a single open on these files:
$ grep -v semop bad.log | grep -oE '/16421/[0-9.]*' | sort | uniq -c
2 /16421/3062403236.20
2 /16421/3062403236.8
25 /16421/3222944583.49
28 /16421/3251043620.60

Which are:
$ select oid::regclass from pg_class where relfilenode in (3062403236, 3222944583, 3251043620);
oid
----------------------------------
app_schema.s_table
app_schema.v_table
app_schema.m_table
(3 rows)

These are the 3 largest tables in the database. But the logs don't show any
queries that would touch all 3 of them.
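
The step from file paths to the IN list above was done by hand. A minimal sketch of automating it follows. The function name relfilenodes is made up for this example, and the database OID (16421 here) is passed in as the second argument. It assumes strace was run with -y so that file paths appear in the log.

```shell
# Print the distinct relfilenodes touched in an strace -y log, comma-separated.
# Usage: relfilenodes bad.log 16421
# The result can be pasted into:
#   select oid::regclass from pg_class where relfilenode in (...);
relfilenodes() {
    grep -v semop "$1" \
        | grep -oE "/$2/[0-9]+" \
        | cut -d/ -f3 \
        | sort -un \
        | paste -sd, -
}
```

The segment suffix (.20, .49 and so on) is dropped on purpose, because pg_class.relfilenode holds only the base file name.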

> If that wasn't informative, I'd attach to one of the processes with
> the gdb debugger and get a backtrace. (You might want to do that a
> few times, just in case the first one accidentally caught the code
> during a part of its execution which was not in the bottlenecked
> spot.)

I did:
for a in $( ps uww -U postgres | grep BIND | awk '{print $2}' ); do echo "bt" | gdb -p $a > $a.bt.log 2>&1; done

Since there is a lot of output, I made a tarball of it and put it at
https://depesz.com/various/all.bt.logs.tar.gz

The file is ~ 19kB.
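
A sketch of a variant of the backtrace loop above that takes several rounds of backtraces, as Jeff suggested, so that one unlucky sample does not mislead. The function names bind_pids and dump_backtraces and the output file naming are made up for this example. It uses gdb's -batch and -ex options instead of piping "bt" on stdin.

```shell
# Read "ps u" style output on stdin and print the PIDs of backends in BIND.
bind_pids() {
    awk '/BIND/ { print $2 }'
}

# Take N rounds of backtraces (default 3) from every backend in BIND state.
# Writes one file per process and round: <pid>.<round>.bt.log
# Usage: dump_backtraces 3
dump_backtraces() {
    rounds=${1:-3}
    i=1
    while [ "$i" -le "$rounds" ]; do
        ps uww -U postgres | bind_pids | while read -r pid; do
            # </dev/null keeps gdb from consuming the PID list on stdin.
            gdb -batch -ex bt -p "$pid" > "$pid.$i.bt.log" 2>&1 < /dev/null
        done
        sleep 1
        i=$((i + 1))
    done
}
```

Attaching gdb pauses each backend briefly, so this is best kept to a few rounds on a production system.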

> > So far we've:
> > 1. ruled out IO problems (enough io both in terms of bandwidth and iops)
>
> Are you saying that you are empirically not actually doing any IO
> waits, or just that the IO capacity is theoretically sufficient?

There are no iowaits according to iostat. Or rather, there are some, but very low.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
