Re: Connections hang indefinitely while taking a LWLockTranche buffer_content lock.

From: Chris Hanks <christopher(dot)m(dot)hanks(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Connections hang indefinitely while taking a LWLockTranche buffer_content lock.
Date: 2017-10-23 16:38:27
Message-ID: CAK7KUdBPYO7G=dMaU-uJQM=tGkF6yPVwEf9L7s7ak_TfZ1=B0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi -

So it seems that Heroku support can't currently attach to the lxc
container running the 9.6.5 DB. They were able to get a stacktrace
from another DB experiencing the same issue that's running 9.6.1 (I
had started up the 9.6.5 instance hoping that bumping the minor
version would fix this). This instance has two stuck processes.
pg_stat_activity looks like:

```
Query:

SELECT pid, wait_event_type, wait_event, state, backend_xid,
backend_xmin, backend_start, xact_start, query_start, state_change,
query FROM pg_stat_activity ORDER BY xact_start

Results:

22;"LWLockTranche";"buffer_content";"active";11091;11091;"2017-10-17
21:03:22.975601+00";"2017-10-17 21:03:23.001133+00";"2017-10-17
21:03:23.001253+00";"2017-10-17 21:03:23.001255+00";"INSERT INTO
public.que_lockers ( pid, worker_count, worker_priorities, ruby_pid,
ruby_hostname, listening, queues ) VALUES ( pg_backend_pid(),
$1::integer, $2::integer[], $3::integer, $4::text, $5::boolean,
$6::text[] )"
16987;"LWLockTranche";"buffer_content";"active";11092;11091;"2017-10-18
14:31:45.116391+00";"2017-10-18 14:31:45.133445+00";"2017-10-18
14:31:45.133585+00";"2017-10-18 14:31:45.133586+00";"INSERT INTO
public.que_lockers ( pid, worker_count, worker_priorities, ruby_pid,
ruby_hostname, listening, queues ) VALUES ( pg_backend_pid(),
$1::integer, $2::integer[], $3::integer, $4::text, $5::boolean,
$6::text[] )"
5502;"";"";"active";;11091;"2017-10-23 16:31:10.541227+00";"2017-10-23
16:31:15.909758+00";"2017-10-23 16:31:15.909758+00";"2017-10-23
16:31:15.909761+00";"SELECT pid, wait_event_type, wait_event, state,
backend_xid, backend_xmin, backend_start, xact_start, query_start,
state_change, query FROM pg_stat_activity ORDER BY xact_start"
20;"";"";"";;;"";"";"";"";"<insufficient privilege>"
5501;"";"";"idle";;;"2017-10-23 16:31:08.627896+00";"";"2017-10-23
16:31:09.089391+00";"2017-10-23 16:31:09.090431+00";"SELECT 1 FROM
pg_available_extensions WHERE name='adminpack'"
```

The currently held locks:
```
Query

SELECT db.datname, locktype, relation::regclass, mode, transactionid
AS tid, virtualtransaction AS vtid, pid, granted
FROM pg_catalog.pg_locks l
LEFT JOIN pg_catalog.pg_database db ON db.oid = l.database
WHERE NOT pid = pg_backend_pid();

Results

datname | locktype | relation |
mode | tid | vtid | pid | granted
----------------+---------------+----------------------------------+------------------+-------+-------+-------+---------
dfkjlo7osvu4dh | relation | que_lockers_listening_queues_idx |
RowExclusiveLock | | 8/642 | 16987 | t
dfkjlo7osvu4dh | relation | que_lockers_pkey |
RowExclusiveLock | | 8/642 | 16987 | t
dfkjlo7osvu4dh | relation | que_lockers |
RowExclusiveLock | | 8/642 | 16987 | t
| virtualxid | |
ExclusiveLock | | 8/642 | 16987 | t
dfkjlo7osvu4dh | relation | que_lockers_listening_queues_idx |
RowExclusiveLock | | 5/10 | 22 | t
dfkjlo7osvu4dh | relation | que_lockers_pkey |
RowExclusiveLock | | 5/10 | 22 | t
dfkjlo7osvu4dh | relation | que_lockers |
RowExclusiveLock | | 5/10 | 22 | t
| virtualxid | |
ExclusiveLock | | 5/10 | 22 | t
| transactionid | |
ExclusiveLock | 11092 | 8/642 | 16987 | t
| transactionid | |
ExclusiveLock | 11091 | 5/10 | 22 | t
(10 rows)
```

The stacktraces are:

```
Backtrace of query with PID 22:

#0 0x00007fa23544ed87 in write_call_graph (fd=<optimized out>) at gmon.c:287
#1 write_gmon () at gmon.c:394
#2 0x00005639548bf72b in LWLockAcquire (lock=0xffff0001,
mode=LW_EXCLUSIVE) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/storage/lmgr/lwlock.c:1287
#3 0x00005639546707c1 in ginHeapTupleFastInsert
(ginstate=ginstate(at)entry=0x7ffef442af00,
collector=collector(at)entry=0x7ffef442aee0)
at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/access/gin/ginfast.c:314
#4 0x0000563954663985 in gininsert (index=<optimized out>,
values=0x7ffef442d590, isnull=0x7ffef442d690 "",
ht_ctid=0x5639555dfb64, heapRel=<optimized out>,
checkUnique=<optimized out>)
at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/access/gin/gininsert.c:512
#5 0x00005639547b202c in ExecInsertIndexTuples
(slot=slot(at)entry=0x5639555dc3e0, tupleid=tupleid(at)entry=0x5639555dfb64,
estate=estate(at)entry=0x5639555dbb40, noDupErr=noDupErr(at)entry=0 '\000',
specConflict=specConflict(at)entry=0x0,
arbiterIndexes=arbiterIndexes(at)entry=0x0) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/executor/execIndexing.c:388
#6 0x00005639547d0ffd in ExecInsert (canSetTag=1 '\001',
estate=0x5639555dbb40, onconflict=<optimized out>, arbiterIndexes=0x0,
planSlot=0x5639555dc3e0, slot=0x5639555dc3e0,
mtstate=0x5639555dbde0) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/executor/nodeModifyTable.c:496
#7 ExecModifyTable (node=node(at)entry=0x5639555dbde0) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/executor/nodeModifyTable.c:1511
#8 0x00005639547b6c38 in ExecProcNode
(node=node(at)entry=0x5639555dbde0) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/executor/execProcnode.c:396
#9 0x00005639547b2cfe in ExecutePlan (dest=0x563954e0a300
<donothingDR>, direction=<optimized out>, numberTuples=0, sendTuples=0
'\000', operation=CMD_INSERT,
use_parallel_mode=<optimized out>, planstate=0x5639555dbde0,
estate=0x5639555dbb40) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/executor/execMain.c:1567
#10 standard_ExecutorRun (queryDesc=0x5639555cf6a0,
direction=<optimized out>, count=0) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/executor/execMain.c:338
#11 0x00007fa22a3b1035 in pgss_ExecutorRun (queryDesc=0x5639555cf6a0,
direction=ForwardScanDirection, count=0)
at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../contrib/pg_stat_statements/pg_stat_statements.c:877
#12 0x00005639548d0549 in ProcessQuery (plan=<optimized out>,
sourceText=0x5639555c6ca0 "INSERT INTO public.que_lockers ( pid,
worker_count, worker_priorities, ruby_pid, ruby_hostname, listening,
queues ) VALUES ( pg_backend_pid(), $1::integer, $2::integer[],
$3::integer, $4::text, $5::bo"..., params=0x5639555c6db0,
dest=0x563954e0a300 <donothingDR>, completionTag=0x7ffef442dee0 "")
at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/tcop/pquery.c:187
#13 0x00005639548d079e in PortalRunMulti
(portal=portal(at)entry=0x5639555c0b90, isTopLevel=isTopLevel(at)entry=1
'\001', setHoldSnapshot=setHoldSnapshot(at)entry=0 '\000',
dest=0x563954e0a300 <donothingDR>, dest(at)entry=0x5639555a5310,
altdest=0x563954e0a300 <donothingDR>, altdest(at)entry=0x5639555a5310,
completionTag=completionTag(at)entry=0x7ffef442dee0 "")
at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/tcop/pquery.c:1303
#14 0x00005639548d1397 in PortalRun
(portal=portal(at)entry=0x5639555c0b90,
count=count(at)entry=9223372036854775807, isTopLevel=isTopLevel(at)entry=1
'\001', dest=dest(at)entry=0x5639555a5310,
altdest=altdest(at)entry=0x5639555a5310,
completionTag=completionTag(at)entry=0x7ffef442dee0 "") at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/tcop/pquery.c:815
#15 0x00005639548ce643 in exec_execute_message
(max_rows=9223372036854775807, portal_name=0x5639555a4f00 "")
at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/tcop/postgres.c:1977
#16 PostgresMain (argc=<optimized out>,
argv=argv(at)entry=0x563955527678, dbname=0x5639555275a0
"dfkjlo7osvu4dh", username=<optimized out>)
at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/tcop/postgres.c:4133
#17 0x00005639546543fb in BackendRun (port=0x56395551dcb0) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/postmaster/postmaster.c:4270
#18 BackendStartup (port=0x56395551dcb0) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/postmaster/postmaster.c:3944
#19 ServerLoop () at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/postmaster/postmaster.c:1701
#20 0x000056395486e06b in PostmasterMain (argc=5, argv=<optimized
out>) at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/postmaster/postmaster.c:1309
#21 0x0000563954655142 in main (argc=5, argv=0x5639554d53a0) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/main/main.c:228

Backtrace of query with PID 16987

#0 0x00007fa23544ed87 in write_call_graph (fd=<optimized out>) at gmon.c:287
#1 write_gmon () at gmon.c:394
#2 0x00005639548bf72b in LWLockAcquire (lock=0xffff000f,
mode=LW_EXCLUSIVE) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/storage/lmgr/lwlock.c:1287
#3 0x00005639546704e0 in ginHeapTupleFastInsert
(ginstate=ginstate(at)entry=0x7ffef442af00,
collector=collector(at)entry=0x7ffef442aee0)
at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/access/gin/ginfast.c:255
#4 0x0000563954663985 in gininsert (index=<optimized out>,
values=0x7ffef442d590, isnull=0x7ffef442d690 "",
ht_ctid=0x5639555df7e4, heapRel=<optimized out>,
checkUnique=<optimized out>)
at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/access/gin/gininsert.c:512
#5 0x00005639547b202c in ExecInsertIndexTuples
(slot=slot(at)entry=0x5639555dc060, tupleid=tupleid(at)entry=0x5639555df7e4,
estate=estate(at)entry=0x5639555db7c0, noDupErr=noDupErr(at)entry=0 '\000',
specConflict=specConflict(at)entry=0x0,
arbiterIndexes=arbiterIndexes(at)entry=0x0) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/executor/execIndexing.c:388
#6 0x00005639547d0ffd in ExecInsert (canSetTag=1 '\001',
estate=0x5639555db7c0, onconflict=<optimized out>, arbiterIndexes=0x0,
planSlot=0x5639555dc060, slot=0x5639555dc060,
mtstate=0x5639555dba60) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/executor/nodeModifyTable.c:496
#7 ExecModifyTable (node=node(at)entry=0x5639555dba60) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/executor/nodeModifyTable.c:1511
#8 0x00005639547b6c38 in ExecProcNode
(node=node(at)entry=0x5639555dba60) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/executor/execProcnode.c:396
#9 0x00005639547b2cfe in ExecutePlan (dest=0x563954e0a300
<donothingDR>, direction=<optimized out>, numberTuples=0, sendTuples=0
'\000', operation=CMD_INSERT,
use_parallel_mode=<optimized out>, planstate=0x5639555dba60,
estate=0x5639555db7c0) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/executor/execMain.c:1567
#10 standard_ExecutorRun (queryDesc=0x5639555cf320,
direction=<optimized out>, count=0) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/executor/execMain.c:338
#11 0x00007fa22a3b1035 in pgss_ExecutorRun (queryDesc=0x5639555cf320,
direction=ForwardScanDirection, count=0)
at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../contrib/pg_stat_statements/pg_stat_statements.c:877
#12 0x00005639548d0549 in ProcessQuery (plan=<optimized out>,
sourceText=0x5639555c6920 "INSERT INTO public.que_lockers ( pid,
worker_count, worker_priorities, ruby_pid, ruby_hostname, listening,
queues ) VALUES ( pg_backend_pid(), $1::integer, $2::integer[],
$3::integer, $4::text, $5::bo"..., params=0x5639555c6a30,
dest=0x563954e0a300 <donothingDR>, completionTag=0x7ffef442dee0 "")
at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/tcop/pquery.c:187
#13 0x00005639548d079e in PortalRunMulti
(portal=portal(at)entry=0x5639555c0810, isTopLevel=isTopLevel(at)entry=1
'\001', setHoldSnapshot=setHoldSnapshot(at)entry=0 '\000',
dest=0x563954e0a300 <donothingDR>, dest(at)entry=0x5639555a4f90,
altdest=0x563954e0a300 <donothingDR>, altdest(at)entry=0x5639555a4f90,
completionTag=completionTag(at)entry=0x7ffef442dee0 "")
at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/tcop/pquery.c:1303
#14 0x00005639548d1397 in PortalRun
(portal=portal(at)entry=0x5639555c0810,
count=count(at)entry=9223372036854775807, isTopLevel=isTopLevel(at)entry=1
'\001', dest=dest(at)entry=0x5639555a4f90,
altdest=altdest(at)entry=0x5639555a4f90,
completionTag=completionTag(at)entry=0x7ffef442dee0 "") at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/tcop/pquery.c:815
#15 0x00005639548ce643 in exec_execute_message
(max_rows=9223372036854775807, portal_name=0x5639555a4b80 "")
at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/tcop/postgres.c:1977
#16 PostgresMain (argc=<optimized out>,
argv=argv(at)entry=0x563955527678, dbname=0x5639555275a0
"dfkjlo7osvu4dh", username=<optimized out>)
at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/tcop/postgres.c:4133
#17 0x00005639546543fb in BackendRun (port=0x56395551dcb0) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/postmaster/postmaster.c:4270
#18 BackendStartup (port=0x56395551dcb0) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/postmaster/postmaster.c:3944
#19 ServerLoop () at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/postmaster/postmaster.c:1701
#20 0x000056395486e06b in PostmasterMain (argc=5, argv=<optimized
out>) at /build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/postmaster/postmaster.c:1309
#21 0x0000563954655142 in main (argc=5, argv=0x5639554d53a0) at
/build/postgresql-9.6-HEXeEe/postgresql-9.6-9.6.1/build/../src/backend/main/main.c:228
```

Heroku support isn't sure how to attach to the 9.6.5 DB without
restarting the container, which will mean losing the stuck processes.
We can certainly try to do that and wait for the issue to recur,
though, if we think the 9.6.1 -> 9.6.5 difference is relevant.

Thanks!
Chris

On Sun, Oct 22, 2017 at 3:23 PM, Chris Hanks
<christopher(dot)m(dot)hanks(at)gmail(dot)com> wrote:
> Alright, thanks, I'll ask them to look into that first thing tomorrow
> morning. They may not get back to me for a day or so, so if anyone has
> anything else they'd like me to ask for in the same request, please
> let me know!
>
> Chris
>
> On Sun, Oct 22, 2017 at 2:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Chris Hanks <christopher(dot)m(dot)hanks(at)gmail(dot)com> writes:
>>> I'm encountering a locking issue in Postgres. I unfortunately haven't
>>> been able to make a reproduction for it, but it's occurred a few times
>>> on different machines, and I have access to two machines that are
>>> currently locked up, so I'm hoping that someone can help anyway.
>>
>> If you could get stack traces from the stuck backends, that would be
>> pretty helpful.
>>
>> regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2017-10-23 16:49:06 Re: Connections hang indefinitely while taking a LWLockTranche buffer_content lock.
Previous Message Tom Lane 2017-10-23 16:33:52 Re: BUG #14867: Cascade drop type error