Deadlock in XLogInsert at AIX

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Deadlock in XLogInsert at AIX
Date: 2017-01-24 14:47:27
Message-ID: a72cfcb0-37d0-de2f-b3ec-f38ad8d6a8cc@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Hackers,

We are running Postgres at AIX and encoountered two strqange problems:
active zombies process and deadlock in XLOG writer.
First problem I will explain in separate mail, now I am mostly
concerning about deadlock.
It is irregularly reproduced with standard pgbench launched with 100
connections.

It sometimes happens with 9.6 stable version of Postgres but only when
it is compiled with xlc compiler.
We failed to reproduce the problem with GCC. So it looks like as bug in
compiler or xlc-specific atomics implementation...
But there are few moments which contradicts with this hypothesis:

1. The problem is reproduce with Postgres built without optimization.
Usually compiler bugs affect only optimized code.
2. Disabling atomics doesn't help.
3. Without optimization and with LOCK_DEBUG defined time of reproducing
the problem significantly increased. With optimized code it is almost
always reproduced in few minutes.
With debug version it usually takes much more time.

But the most confusing thing is stack trace:

(dbx) where
semop(??, ??, ??) at 0x9000000001f5790
PGSemaphoreLock(sema = 0x0a00000044b95928), line 387 in "pg_sema.c"
unnamed block in LWLockWaitForVar(lock = 0x0a0000000000d980, valptr =
0x0a0000000000d9a8, oldval = 102067874256, newval = 0x0fffffffffff9c10),
line 1666 in "lwlock.c"
LWLockWaitForVar(lock = 0x0a0000000000d980, valptr = 0x0a0000000000d9a8,
oldval = 102067874256, newval = 0x0fffffffffff9c10), line 1666 in "lwlock.c"
unnamed block in WaitXLogInsertionsToFinish(upto = 102067874328), line
1583 in "xlog.c"
WaitXLogInsertionsToFinish(upto = 102067874328), line 1583 in "xlog.c"
AdvanceXLInsertBuffer(upto = 102067874256, opportunistic = '\0'), line
1916 in "xlog.c"
unnamed block in GetXLogBuffer(ptr = 102067874256), line 1697 in "xlog.c"
GetXLogBuffer(ptr = 102067874256), line 1697 in "xlog.c"
CopyXLogRecordToWAL(write_len = 70, isLogSwitch = '\0', rdata =
0x000000011007ce10, StartPos = 102067874256, EndPos = 102067874328),
line 1279 in "xlog.c"
XLogInsertRecord(rdata = 0x000000011007ce10, fpw_lsn = 102067718328),
line 1011 in "xlog.c"
unnamed block in XLogInsert(rmid = '\n', info = '@'), line 453 in
"xloginsert.c"
XLogInsert(rmid = '\n', info = '@'), line 453 in "xloginsert.c"
log_heap_update(reln = 0x0000000110273540, oldbuf = 40544, newbuf =
40544, oldtup = 0x0fffffffffffa2a0, newtup = 0x00000001102bb958,
old_key_tuple = (nil), all_visible_cleared = '\0',
new_all_visible_cleared = '\0'), line 7708 in "heapam.c"
unnamed block in heap_update(relation = 0x0000000110273540, otid =
0x0fffffffffffa6f8, newtup = 0x00000001102bb958, cid = 1, crosscheck =
(nil), wait = '^A', hufd = 0x0fffffffffffa5b0, lockmode =
0x0fffffffffffa5c8), line 4212 in "heapam.c"
heap_update(relation = 0x0000000110273540, otid = 0x0fffffffffffa6f8,
newtup = 0x00000001102bb958, cid = 1, crosscheck = (nil), wait = '^A',
hufd = 0x0fffffffffffa5b0, lockmode = 0x0fffffffffffa5c8), line 4212 in
"heapam.c"
unnamed block in ExecUpdate(tupleid = 0x0fffffffffffa6f8, oldtuple =
(nil), slot = 0x00000001102bb308, planSlot = 0x00000001102b4630,
epqstate = 0x00000001102b2cd8, estate = 0x00000001102b29e0, canSetTag =
'^A'), line 937 in "nodeModifyTable.c"
ExecUpdate(tupleid = 0x0fffffffffffa6f8, oldtuple = (nil), slot =
0x00000001102bb308, planSlot = 0x00000001102b4630, epqstate =
0x00000001102b2cd8, estate = 0x00000001102b29e0, canSetTag = '^A'), line
937 in "nodeModifyTable.c"
ExecModifyTable(node = 0x00000001102b2c30), line 1516 in "nodeModifyTable.c"
ExecProcNode(node = 0x00000001102b2c30), line 396 in "execProcnode.c"
ExecutePlan(estate = 0x00000001102b29e0, planstate = 0x00000001102b2c30,
use_parallel_mode = '\0', operation = CMD_UPDATE, sendTuples = '\0',
numberTuples = 0, direction = ForwardScanDirection, dest =
0x00000001102b7520), line 1569 in "execMain.c"
standard_ExecutorRun(queryDesc = 0x00000001102b25c0, direction =
ForwardScanDirection, count = 0), line 338 in "execMain.c"
ExecutorRun(queryDesc = 0x00000001102b25c0, direction =
ForwardScanDirection, count = 0), line 286 in "execMain.c"
ProcessQuery(plan = 0x00000001102b1510, sourceText = "UPDATE
pgbench_tellers SET tbalance = tbalance + 4019 WHERE tid = 6409;",
params = (nil), dest = 0x00000001102b7520, completionTag = ""), line 187
in "pquery.c"
unnamed block in PortalRunMulti(portal = 0x0000000110115e20, isTopLevel
= '^A', setHoldSnapshot = '\0', dest = 0x00000001102b7520, altdest =
0x00000001102b7520, completionTag = ""), line 1303 in "pquery.c"
unnamed block in PortalRunMulti(portal = 0x0000000110115e20, isTopLevel
= '^A', setHoldSnapshot = '\0', dest = 0x00000001102b7520, altdest =
0x00000001102b7520, completionTag = ""), line 1303 in "pquery.c"
PortalRunMulti(portal = 0x0000000110115e20, isTopLevel = '^A',
setHoldSnapshot = '\0', dest = 0x00000001102b7520, altdest =
0x00000001102b7520, completionTag = ""), line 1303 in "pquery.c"
unnamed block in PortalRun(portal = 0x0000000110115e20, count =
9223372036854775807, isTopLevel = '^A', dest = 0x00000001102b7520,
altdest = 0x00000001102b7520, completionTag = ""), line 815 in "pquery.c"
PortalRun(portal = 0x0000000110115e20, count = 9223372036854775807,
isTopLevel = '^A', dest = 0x00000001102b7520, altdest =
0x00000001102b7520, completionTag = ""), line 815 in "pquery.c"
unnamed block in exec_simple_query(query_string = "UPDATE
pgbench_tellers SET tbalance = tbalance + 4019 WHERE tid = 6409;"), line
1094 in "postgres.c"
exec_simple_query(query_string = "UPDATE pgbench_tellers SET tbalance =
tbalance + 4019 WHERE tid = 6409;"), line 1094 in "postgres.c"
unnamed block in PostgresMain(argc = 1, argv = 0x0000000110119f68,
dbname = "postgres", username = "postgres"), line 4076 in "postgres.c"
PostgresMain(argc = 1, argv = 0x0000000110119f68, dbname = "postgres",
username = "postgres"), line 4076 in "postgres.c"
BackendRun(port = 0x0000000110114290), line 4279 in "postmaster.c"
BackendStartup(port = 0x0000000110114290), line 3953 in "postmaster.c"
unnamed block in ServerLoop(), line 1701 in "postmaster.c"
unnamed block in ServerLoop(), line 1701 in "postmaster.c"
unnamed block in ServerLoop(), line 1701 in "postmaster.c"
ServerLoop(), line 1701 in "postmaster.c"
PostmasterMain(argc = 3, argv = 0x00000001100c6190), line 1309 in
"postmaster.c"
main(argc = 3, argv = 0x00000001100c6190), line 228 in "main.c"

As I already mentioned, we built Postgres with LOCK_DEBUG , so we can
inspect lock owner. Backend is waiting for itself!
Now please look at two frames in this stack trace marked with red.
XLogInsertRecord is setting WALInsert locks at the beginning of the
function:

if (isLogSwitch)
WALInsertLockAcquireExclusive();
else
WALInsertLockAcquire();

WALInsertLockAcquire just selects random item from WALInsertLocks array
and exclusively locks:

if (lockToTry == -1)
lockToTry = MyProc->pgprocno % NUM_XLOGINSERT_LOCKS;
MyLockNo = lockToTry;
immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);

Then, following the stack trace, AdvanceXLInsertBuffer calls
WaitXLogInsertionsToFinish:

/*
* Now that we have an up-to-date LogwrtResult value, see if we
* still need to write it or if someone else already did.
*/
if (LogwrtResult.Write < OldPageRqstPtr)
{
/*
* Must acquire write lock. Release WALBufMappingLock
first,
* to make sure that all insertions that we need to
wait for
* can finish (up to this same position). Otherwise we risk
* deadlock.
*/
LWLockRelease(WALBufMappingLock);

WaitXLogInsertionsToFinish(OldPageRqstPtr);

LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);

It releases WALBufMappingLock but not WAL insert locks!
Finally in WaitXLogInsertionsToFinish tries to wait for all locks:

for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
{
XLogRecPtr insertingat = InvalidXLogRecPtr;

do
{
/*
* See if this insertion is in progress. LWLockWait will
wait for
* the lock to be released, or for the 'value' to be set by a
* LWLockUpdateVar call. When a lock is initially
acquired, its
* value is 0 (InvalidXLogRecPtr), which means that we
don't know
* where it's inserting yet. We will have to wait for it. If
* it's a small insertion, the record will most likely fit
on the
* same page and the inserter will release the lock without
ever
* calling LWLockUpdateVar. But if it has to sleep, it will
* advertise the insertion point with LWLockUpdateVar before
* sleeping.
*/
if (LWLockWaitForVar(&WALInsertLocks[i].l.lock,
&WALInsertLocks[i].l.insertingAt,
insertingat, &insertingat))

And here we stuck!
The comment to WaitXLogInsertionsToFinish says:

* Note: When you are about to write out WAL, you must call this function
* *before* acquiring WALWriteLock, to avoid deadlocks. This function might
* need to wait for an insertion to finish (or at least advance to next
* uninitialized page), and the inserter might need to evict an old WAL
buffer
* to make room for a new one, which in turn requires WALWriteLock.

Which contradicts to the observed stack trace.

I wonder if it is really synchronization bug in xlog.c or there is
something wrong in this stack trace and it can not happen in case of
normal work?

Thanks in advance,

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aaron W. Swenson 2017-01-24 14:51:47 Re: PostgreSQL 8.2 installation error on Windows 2016 server
Previous Message Robert Haas 2017-01-24 14:39:10 Re: [PATCH] Change missleading comment for tm_mon field of pg_tm structure