pgsql: Fix race condition that lead to WALInsertLock deadlock with comm

From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Fix race condition that lead to WALInsertLock deadlock with comm
Date: 2015-08-02 17:11:53
Message-ID: E1ZLwnh-0005j6-QP@gemulon.postgresql.org
Lists: pgsql-committers

Fix race condition that led to WALInsertLock deadlock with commit_delay.

If a call to WaitXLogInsertionsToFinish() returned a value in the middle
of a page, and another backend then started to insert a record into the
same page, a subsequent call to WaitXLogInsertionsToFinish() might return
a smaller value than the first call. The problem was in GetXLogBuffer(),
which always updated the insertingAt value to the beginning of the
requested page, not the actual requested location. Because of that, the
second call might return an xlog pointer to the beginning of the page,
while the first one returned a later position on the same page.
XLogFlush() performs two calls to WaitXLogInsertionsToFinish() in
succession, and holds WALWriteLock during the second call, which can
deadlock if the second call to WaitXLogInsertionsToFinish() blocks.
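
The backwards-moving return value can be illustrated with a small
standalone model. This is only a sketch under simplifying assumptions, not
the code in xlog.c: InsertSlot, advertise_buggy(), advertise_fixed(),
finished_upto() and second_call_goes_backwards() are hypothetical names,
the page size is assumed to be 8192 bytes, and no real locking or blocking
is modelled. It only shows how advertising the page start instead of the
requested location lets the second result drop below the first.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;              /* byte position in the WAL */

#define WAL_PAGE_SIZE ((XLogRecPtr) 8192) /* assumed page size */
#define NUM_SLOTS 4                       /* hypothetical number of slots */

/* Hypothetical stand-in for one insertion lock and its insertingAt value. */
typedef struct
{
    int        held;         /* is an insertion in progress on this slot? */
    XLogRecPtr insertingAt;  /* position the inserter has advertised */
} InsertSlot;

/* Old behaviour described above: advertise the beginning of the page. */
static void
advertise_buggy(InsertSlot *slot, XLogRecPtr requested)
{
    slot->held = 1;
    slot->insertingAt = requested - requested % WAL_PAGE_SIZE;
}

/* Fixed behaviour: advertise the location that was actually requested. */
static void
advertise_fixed(InsertSlot *slot, XLogRecPtr requested)
{
    slot->held = 1;
    slot->insertingAt = requested;
}

/*
 * Simplified model of the wait: everything before the smallest advertised
 * position of any in-progress insertion is finished. The real function
 * can also block; this model never does.
 */
static XLogRecPtr
finished_upto(const InsertSlot *slots, int n, XLogRecPtr upto)
{
    XLogRecPtr result = upto;

    for (int i = 0; i < n; i++)
    {
        if (slots[i].held && slots[i].insertingAt < result)
            result = slots[i].insertingAt;
    }
    return result;
}

/* Returns 1 if the second call reports less than the first one did. */
static int
second_call_goes_backwards(void (*advertise) (InsertSlot *, XLogRecPtr))
{
    InsertSlot slots[NUM_SLOTS] = {{0, 0}};
    XLogRecPtr page = 3 * WAL_PAGE_SIZE;
    XLogRecPtr first;
    XLogRecPtr second;

    /* First call: nothing in progress, result is in the middle of a page. */
    first = finished_upto(slots, NUM_SLOTS, page + 4000);
    assert(first == page + 4000);

    /* Another backend starts inserting a record into the same page. */
    advertise(&slots[0], page + 4000);

    /* Second call, as made in succession by the flushing backend. */
    second = finished_upto(slots, NUM_SLOTS, page + 4100);

    return second < first;
}
```

In this model, second_call_goes_backwards(advertise_buggy) returns 1,
because the second result falls back to the start of the page, while
second_call_goes_backwards(advertise_fixed) returns 0.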

Reported by Spiros Ioannou. Backpatch to 9.4, where the more scalable
WALInsertLock mechanism, and this bug, were introduced.

Branch
------
REL9_4_STABLE

Details
-------
http://git.postgresql.org/pg/commitdiff/bab959906911c97437f410a03b0346e6dd28d528

Modified Files
--------------
src/backend/access/transam/xlog.c | 27 ++++++++++++++++++++++++---
1 file changed, 24 insertions(+), 3 deletions(-)
