From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | katsumata(dot)tomonari(at)po(dot)ntts(dot)co(dot)jp |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #8434: Why does dead lock occur many times ? |
Date: | 2013-10-04 17:22:17 |
Message-ID: | 20131004172217.GW5235@eldon.alvh.no-ip.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
I spent a lot of time looking at this issue. Your problem scenario can
be reduced to the following test case:
CREATE TABLE t (a INT);
INSERT INTO t VALUES (1);
The three sessions need to execute the following commands, in this
sequence:
-- Session 1
/* 1 */ BEGIN;
SELECT * FROM t FOR UPDATE;
-- Session 2
/* 2 */ BEGIN;
UPDATE t SET a = 2; -- blocks waiting on S1
-- Session 3
/* 3 */ BEGIN;
UPDATE t SET a = 3; -- blocks
-- this is waiting on S1, but since S2 is also waiting on S1 and will acquire a
-- conflicting lock first, this session will only be released after S2 releases
-- its own lock.
Session 1
/* 4 */ UPDATE t SET a = 4;
COMMIT; -- releases session 2
BEGIN;
SELECT * FROM t FOR UPDATE; -- blocks waiting on S2 (*)
Session 2:
/* 5 */ COMMIT; -- releases session 1
BEGIN;
UPDATE t SET a = 5; -- blocks waiting on S3
Session 1:
/* 6 */ UPDATE t SET a = 6; -- blocks waiting on S3
At this point, all sessions are blocked on their UPDATE commands, and
deadlock is reported. (For debugging, it's useful to set a very large
deadlock_timeout and examine pg_locks, etc.)
The point at which things started to go wrong was where the session 1's
SELECT FOR UPDATE with the (*) was allowed to continue after session 2
commits. In 9.2 that session remains blocked, and instead we release
session 3.
The problem is the handling of following of the update chain during a
lock command; the question is should sessions be blocked waiting for
locks on future versions of the row? They currently don't block, which
is what leads the SELECT FOR UPDATE to continue; but if we make them
block, other things break. I initially toyed with the following patch,
which removes the deadlock the above report and also your original test
case:
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4832,15 +4832,12 @@ l4:
xmax = HeapTupleHeaderGetRawXmax(mytup.t_data);
/*
- * If this tuple is updated and the key has been modified (or
- * deleted), what we do depends on the status of the updating
- * transaction: if it's live, we sleep until it finishes; if it has
- * committed, we have to fail (i.e. return HeapTupleUpdated); if it
- * aborted, we ignore it. For updates that didn't touch the key, we
- * can just plough ahead.
+ * If this tuple is updated, what we do depends on the status of the
+ * updating transaction: if it's live, we sleep until it finishes; if
+ * it has committed, we have to fail (i.e. return HeapTupleUpdated); if
+ * it aborted, we ignore it.
*/
- if (!(old_infomask & HEAP_XMAX_INVALID) &&
- (mytup.t_data->t_infomask2 & HEAP_KEYS_UPDATED))
+ if (!(old_infomask & HEAP_XMAX_INVALID))
{
TransactionId update_xid;
However, the differences in behavior this causes (visible by running the
isolation tests) don't look good to me. It's quite possible that there
are other bugs elsewhere.
Sadly, I have to look at other things now and I won't have time to
research this further until after next week's minor releases; so
whatever we decide to do, if anything, will not be in 9.3.1.
Thanks for the report.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Stefan Kaltenbrunner | 2013-10-04 20:20:46 | Re: BUG #8469: Xpath behaviour unintuitive / arguably wrong |
Previous Message | kyrimis | 2013-10-04 13:26:11 | BUG #8500: Upgrade to postgis 2.1 breaks existing databases |