Re: regression, deadlock in high frequency single-row UPDATE

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Andrew Sackville-West <awest(at)janrain(dot)com>, pgsql-bugs(at)postgresql(dot)org, Paulo Tanimoto <paulo(at)janrain(dot)com>
Subject: Re: regression, deadlock in high frequency single-row UPDATE
Date: 2014-12-10 19:33:14
Message-ID: 20141210193314.GZ1768@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Mark Kirkwood wrote:

> Not so much advantages - just seeing if I could still reproduce the issue
> with a simpler test case i.e ensuring you were not doing anything 'odd' -
> and you are not :-)
>
> The next step would be trying to figure out what commit introduced this
> behaviour - but depesz has already beaten me to that (nice work)!

So I traced through the problem using the simplified test case I posted.
It goes like this:

* tuple on the referenced table has been updated a bunch of times
already, and keeps being updated, so there's a large update chain to
follow.

* two or more transactions (T1 and T2, say) have snapshots in which
some version of the tuple (not the last one) is visible.

* transaction T3 has a ForKeyShare lock on the latest version of the
tuple, which is grabbed because of the INSERT in the referencing
table. Note that these locks are propagated upwards in an update
chain, so even if T3 locked an old version, all future versions are
also locked by T3.

* each of T1 and T2 tries to update the version they have visible; this
fails because it's not the last version, so they need to go through
EvalPlanQual, which walks through the update chain.

At this point, T1 passes through heap_lock_tuple, gets the HW tuple
lock, marks itself as locker in the tuple's Xmax. T1 is scheduled out;
time for T2 to run. T2 goes through heap_lock_tuple, grabs HW lock;
examines Xmax, sees that T1 is the locker, goes to sleep until T1
awakes. T1 is scheduled to run again, runs heap_update. The first
thing it needs is to do LockTuple ... but T2 is holding that one, so T1
goes to sleep until T2 awakes -- now they are both sleeping on each
other.

Kaboom.

I'm not seeing a solution here. There is something wrong in having to
release the HW lock then grab it again, in that EvalPlanQual dance. It
sounds like we need to hold on to the tuple's HW lock until after
heap_update has run, but it's not at all clear how this would work --
the interface through EvalPlanQual is messy enough as it is without
entertaining the idea that a "please keep HW lock until later" flag
needs to be passed up and down all over the place.

One simple idea that occurs to me is have some global state in heapam.c;
when the EPQ code is invoked from ExecUpdate, we tell heapam to keep the
HW lock, and it's only released once heap_update is finished. That
would keep T2 away from the tuple. We'd need a PG_TRY block in
ExecUpdate to reset the global state once the update is done. (It's not
entirely clear that we need all this for deletes too. The main
difference is that after a delete completes the tuple is gone, which is
obviously not the case in an update. Also, deletes don't create update
chains.)

One thing I haven't thought too much about is why doesn't this happen in
9.2.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Jeff Frost 2014-12-10 21:38:13 Re: BUG #12187: Cant find the postgresql-client installation by yum install
Previous Message Alvaro Herrera 2014-12-10 18:45:48 Re: BUG #11986: psql uses pager inside Emacs shell buffer (not a terminal)