Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2014-01-10 15:09:33
Message-ID: 52D00D2D.6030307@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01/10/2014 05:36 AM, Peter Geoghegan wrote:
> While I realize that everyone is busy, I'm concerned about the lack of
> discussing here. It's been 6 full days since I posted my benchmark,
> which I expected to quickly clear some things up, or at least garner
> interest, and yet no one has commented here since.

Nah, that's nothing. I have a patch in the January commitfest that was
already posted for the previous commitfest. It received zero review back
then, and still has no reviewer signed up, let alone anyone actually
reviewing it. And arguably it's a bug fix!

http://www.postgresql.org/message-id/5285071B.1040100@vmware.com

Wink wink, if you're looking for patches to review... ;-)

> The alternative exclusion* patch still deadlocks in an unprincipled
> fashion, when simple, idiomatic usage encounters contention. Heikki
> intends to produce a revision that fixes the problem, though having
> considered it carefully myself, I don't know what mechanism he has in
> mind, and frankly I'm skeptical.

Here's an updated patch. Hope it works now... This is based on an older
version, and doesn't include any fixes from your latest
btreelock_insert_on_dup.v7.2014_01_07.patch. Please check the common
parts, and copy over any relevant changes.

The fix for the deadlocking issue consists of a few parts. First,
there's a new heavy-weight lock type, a speculative insertion lock,
which works somewhat similarly to XactLockTableWait(), but is only held
for the duration of a single speculative insertion. When a backend is
about to begin a speculative insertion, it first acquires the
speculative insertion lock. When it's done with the insertion, meaning
it has either cancelled it by killing the already-inserted tuple or
decided that it's going to go ahead with it, the lock is released.

The speculative insertion lock is keyed by Xid, and token. The lock can
be taken many times in the same transaction, and token's purpose is to
distinguish which insertion is currently in progress. The token is
simply a backend-local counter, incremented each time the lock is taken.

In addition to the heavy-weight lock, there are new fields in PGPROC to
indicate which tuple the backend is currently inserting. When the tuple
is inserted, the backend fills in the relation's relfilenode and item
pointer in MyProc->specInsert* fields, while still holding the buffer
lock. The current speculative insertion token is also stored there.

With that mechanism, when another backend sees a tuple whose xmin is
still in progress, it can check if the insertion is a speculative
insertion. To do that, scan the proc array, and find the backend with
the given xid. Then, check that the relfilenode and itempointer in that
backend's PGPROC slot match the tuple, and make note of the token the
backend had advertised.

HeapTupleSatisfiesDirty() does the proc array check, and returns the
token in the snapshot, alongside snapshot->xmin. The caller can then use
that information in place of XactLockTableWait().

There would be other ways to skin the cat, but this seemed like the
quickest to implement. One more straightforward approach would be to use
the tuple's TID directly in the speculative insertion lock's key,
instead of Xid+token, but then the inserter would have to grab the
heavy-weight lock while holding the buffer lock, which seems dangerous.
Another alternative would be to store token in the heap tuple header,
instead of PGPROC; a tuple that's still being speculatively inserted has
no xmax, so it could be placed in that field. Or ctid.

> More importantly, I have to question
> whether we should continue to pursue that alternative approach, giving
> what we now know about its performance characteristics.

Yes.

> It could be
> improved, but not by terribly much, particularly for the case where
> there is plenty of update contention, which was shown in [1] to be
> approximately 2-3 times slower than extended page locking (*and* it's
> already looking for would-be duplicates*first*). I'm trying to be as
> fair as possible, and yet the difference is huge.

*shrug*. I'm not too concerned about performance during contention. But
let's see how this fixed version performs. Could you repeat the tests
you did with this?

Any guesses what the bottleneck is? At a quick glance at a profile of a
pgbench run with this patch, I didn't see anything out of ordinary, so
I'm guessing it's lock contention somewhere.

- Heikki

Attachment Content-Type Size
speculative-insertions-2014_01_10.patch.gz application/x-gzip 24.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-01-10 15:12:31 Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Previous Message Tom Lane 2014-01-10 14:49:35 Re: [PATCH] Negative Transition Aggregate Functions (WIP)