From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2013-09-21 22:40:42
Message-ID: CAM3SWZTe2BjohvWmD5fAV1+3G-CQO4TzvBoSLwK0qG4wio_fwg@mail.gmail.com
Lists: pgsql-hackers

Hi Stephen,

On Fri, Sep 20, 2013 at 6:55 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> I'm not sure I follow this completely- you're saying that a definition
> of 'upsert' which includes having to lock rows which aren't in your
> current snapshot (for reasons stated) isn't a useful one. Is the
> implication that a useful definition of 'upsert' is that it *doesn't*
> have to lock rows which aren't in your current snapshot, and if so, then
> what would the semantics of that upsert look like?

No, I'm suggesting that the useful semantics are that it *does*
potentially lock committed rows not yet visible to our snapshot - the
latest row version. I see no alternative (we can't
throw a serialization failure at read committed isolation level), and
Andres seemed to agree that this was the way forward. Robert described
problems he saw with this a few years ago [1]. It *is* a problem (we
need to think very carefully about it), but, as I've said, it is a
problem that anyone implementing this feature for a Snapshot
Isolation/MVCC database would have to deal with, and several have.
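
To make that concrete, here is roughly the interleaving I have in
mind, using the patch's proposed syntax (the table, values, and
session numbering are illustrative only):

    -- Setup:
    CREATE TABLE tab (key int PRIMARY KEY, val text);

    -- Session 2:
    BEGIN;
    INSERT INTO tab VALUES (1, 'b');    -- duplicate not yet committed

    -- Session 1 (READ COMMITTED):
    INSERT INTO tab VALUES (1, 'a')
        ON DUPLICATE KEY LOCK FOR UPDATE;
    -- blocks pending the outcome of session 2's xact

    -- Session 2:
    COMMIT;

    -- Session 1 unblocks. The committed (1, 'b') row is not visible
    -- to the snapshot its statement took, but it is the latest row
    -- version, and it is what must be locked - there is no license to
    -- throw a serialization failure at READ COMMITTED.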

So, what the patch does right now is (if you squint) analogous to how
SELECT FOR UPDATE uses EvalPlanQual already. However, instead of
re-verifying a qual, we're re-verifying that the value locking has
identified the right tid (there will probably be a different one in
the subsequent iteration, or maybe we *can* go insert this time). We
need consensus across unique indexes to go ahead with insertion, but
once we know that we can't (and have a tid to lock), value locks can
go away - we'll know if anything has changed about the tid's logical
row that we need to care about when row locking. Besides, holding
value locks while row locking has deadlock hazards, and, because value
locks only stop insertions *finishing*, holding on to them is at best
pointless.
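
For reference, the existing SELECT FOR UPDATE behavior I'm drawing the
analogy to looks like this (a schematic interleaving against the same
illustrative table as above):

    -- Session 2:
    BEGIN;
    UPDATE tab SET val = 'newer' WHERE key = 1;   -- not yet committed

    -- Session 1 (READ COMMITTED):
    SELECT * FROM tab WHERE key = 1 FOR UPDATE;
    -- blocks on session 2's row lock

    -- Session 2:
    COMMIT;

    -- Session 1 unblocks; EvalPlanQual re-verifies the qual (key = 1)
    -- against the latest row version and locks that version, even
    -- though it is not visible to the statement's snapshot.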

The tid we get back from value locking, which points to a would-be
duplicate heap tuple, always refers to a committed tuple - otherwise
we'd never return from locking, because locking blocks pending the
outcome of the duplicate-inserting xact (and only returns the tid when
that xact commits). Even though this
tuple is known to be visible, it may be deleted in the interim before
row locking, in which case restarting from before value locking is
appropriate. It might also be updated, which would necessitate locking
a later row version in order to prevent race conditions. But it seems
dangerous, invasive, and maybe even generally impossible to try to
wait for the updating transaction to commit or abort so that we can
lock that later version the usual way (the usual EvalPlanQual looping
thing) - better to restart value locking.
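
Schematically, the update race looks like this (again, table and
values are illustrative; the interleaving happens within the execution
of session 1's statement):

    -- Session 1 (READ COMMITTED):
    INSERT INTO tab VALUES (1, 'a')
        ON DUPLICATE KEY LOCK FOR UPDATE;
    -- value locking returns the tid of the committed (1, 'b') tuple

    -- Session 2, before session 1 can acquire the row lock:
    UPDATE tab SET key = 2 WHERE key = 1;   -- and commits

    -- Session 1 finds the tuple it meant to lock has been updated.
    -- Rather than chasing the update chain EvalPlanQual-style, we
    -- restart from before value locking - and on the retry, with no
    -- key = 1 duplicate remaining, insertion of (1, 'a') simply
    -- proceeds.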

The fundamental observation about value locking (at least for any
half-way reasonable implementation), that I'd like to emphasize, is
that short of a radical overhaul that would have many downsides, it
can only ever prevent insertion from *finishing*. The big picture of
my design is that it tries to quickly grab value locks, release them
and grab a row lock (or insert heap tuples, index tuples, and then
release value locks). If row locking fails, it waits for the
conflicter xact to finish, and then restarts before the value locking
of the current slot. If you think that's kind of questionable, maybe
you have a point, but consider:

1. How else are you going to handle conflicts encountered during row
locking? You might say "I can re-verify that no unique index columns
were affected instead", and maybe you can, but what if that doesn't
help because they *were* changed? Besides, doesn't this break the
amcanunique contract? Surely judging what's really a duplicate is the
AM's job.

You're back to "I need to throw an error to get out of this but I have
no good excuse to do so at read committed" -- you've lost the usual
duplicate key error "excuse". I don't think you can expect holding the
value locks throughout row locking to help, because, as I've said,
that causes morally indefensible deadlocks, and besides, it doesn't
stop what row locking would consider to be a conflict, it just stops
insertion from *finishing*.

2. In the existing btree index insertion code, the order that retries
occur in the event of unique index tuple insertion finding an
unfinished conflicting xact *is* undefined. Yes, that's right - the
first waiter is not guaranteed to be the first to get a second chance.
It's not even particularly probable! See remarks from my last mail to
Robert for more information.

3. People with a real purist's view on the (re)ordering of value
locking must already think that EvalPlanQual() is completely ghetto
for very similar reasons, and as such should just go use a higher
isolation level. For the rest of us, what concurrency control anomaly
can allowing this cause over and above what's already possible there?
Are lock starvation risks actually appreciably raised at all?

What Andres and Robert seem to expect generally - that value locks
only be released when the locker has a definitive answer - actually
*can* be ensured at the higher isolation levels, where the system has
license to bail out by throwing a serialization failure. The trick
there is just to throw an error if the first *retry* at cross-index
value locking is unsuccessful or blocks on a whole other xact -- a
serialization error (and not a unique constraint violation error, as
would often but not necessarily otherwise occur for non-upserters).
Naturally, it could also happen that at > read committed, row locking
throws a serialization failure (as is always mandated over using
EvalPlanQual() or other monkeying around at higher isolation levels).
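
So, at REPEATABLE READ, the first interleaving sketched above could
end like this instead (the error text is illustrative):

    -- Session 1:
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM tab;           -- acquires the xact snapshot

    -- Session 2 (autocommit):
    INSERT INTO tab VALUES (1, 'b');

    -- Session 1:
    INSERT INTO tab VALUES (1, 'a')
        ON DUPLICATE KEY LOCK FOR UPDATE;
    -- The conflicting row committed after our snapshot was taken;
    -- rather than locking a version we cannot see, the system bails:
    -- ERROR: could not serialize access due to concurrent update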

> This I am generally in agreement with, to the extent that 'upsert' is
> something we really want and we should figure out a way to get there
> from here, but it wouldn't be the first time that we worked out a
> better solution than existing implementations. So, another '+1' from me
> wrt your working this issue and please don't get too discouraged that
> there's a lot of pressure to find a magic bullet

Thanks for the encouragement!

[1] http://www.postgresql.org/message-id/AANLkTineR-rDFWENeddLg=GrkT+epMHk2j9X0YqpiTY8@mail.gmail.com
--
Peter Geoghegan
