Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2013-12-25 23:27:36
Message-ID: CAM3SWZRftFEoWYLca37zGQh49uiJrFU96v3Wj2t5RzXBzuJWhQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 25, 2013 at 6:25 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> So? I still have the fear that you approach will end up being way too
> complicated and full of layering violations. I didn't say it's a no-go
> (not that I have veto powers, even if I'd consider it one).

Apart from not resulting in unprincipled deadlocking, it respects the
AM abstraction more than all other approaches outlined. Inserting
tuples as value locks just isn't a great approach, even if you ignore
the fact you must come up with a whole new way to release your "value
locks" without ending your xact.

> And yes, I still think that promise tuples might be a better solution
> regardless of the issues you mentioned, but you know what? That doesn't
> matter. Me thinking it's the better approach is primarily based on gut
> feeling, and I clearly haven't voiced clear enough reasons to convince
> you. So you going with your own, possibly more substantiated, gut
> feeling is perfectly alright. Unless I go ahead and write a POC of my
> own at least ;)

My position is not based on a gut feeling. It is based on carefully
considering the interactions of the constituent parts, plus the
experience of actually building a working prototype.

> Whoa? What? Not convincing everyone is far from it being a useless
> discussion. Such an attitude sure is not the way to go to elicit more
> feedback.
> And it clearly gave you the feedback that most people regard holding
> buffer locks across other nontrivial operations, in a potentially
> unbounded number, as a fundamental problem.

Uh, I knew that it was a problem all along. While I explored ways of
ameliorating the problem, I specifically stated that we should discuss
the subsystems interactions/design, which you were far too quick to
dismiss. The overall design is far more pertinent than one specific
mechanism. While I certainly welcome your participation, if you want
to be an effective reviewer I suggest examining your own attitude.
Everyone wants this feature.

>> I don't see what is functionally insufficient about the value locking
>> that you've already seen.
>
> I still think it's fundamentally unacceptable to hold buffer locks
> across any additional complex operations. So yes, I think the current
> state is fundamentally insufficient.

I said *functionally* insufficient. Buffer locks demonstrably do a
perfectly fine job of value locking. Of course the current state is
insufficient, but I'm talking about design here.

>> Holding value locks for more than an instant doesn't make sense. The
>> reason is simple: when upserting, we're tacitly only really looking
>> for violations on one particular unique index. We just lock them all
>> at once because the implementation doesn't know *which* unique index.
>> So in actuality, it's really no different from existing
>> potential-violation handling for unique indexes, except we have to do
>> some extra work in addition to the usual restart from scratch stuff
>> (iff we have multiple unique indexes).
>
> I think the point here really is that that you assume that we're always
> only looking for conflicts with one unique index. If that's all we want
> to support - sure, only the keys in that index need to be locked.
> I don't think that's necessarily a given, especially when you just want
> to look at the conflict in detail, without using a subtransaction.

Why would I not assume that? It's perfectly obvious from the syntax
that you can't do much if you don't know ahead of time where the
conflict might be. It's just like the MySQL feature - the user had
better know where it might be. Now, at least with my syntax as a user
you have some capacity to recover if you consider ahead of time that
you might get it wrong. But clearly rejected, and not conflicting rows
are projected, and multiple conflicts per row are not accounted for.
We lock on the first conflict, which with idiomatic usage will be the
only possible conflict.

That isn't the only reason why value locks don't need to be held for
more than an instant. It's just the most obvious one.

Incidentally, there are many implementation reasons why "true value
locking", where value locks are held indefinitely is extremely
difficult. When I referred to an SLRU, I was just exploring the idea
of making value locks (still only held for an instant) more granular.
On closer examination it looks to me like premature optimization,
though.

>> You never stated a reason why you thought it was
>> necessary. If you have one now, please share it. Note that I release
>> all value locks before row locking too, which is necessary because to
>> do any less will cause unprincipled deadlocking, as we've seen.
>
> I can't sensibly comment upon that right now, I'd need to read more code
> to understand what you're doing there.

You could have looked at it back in September, if only you'd given
these interactions the up-front consideration that they warranted.
Nothing has changed there at all.

> Well, you haven't delivered that part yet, that's pretty much my point,
> no?
> I don't think you can easily do this by just additionally taking a new
> kind of heavyweight locks in the new codepaths - that will still allow
> deadlocks with the old codepaths taking only lwlocks. So this is a
> nontrivial sub-project which very well might influence whether the
> approach is deemed acceptable or not.

I have already written the code, and am in the process of cleaning it
up and gaining confidence that I haven't missed something. It's not
trivial, and there are some subtleties, but I think that your level of
skepticism around the difficulty of doing this is excessive.
Naturally, non-speculative insertion does have to care about the
heavyweight locks sometimes, but only when a page-level flag is found
to be set.

>> (I guess you might not like the visibility stuff or the
>> syntax, but that isn't what you're talking about here).
>
> I don't particularly care about that for now. I think we can find common
> ground, even if it will take some further emails. It probably isn't
> what's in there right now, but I don't think you'e intended it as such.

I certainly hope we can find common ground. I want to work with you.

>> I've been very consistent even in the face of strong criticism. What I
>> have now is essentially the same design as back in early September.
>
> Uh. And why's that necessarily a good thing?

It isn't necessarily, but you've taken my comments out of context. I
was addressing Robert, and his general point about there being
confusion around the semantics and locking protocol aspects. My point
was: if that general impression was created, it is almost entirely
because of discussion of other approaches. The fact that I've been
consistent on design aspects clearly indicates that no one has said
anything to make me reconsider my position. If that's just because
there hasn't been enough scrutiny of my design, then I can hardly be
blamed; I've been begging for that kind of scrutiny.

I have been the one casting doubt on other designs, and quite
successfully I might add. The fact that there was confusion about
those other approaches should not prejudice anyone against my
approach. That doesn't mean I'm right, of course, but as long as no
one is examining those aspects, and as long as no one appreciates what
considerations informed the design I came up with, we won't make
progress. Can we focus on the design, and how things fit together,
please?

> Minor details I noticed in passing:
> * Your tqual.c bit isn't correct, you're forgetting multixacts.

I knew that was broken, but I don't grok the tuple locking code.
Perhaps you can suggest a correct formulation.

> * You several mention "core" in comments as if this wouldn't be part of
> it, that seems confusing.

Well, the executor is naive of the underlying AM, even if it is btree.
What terminology do you suggest that captures that?

> * Doesn't ExecInsert() essentially busy-loop if there's a concurrent
> non-committed insert?

No, it does not. It frees earlier value locks, and waits on the other
xact in the usual manner, and then restarts from scratch. There is a
XactLockTableWait() call in _bt_lockinsert(), just like
_bt_doinsert(). It might be the case that waiters have to spin a few
times if there is lots of conflicts on the same row, but that's
similar to the current state of affairs in _bt_doinsert(). If there
were transactions that kept aborting you'd see the same thing today,
as when doing upserting with subtransactions with lots of conflicts.
It is true that there is nothing to arbitrate the ordering (i.e. there
is no LockTupleTuplock()/Locktuple() call in the btree code), but I
think that doesn't matter because that arbitration occurs when
something close to conventional row locking occurs in
nodeModifyTable.c (or else we just IGNORE).

If you think that there could be unpleasant interactions because of
the lack of Locktuple() arbitration within _bt_lockinsert() or
something like that, please provide a test-case.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2013-12-26 09:32:06 [PATCH] Regression tests in windows ignore white space
Previous Message Tom Lane 2013-12-25 16:49:56 Re: [PATCH] Negative Transition Aggregate Functions (WIP)