Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2013-09-27 12:36:21
Message-ID: CA+TgmoZ13xC4vxrhtpmr5S3hmewopbGuF3c_XG+J5eCj_Jvtjg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 26, 2013 at 11:58 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Thu, Sep 26, 2013 at 12:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> Well, I think we can rule out value locks that are held for the
>>> duration of a transaction right away. That's just not going to fly.
>>
>> I think I agree with that. I don't think I remember hearing that proposed.
>
> I think I might have been unclear - I mean locks that are held for the
> duration of *another* transaction, not our own, as we wait for that
> other transaction to commit/abort. I think that earlier remarks from
> yourself and Andres implied that this would be necessary. Perhaps I'm
> mistaken. Your most recent design proposal doesn't do this, but I
> think that that's only because it restricts the user to a single
> unique index - it would otherwise be necessary to sit on the earlier
> value locks (index tuples belonging to an unfinished transaction)
> pending the completion of some other conflicting transaction, which
> has numerous disadvantages (as described in my "it suits my purposes
> to have the value locks be held for only an instant" mail to you [1]).

OK, now I understand what you are saying. I don't think I agree with it.

> I don't have another idea either. In fact, I'd go so far as to say
> that it's obviously impossible to do any third thing that any
> reasonable person would consider better than those two. But I'd add
> that we simply cannot roll back at read committed, so we're just
> going to have to hold our collective noses and do strange things
> with visibility.

I don't accept that as a general principle. We're writing the code;
we can make it behave any way we think best.

> This is something that I haven't given remotely enough thought yet, so
> please take it with a big grain of salt.

I doubt that any change to HeapTupleSatisfiesMVCC() will be
acceptable. This feature needs to limit itself to behavior changes
that only affect users of this feature, I think.
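
For reference, the kind of read-committed visibility accommodation
that already exists today - and that I take to be behind the "strange
things with visibility" remark - is the way UPDATE handles a
concurrently modified row: it waits and then re-applies itself to the
newest committed version, one its snapshot cannot see, rather than
rolling back. A quick illustration against stock PostgreSQL (table
and values are made up):

    -- Both sessions at the default READ COMMITTED level.
    -- Session 1:
    CREATE TABLE counters (id int PRIMARY KEY, n int);
    INSERT INTO counters VALUES (1, 0);
    BEGIN;
    UPDATE counters SET n = n + 1 WHERE id = 1;  -- new version (1, 1),
                                                 -- uncommitted
    -- Session 2:
    UPDATE counters SET n = n + 1 WHERE id = 1;  -- blocks; its snapshot
                                                 -- still sees (1, 0)
    -- Session 1:
    COMMIT;
    -- Session 2's UPDATE now applies to the (1, 1) version its snapshot
    -- could not see, and ends up committing n = 2 rather than rolling
    -- back or erroring.

Notably, that accommodation lives in the UPDATE/DELETE executor path
(EvalPlanQual), not in the generic snapshot-visibility rules, which is
the sort of confinement I have in mind.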

> There is certainly value in considering that, and you're right to take
> that tack - it is generally valuable to have a patch be minimally
> invasive. However, ultimately that's just one aspect of any given
> design, an aspect that needs to be weighed against others where there
> is a tension. Obviously in this instance I believe, rightly or
> wrongly, that doing more - adding more infrastructure than might be
> considered strictly necessary - is the least worst thing. Also,
> sometimes the apparent similarity of a design to what we have today is
> illusory - certainly, I think you'd at least agree that the problems
> that bloating during the interim between value locking and row locking
> present are qualitatively different to other problems that bloat
> presents in all existing scenarios.

TBH, no, I don't think I agree with that. See further below.

> Let me try and explain myself better, with reference to a concrete
> example. Suppose we have a table with a primary key column, A, and a
> unique constraint column, B, and we lock the pk value first and the
> unique constraint value second. I'm assuming your design, but allowing
> for multiple unique indexes because I don't think doing anything less
> will be accepted - promise tuples have some of the same problems, as
> well as some other idiosyncratic ones (see my earlier remarks on
> recovery/freezing [2] for examples of those).

OK, so far I'm right with you.
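
To put that concrete example in SQL terms (the table and column names
are mine, and the INSERT syntax is the one proposed in this thread,
not something that runs anywhere today):

    -- Schema for the scenario: a pk that rarely conflicts and a unique
    -- constraint that usually does.
    CREATE TABLE t (
        a int  PRIMARY KEY,  -- value-locked first; usually no conflict
        b text UNIQUE        -- checked second; conflict is the common case
    );

    -- Roughly the proposed syntax: on a duplicate, the existing row is
    -- locked so that a follow-up UPDATE can be applied to it.
    INSERT INTO t (a, b) VALUES (1, 'foo')
        ON DUPLICATE KEY LOCK FOR UPDATE;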

> So there is a fairly high probability that the pk value on A will be
> unique, and a fairly low probability that the unique constraint value
> on B will be unique, at least in this usage pattern of interest, where
> the user is mostly going to end up updating. Mostly, we insert a
> speculative regular index tuple (that points to a speculative heap
> tuple that we might decide to kill) into the pk column, A, right away,
> and then maybe block pending the resolution of a conflicting
> transaction on the unique constraint column B. I don't think we have
> any reasonable way of not blocking on A - if we go clean it up for the
> wait, that's going to bloat quite dramatically, *and* we have to WAL
> log. In any case you seemed to accept that cleaning up bloat
> synchronously like that was just going to be too expensive. So I
> suppose that rules that out. That just leaves sitting on the "value
> lock" (that the pk index tuple already inserted effectively is)
> indefinitely, pending the outcome of the first transaction.

Agreed.
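
The shape of that blocking is easy to reproduce today with plain
INSERTs, taking the t(a int PRIMARY KEY, b text UNIQUE) table from
above as empty to start with (session labels and values are made up):

    -- Session 1:
    BEGIN;
    INSERT INTO t VALUES (1, 'foo');  -- holds 'foo' in b's unique index,
                                      -- uncommitted

    -- Session 2:
    BEGIN;
    INSERT INTO t VALUES (2, 'foo');  -- the a = 2 entry goes into the pk
                                      -- index fine, then the session
                                      -- blocks on the b = 'foo' conflict
                                      -- until session 1 commits or aborts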

> What are the consequences of sitting on that value lock indefinitely?
> Well, xacts are going to block on the pk value much more frequently,
> by simple virtue of the fact that the value locks there are held for a
> long time - they just needed to hear a "no" answer, which the unique
> constraint was in most cases happy to immediately give, so this is
> totally unnecessary. Contention is now in a certain sense almost as
> bad for every unique index as it is for the weakest link. That's only
> where the problems begin, though, and it isn't necessary for there to
> be bad contention on more than one unique index (the pk could just be
> on a serial column, say) to see bad effects.

Here's where I start to lose faith. It's unclear to me what those
other transactions are doing. If they're trying to insert a record
that conflicts with the primary key of the tuple we're inserting,
they're probably doomed, but not necessarily; we might roll back. If
they're also upserting, it's absolutely essential that they wait until
we get done before deciding what to do.
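
To make those "other transactions" concrete, here is a third session
added to the example above, one that conflicts only on the pk value
session 2 is sitting on (values again made up):

    -- Session 3, with sessions 1 and 2 from above still open:
    INSERT INTO t VALUES (2, 'bar');  -- has no interest in b = 'foo', yet
                                      -- blocks on session 2's xid because
                                      -- of the uncommitted a = 2 entry;
                                      -- session 2, for its part, only
                                      -- ever needed a quick yes-or-no
                                      -- from b

Whether session 3 is doomed depends entirely on how sessions 1 and 2
turn out, which is exactly the "probably doomed, but not necessarily"
case above.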

> So your long-running xact that's blocking all the other sessions on
> its proposed value for a (or maybe even b) - that finally gets to
> proceed. Regardless of whether it commits or aborts, there will be a
> big bloat race. This is because when the other sessions get the
> go-ahead to proceed, they'll all run to get the row lock (one guy
> might insert instead). Only one will be successful, but they'll all
> kill their heap tuple on the assumption that they'll probably lock the
> row, which is only true in the average case. Now, maybe you can teach
> them to not bother killing the heap tuple when there are no index
> tuples actually inserted to ruin things, but then maybe not, and maybe
> it wouldn't help in this instance if you did teach them (because
> there's a third, otherwise irrelevant constraint or whatever).

Supposing they are all upserters, it seems to me that what will
probably happen is that one of them will lock the row and update it,
and then commit. Then the next one will lock the row and update it,
and then commit. And so on. It's probably important to avoid having
them keep recreating speculative tuples and then killing them as long
as a candidate tuple is available, so that they don't create a dead
tuple per iteration. But that seems doable.
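
For comparison, the retry loop people write today (essentially the
merge_db example from the documentation, adapted to the t table from
above; the function and parameter names are mine) shows exactly the
dead-tuple-per-iteration behavior we'd want the internal mechanism to
avoid - every INSERT attempt that loses the race leaves a dead heap
tuple behind before the loop falls back to the UPDATE:

    CREATE FUNCTION upsert_t(k int, v text) RETURNS void AS
    $$
    BEGIN
        LOOP
            UPDATE t SET b = v WHERE a = k;
            IF found THEN
                RETURN;
            END IF;
            BEGIN
                INSERT INTO t (a, b) VALUES (k, v);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- the aborted INSERT has left a dead heap tuple (and
                -- index entries) behind; loop and try the UPDATE again
                NULL;
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;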

> Realize you can generally only kill the heap tuple *before* you have
> the row lock, because otherwise a totally innocent non-HOT update (may
> not update any unique indexed columns at all) will deadlock with your
> session, which I don't think is defensible, and will probably happen
> often if allowed to (after all, this is upsert - users are going to
> want to update their locked rows!).

I must be obtuse; I don't see why that would deadlock.

A bigger problem that I've just realized, though, is that once
somebody else has blocked on a unique index insertion, they'll be
stuck there until end of transaction even if we kill the tuple,
because they're waiting on the xid, not the index itself. That might
be fatal to my proposed design, or at least require the use of some
more clever locking regimen.
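
That waiting-on-the-xid behavior is visible from the outside while the
two-session example above is still blocked: the waiter shows up in
pg_locks as an ungranted ShareLock request on session 1's
transactionid, not as anything attached to the index value itself,
which is why killing the speculative tuple alone wouldn't wake it up.

    -- From any other session while session 2 is still blocked:
    SELECT locktype, transactionid, pid, mode, granted
    FROM pg_locks
    WHERE locktype = 'transactionid'
    ORDER BY transactionid, granted;
    -- Expect an ungranted ShareLock request from session 2's backend on
    -- session 1's xid, alongside the ExclusiveLock each transaction
    -- holds on its own xid; there is no lock on the index value there
    -- to release early.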

> Contrast this with my design, where re-ordering of would-be
> conflicters across unique indexes (or serialization failures) can
> totally nip this in the bud *if* the contention can be re-ordered
> around, but if not, at least there is no need to worry about
> accumulating bloat at all, because it creates no bloat.

Yeah, possibly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
