From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2013-09-28 00:36:45
Message-ID: CAM3SWZRfrw+zXe7CKt6-QTCuvKQ-Oi7gnbBOPqQsvddU=9M7_g@mail.gmail.com
Lists: pgsql-hackers
On Fri, Sep 27, 2013 at 5:36 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I don't have another idea either. In fact, I'd go so far as to say
>> that doing any third thing that's better than those two to any
>> reasonable person is obviously impossible. But I'd add that we simply
>> cannot roll back at read committed, so we're just going to have to hold
>> our collective noses and do strange things with visibility.
>
> I don't accept that as a general principle. We're writing the code;
> we can make it behave any way we think best.
I presume you're referring to the principle that we cannot throw
serialization failures at read committed. I'd suggest that letting
that happen would upset a lot of people, because it's so totally
unprecedented. A large segment of our user base would just consider
that to be Postgres randomly throwing errors, and would be totally
dismissive of the need to do so, and not without some justification -
no one else does the same. The reality is that the majority of our
users don't even know what an isolation level is. I'm not just talking
about people who use Postgres more casually, such as Heroku
customers. I've personally talked to people who didn't even know what
a transaction isolation level was, who were in a position where they
really, really should have known.
> I doubt that any change to HeapTupleSatisfiesMVCC() will be
> acceptable. This feature needs to restrain itself to behavior changes
> that only affect users of this feature, I think.
I agree with the principle of what you're saying, but I'm not aware
that those changes to HeapTupleSatisfiesMVCC() imply any behavioral
changes for those not using the feature. Certainly, the standard
regression tests and isolation tests still pass, for what it's worth.
Having said that, I have not thought about it enough to be willing to
actually defend that bit of code. Though I must admit that I am a
little encouraged by the fact that it passes casual inspection.
I am starting to wonder if it's really necessary to have a "blessed"
update that can see the locked, not-otherwise-visible tuple. Doing
that certainly has its disadvantages, both in terms of code complexity
and in terms of being arbitrarily restrictive. We're going to have to
allow the user to see the locked row after it's updated (the new row
version that we create will naturally be visible to its creating xact)
- is it really any worse that the user can see it before an update (or
a delete)? The user could decide to effectively make the update change
nothing, and see the same thing anyway.
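To make that concrete, here's a sketch using the syntax from my patch
(the table, values, and exact flow are hypothetical, of course):

    -- Hypothetical table upsert_tab (key int UNIQUE, val text).
    BEGIN;
    INSERT INTO upsert_tab (key, val) VALUES (1, 'new')
        ON DUPLICATE KEY LOCK FOR UPDATE;
    -- Suppose the conflicting row's creating transaction committed
    -- only after our snapshot was taken: the row may not be visible
    -- to us, yet we hold a row lock on it.
    UPDATE upsert_tab SET val = 'new' WHERE key = 1;
    -- The new row version created by our UPDATE is visible to us in
    -- any case, because our transaction created it.  So letting us
    -- see the locked row *before* the UPDATE doesn't obviously make
    -- anything worse.
    COMMIT;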
I get why you're averse to doing odd things to visibility - I was too.
I just don't see that we have a choice if we want this feature to work
acceptably with read committed. In addition, as it happens I just
don't see that the general situation is made any worse by the fact
that the user might be able to see the row before an update/delete.
Isn't it also weird to update or delete something you cannot see?
Couldn't EvalPlanQual() be said to be an MVCC violation on similar
grounds? It also "reaches into the future". Locking a row isn't really
that distinct from updating it, not only in terms of the code
footprint, but also from a logical perspective.
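To spell out the EvalPlanQual() point, this is the standard two-session
interleaving at read committed (accounts is a hypothetical table):

    -- Session 1:
    BEGIN;
    UPDATE accounts SET balance = 0 WHERE id = 1;

    -- Session 2 (blocks behind session 1's row lock):
    UPDATE accounts SET balance = balance + 10 WHERE id = 1;

    -- Session 1:
    COMMIT;

    -- Session 2's update now proceeds against the *new* row version
    -- created by session 1 - a version its own snapshot cannot see -
    -- and leaves balance = 10.  It "reaches into the future" in just
    -- the way I mean.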
> It's probably important to avoid having
> them keep recreating speculative tuples and then killing them as long
> as a candidate tuple is available, so that they don't create a dead
> tuple per iteration. But that seems doable.
I'm not so sure.
>> Realize you can generally only kill the heap tuple *before* you have
>> the row lock, because otherwise a totally innocent non-HOT update (may
>> not update any unique indexed columns at all) will deadlock with your
>> session, which I don't think is defensible, and will probably happen
>> often if allowed to (after all, this is upsert - users are going to
>> want to update their locked rows!).
>
> I must be obtuse; I don't see why that would deadlock.
If you don't see it, then you aren't being obtuse in asking for
clarification. It's really easy to be wrong about this kind of thing.
If the non-HOT update updates some random row, changing the key
columns, it will lock that random row version. It will then proceed
with "value locking" (i.e. inserting index tuples in the usual way, in
this case with entirely new values). It might then block on one of the
index tuples we, the upserter, have already inserted (these are our
"value locks" under your scheme). Meanwhile, we (the upserter) might
have *already* concluded that the *old* heap row that the regular
updater is in the process of rendering invisible is to blame in
respect of some other value in some later unique index, and that *it*
must be locked. Deadlock. This seems very possible if the key values
are somewhat correlated, which is probably generally quite common.
The important observation here is that an updater, in effect, locks
both the old and new sets of values (for non-HOT updates). And as I've
already noted, any practical "value locking" implementation isn't
going to be able to prevent the update from immediately locking the
old row, because row locking doesn't touch an index. Hence, there is
an irresolvable disconnect between value and row locking.
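Here's a schematic interleaving of the kind of thing I'm describing
(the table is hypothetical, and the timing has to be unlucky, but only
somewhat):

    CREATE TABLE t (a int UNIQUE, b int UNIQUE);
    -- Existing row: (7, 2).

    -- Session U (upserter):
    INSERT INTO t (a, b) VALUES (1, 2)
        ON DUPLICATE KEY LOCK FOR UPDATE;
    -- U inserts an index tuple for a = 1 (its "value lock" under your
    -- scheme), then moves on to check b's unique index.

    -- Session R (regular non-HOT updater), concurrently:
    UPDATE t SET a = 1 WHERE b = 2;
    -- R locks the old row version (7, 2), then blocks on U's index
    -- tuple when inserting its own index tuple for a = 1.

    -- Meanwhile, U finds that b = 2 conflicts with R's *old* row
    -- version, and tries to lock that row - which R already holds.
    -- Deadlock.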
Are we comfortable with this? Before you answer, consider that there
were lots of bugs (their words) in the MySQL implementation of this
same basic idea: I heard through the grapevine that they fixed a
number of bugs surrounding excessive deadlocking, and that their
implementation has historically had lots of deadlocking problems.
I think that the way to deal with weird, unprincipled deadlocking is
to simply not hold value locks at the same time as row locks. It is my
contention that the lock starvation hazards of not being smarter about
this aren't actually an issue, unless you have some kind of highly
implausible perfect storm of read-committed aborters inserting around
the same values. Only one of them needs to commit to remedy the
situation; the first "no" answer is all we need to give up.
To repeat myself, that's really the essential nature of my design: it
is accepting of the inevitability of there being a disconnect between
value and row locking. Value locks that are implemented in a sane way
can do very little; they can only prevent a conflicting insertion from
*finishing*, and not from causing a conflict for row locking.
> A bigger problem that I've just realized, though, is that once
> somebody else has blocked on a unique index insertion, they'll be
> stuck there until end of transaction even if we kill the tuple,
> because they're waiting on the xid, not the index itself. That might
> be fatal to my proposed design, or at least require the use of some
> more clever locking regimen.
Well, it's really fatal to your proposed design *because* it implies
that others will be blocked on earlier value locks, which is what I
was trying to say (in saying this, I'm continuing to hold your design
to the same standard as my own, which is that it must work across
multiple unique indexes - I believe that you yourself accept this
standard based on your remarks here).
For the benefit of others who may not get what we're talking about: in
my patch, that isn't a problem, because when we block on acquiring an
xid ShareLock pending value conflict resolution, that means that the
other guy actually did insert (and did not merely think about it), and
so with that design it's entirely appropriate that we wait for his
xact to end.
>> Contrast this with my design, where re-ordering of would-be
>> conflicters across unique indexes (or serialization failures) can
>> totally nip this in the bud *if* the contention can be re-ordered
>> around, but if not, at least there is no need to worry about
>> aggregating bloat at all, because it creates no bloat.
>
> Yeah, possibly.
I think that re-ordering is an important property of any design where
we cannot bail out with serialization failures. I know it seems weird,
because it seems like an MVCC violation to have our behavior altered
as a result of a transaction that committed that isn't even visible to
us. As I think you appreciate, on a certain level that's just the
nature of the beast. This might sound stupid, but: you can say the
same thing about unique constraint violations! I do not believe that
this introduces any anomalies that read committed doesn't already
permit according to the standard.
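To see what I mean, consider this interleaving, with a hypothetical
table u (k int UNIQUE) and both sessions at read committed:

    -- Session B:
    BEGIN;
    INSERT INTO u (k) VALUES (1);

    -- Session A (blocks behind B's uncommitted insert):
    INSERT INTO u (k) VALUES (1);

    -- Session B:
    COMMIT;

    -- Session A immediately gets:
    -- ERROR:  duplicate key value violates unique constraint "u_k_key"
    -- A's statement snapshot was taken before B committed, so the row
    -- that caused the error was never visible to A.  A's behavior was
    -- altered by a transaction it cannot see, and nobody calls that an
    -- anomaly.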
--
Peter Geoghegan