From: | Peter Geoghegan <pg(at)heroku(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE |
Date: | 2013-12-20 04:06:29 |
Message-ID: | CAM3SWZShbE29KpoD44cVc3vpZJGmDer6k_6FGHiSzeOZGmTFSQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Dec 18, 2013 at 8:39 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> Empirically, retrying because ExecInsertIndexTuples() returns some
> recheckIndexes occurs infrequently, so maybe that makes all of this
> okay. Or maybe it happens infrequently *because* we don't give up on
> insertion when it looks like the current iteration is futile. Maybe
> just inserting into every unique index, and then blocking on an xid
> within ExecCheckIndexConstraints(), works out fairly and performs
> reasonably in all common cases. It's pretty damn subtle, though, and I
> worry about the worst case performance, and basic correctness issues
> for these reasons.
I realized that it's possible to create the problem that I'd
previously predicted with "promise tuples" [1] some time ago, that are
similar in some regards to what Heikki has here. At the time, Robert
seemed to agree that this was a concern [2].
I have a very simple testcase attached, much simpler that previous
testcases, that reproduces deadlock for the patch
exclusion_insert_on_dup.2013_12_12.patch.gz at scale=1 frequently, and
occasionally when scale=10 (for tiny, single-statement transactions).
With scale=100, I can't get it to deadlock on my laptop (60 clients in
all cases), at least in a reasonable time period. With the patch
btreelock_insert_on_dup.2013_12_12.patch.gz, it will never deadlock,
even with scale=1, simply because value locks are not held on to
across row locking. This is why I characterized the locking as
"opportunistic" on several occasions in months past.
The test-case is actually much simpler than the one I describe in [1],
and much simpler than all previous test-cases, as there is only one
unique index, though the problem is essentially the same. It is down
to old "value locks" held across retries - with "exclusion_...", we
can't *stop* locking things from previous locking attempts (where a
locking attempt is btree insertion with the UNIQUE_CHECK_PARTIAL
flag), because dirty snapshots still see
inserted-then-deleted-in-other-xact tuples. This deadlocking seems
unprincipled and unjustified, which is a concern that I had all along,
and a concern that Heikki seemed to share more recently [3]. This is
why I felt strongly all along that value locks ought to be cheap to
both acquire and _release_, and it's unfortunate that so much time was
wasted on tangential issues, though I do accept some responsibility
for that.
So, I'd like to request as much scrutiny as possible from as wide as
possible a cross section of contributors of this test case
specifically. This feature's development is deadlocked on resolving
this unprincipled deadlocking controversy. This is a relatively easy
thing to have an opinion on, and I'd like to hear as many as possible.
Is this deadlocking something we can live with? What is a reasonable
path forward? Perhaps I am being pedantic in considering unnecessary
deadlocking as ipso facto unacceptable (after all, MySQL lived with
this kind of problem for long enough, even if it has gotten better for
them recently), but there is a very real danger of painting ourselves
into a corner with these concurrency issues. I aim to have the
community understand ahead of time the exact set of
trade-offs/semantics implied by our chosen implementation, whatever
the outcome. That seems very important. I myself lean towards this
being a blocker for the "exclusion_" approach at least as presented.
Now, you might say to yourself "why should I assume that this isn't
just attributable to btree page buffer locks being coarser than other
approaches to value locking?". That's a reasonable point, and indeed
it's why I avoided lower scale values in prior, more complicated
test-cases, but that doesn't actually account for the problem
highlighted: In this test-case we do not hold buffer locks across
other buffer locks within a single backends (at least in any new way),
nor do we lock rows while holding buffer locks within a single
backend. Quite simply, the conventional btree value locking approach
doesn't attempt to lock 2 things within a backend at the same time,
and you need to do that to get a deadlock, so there are no deadlocks.
Importantly, the "btree_..." implementation can release value locks.
Thanks
P.S. In the interest of reproducibility, I attach new revisions of
each patch, even though there is no reason to believe that any changes
since the last revision posted are significant to the test-case. There
was some diff minimization, plus I incorporated some (but not all)
unrelated feedback from Tom. It wasn't immediately obvious, at least
to me, that "rejects" can be made to be an unreserved keyword, due to
shift/reduce conflicts, but I did document AM changes. Hopefully this
gives some indication of the essential nature or intent of my design
that we may work towards refining (depending on the outcome of
discussion here, of course).
P.P.S. Be careful not to fall afoul of the shift/reduce conflicts when
applying either patch on top of commit
1b4f7f93b4693858cb983af3cd557f6097dab67b. I'm working on a fix that
allows a clean rebase.
[3] http://www.postgresql.org/message-id/528B640F.50601@vmware.com
--
Peter Geoghegan
Attachment | Content-Type | Size |
---|---|---|
btreelock_insert_on_dup.2013_12_19.patch.gz | application/x-gzip | 38.2 KB |
exclusion_insert_on_dup.2013_12_19.patch.gz | application/x-gzip | 19.9 KB |
testcase.tar.gz | application/x-gzip | 448 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2013-12-20 04:12:31 | Re: [PATCH] Doc fix for VACUUM FREEZE |
Previous Message | Gavin Wahl | 2013-12-20 03:29:18 | Re: Proposed feature: Selective Foreign Keys |