From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2013-11-18 14:44:01
Message-ID: 528A27B1.1050204@vmware.com
Lists: pgsql-hackers
On 14.10.2013 07:12, Peter Geoghegan wrote:
> On Wed, Oct 9, 2013 at 1:11 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> Unfortunately, I have a very busy schedule in the month ahead,
>> including travelling to Ireland and Japan, so I don't think I'm going
>> to get the opportunity to work on this too much. I'll try and produce
>> a V4 that formally proposes some variant of my ideas around visibility
>> of locked tuples.
>
> V4 is attached.
>
> Most notably, this adds the modifications to HeapTupleSatisfiesMVCC(),
> though they're neater than in the snippet I sent earlier.
>
> There is also some clean-up around row-level locking. That code has
> been simplified. I also try and handle serialization failures in a
> better way, though that really needs the attention of a subject matter
> expert.
>
> There are a few additional XXX comments highlighting areas of concern,
> particularly around serializable behavior. I've deferred making higher
> isolation levels care about wrongfully relying on the special
> HeapTupleSatisfiesMVCC() exception (e.g. they won't throw a
> serialization failure, mostly because I couldn't decide on where to do
> the test on time prior to travelling tomorrow).
>
> I've added code to do heap_prepare_insert before value locks are held.
> Whatever our eventual value locking implementation, that's going to be
> a useful optimization. Though unfortunately I ran out of time to give
> this the scrutiny it really deserves, I suppose that it's something
> that we can return to later.
>
> I ask that reviewers continue to focus on concurrency issues and broad
> design issues, and continue to defer discussion about an eventual
> value locking implementation. I continue to think that that's the most
> useful way of proceeding for the time being. My earlier points about
> probable areas of concern [1] remain a good place for reviewers to
> start.
I think it's important to recap the design goals of this. I don't think
these have been listed before, so let me try:
* It should be usable and perform well for both large batch updates and
small transactions.
* It should perform well both when there are no duplicates and when
there are lots of duplicates.
And from that follow some finer requirements:
* Performance when there are no duplicates should be close to raw INSERT
performance.
* Performance when all rows are duplicates should be close to raw UPDATE
performance.
* We should not leave behind large numbers of dead tuples in either case.
Anything else I'm missing?
What about exclusion constraints? I'd like to see this work for them as
well. Currently, exclusion constraints are checked after the tuple is
inserted, and you abort if the constraint was violated. We could still
insert the heap and index tuples first, but instead of aborting on
violation, we would kill the heap tuple we already inserted and retry.
There are some complications there, like how to wake up any other
backends that are waiting to grab a lock on the tuple we just killed,
but it seems doable.
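To make that control flow concrete, here's a toy, self-contained C
sketch of the insert-first-then-check scheme. Everything in it is
invented for illustration - an array stands in for the heap, and a
linear scan stands in for the index probe; none of it is PostgreSQL
code:

#include <stdbool.h>

typedef struct ToyTuple
{
    int     key;
    bool    dead;       /* stands in for killing the heap tuple */
} ToyTuple;

#define MAX_TUPLES 1024
static ToyTuple heap[MAX_TUPLES];
static int      ntuples = 0;

/* Physically insert a tuple; returns its slot. */
static int
toy_insert(int key)
{
    heap[ntuples].key = key;
    heap[ntuples].dead = false;
    return ntuples++;
}

/* Find a live conflicting tuple at or after 'start', ignoring slot 'skip'. */
static int
toy_find_conflict_from(int start, int key, int skip)
{
    for (int i = start; i < ntuples; i++)
        if (i != skip && !heap[i].dead && heap[i].key == key)
            return i;
    return -1;
}

/* Optimistic insertion: insert first, check afterwards, kill on conflict. */
static bool
toy_insert_then_check(int key)
{
    int     slot = toy_insert(key);

    if (toy_find_conflict_from(0, key, slot) >= 0)
    {
        heap[slot].dead = true;     /* kill the tuple we just inserted */
        /* real code would wake waiters on this tuple, wait, and retry */
        return false;
    }
    return true;
}

int
main(void)
{
    toy_insert_then_check(1);   /* succeeds */
    toy_insert_then_check(1);   /* conflicts; the second tuple is killed */
    return 0;
}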
That would, however, perform badly and leave garbage behind if there are
duplicates. A refinement of that would be to first check for constraint
violations, then insert the tuple, and then check again. That would
avoid the garbage in most cases, but would perform much more poorly when
there are no duplicates, because it needs two index scans for every
insertion. A further refinement would be to keep track of how many
duplicates there have been recently, and switch between the two
strategies based on that.
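Continuing the same toy model (toy_insert and toy_find_conflict_from
from the sketch above; the conflict counter and its decay rule are
equally made up), the adaptive switch might look something like this:

static int  recent_conflicts = 0;   /* crude running count of duplicates */

static bool
toy_insert_adaptive(int key)
{
    /* First scan: only pay for the pre-check when duplicates are likely. */
    if (recent_conflicts > 0 &&
        toy_find_conflict_from(0, key, -1) >= 0)
    {
        recent_conflicts++;
        return false;       /* conflict found; no garbage left behind */
    }

    int     slot = toy_insert(key);

    /* Second (or only) scan: catch concurrently inserted duplicates. */
    if (toy_find_conflict_from(0, key, slot) >= 0)
    {
        heap[slot].dead = true;     /* kill our tuple; this leaves garbage */
        recent_conflicts++;
        return false;
    }

    if (recent_conflicts > 0)
        recent_conflicts--;     /* decay back toward the optimistic path */
    return true;
}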
The cost of doing two scans could be alleviated by using
markpos/restrpos to do the second scan. That is presumably cheaper than
starting a whole new scan with the same key. (markpos/restrpos don't
currently work for non-MVCC snapshots, though, so that would need to be fixed.)
And that detour with exclusion constraints takes me back to the current
patch :-). What if you implemented the unique check in a similar fashion
too (when doing INSERT ON DUPLICATE KEY ...)? First, scan for a
conflicting key, and mark the position. Then do the insertion to that
position. If the insertion fails because of a duplicate key (which was
inserted after we did the first scan), mark the heap tuple as dead, and
start over. The indexam changes would be quite similar to the changes
you made in your patch, but instead of keeping the page locked, you'd
only hold a pin on the target page (if even that). The first indexam
call would check that the key doesn't exist, and remember the insert
position. The second call would re-find the previous position, and
insert the tuple, checking again that there really wasn't a duplicate
key violation. The locking aspects would be less scary than in your
current patch.
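In the same toy model (still purely illustrative; ToyIndexPos just
stands in for a marked scan position, not the real indexam API), the
two-call protocol could be sketched as:

typedef struct ToyIndexPos
{
    int     at;         /* where the second scan should resume */
} ToyIndexPos;

/* Call 1: check that the key doesn't exist, remember the insert position. */
static bool
toy_unique_precheck(int key, ToyIndexPos *pos)
{
    if (toy_find_conflict_from(0, key, -1) >= 0)
        return false;       /* duplicate already exists */
    pos->at = ntuples;      /* "markpos": remember where we'd insert */
    return true;
}

/* Call 2: resume from the mark and insert, re-checking for duplicates. */
static bool
toy_unique_insert(int key, ToyIndexPos *pos)
{
    /* "restrpos": rescan only what appeared after the mark. */
    if (toy_find_conflict_from(pos->at, key, -1) >= 0)
        return false;       /* lost the race: kill the heap tuple, start over */
    toy_insert(key);
    return true;
}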
I'm not sure if that would perform as well as your current patch. I must
admit your current approach is nearly optimal performance-wise. But I'd
like to see this alternative tried, and it would be a solution for
exclusion constraints in any case.
One fairly fundamental limitation of your current approach is that the
number of lwlocks you can hold simultaneously is limited
(MAX_SIMUL_LWLOCKS == 100). Another limitation is that the minimum for
shared_buffers is only 16, so you couldn't necessarily even pin one
buffer per unique index at the same time. Neither of those is a serious
problem in real applications - no-one runs with shared_buffers=16, and
no sane schema has a hundred unique indexes - but it's still something
to consider.
- Heikki