Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2014-01-14 02:45:17
Message-ID: CAM3SWZSPZ-a4d5bFSR9FRA8QH8A-V3iwiNLaAhUhp-k8ifmsjw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 13, 2014 at 12:49 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
>> In any case, my patch is bound to win decisively for the other
>> extreme, the insert-only case, because the overhead of doing an index
>> scan first is always wasted there with your approach, and the overhead
>> of extended btree leaf page locking has been shown to be quite low.
>
> Quite possibly. Run the benchmark, and we'll see how big a difference we're
> talking about.

I'll come up with something and let you know.

> Another way to optimize it is to keep the b-tree page pinned after doing the
> pre-check. Then you don't need to descend the tree again when doing the
> insert. That would require small indexam API changes, but wouldn't be too
> invasive, I think.

You'll still need a callback to drop the pin when it transpires that
there is a conflict in a later unique index, and state to pass a bt
stack back, at which point you've already made exactly the same
changes to the AM interface as in my proposal. The only difference is
that the core code doesn't rely on the value locks being released
after an instant, but that isn't something that you take advantage of.
Furthermore, AFAIK there is no reason to think that anything other
than btree will benefit, which makes it a bit unfortunate that the AM
has to support it generally.

So, again, it's kind of a modularity violation, and it may not even be
possible: _bt_search() is only callable with an insertion scankey, and
that is the context in which the existing guarantee about releasing
locks and re-searching from that point applies, for reasons that seem
to me to be very subtle. At the very least you need to pass a btstack
to _bt_doinsert() to save the work of re-scanning, as I do.
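
To make the shape of that change concrete, it would amount to something
like the following (only a sketch, not the exact code from my patch or
yours; the extra parameter and its placement are illustrative):

    /*
     * Sketch only: hand the descent stack saved by the pre-check's
     * _bt_search() back to the insertion, so that it doesn't have to
     * descend the tree a second time.  The "stack" parameter is
     * illustrative, not an existing signature.
     */
    extern bool _bt_doinsert(Relation rel, IndexTuple itup,
                             IndexUniqueCheck checkUnique, Relation heapRel,
                             BTStack stack);    /* from the pre-check, or NULL */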

>> All other concerns of mine still remain, including the concern over
>> the extra locking of the proc array - I'm concerned about the
>> performance impact of that on other parts of the system not exercised
>> by this test.
>
> Yeah, I'm not thrilled about that part either. Fortunately there are other
> ways to implement that. In fact, I think you could just not bother taking
> the ProcArrayLock when setting the fields. The danger is that another
> backend sees a mixed state of the fields, but that's OK. The worst that can
> happen is that it will do an unnecessary lock/release on the heavy-weight
> lock. And to reduce the overhead when reading the fields, you could merge
> the SpeculativeInsertionIsInProgress() check into
> TransactionIdIsInProgress(). The call site in tqual.c always calls it
> together with TransactionIdIsInProgress(), which scans the proc array
> anyway, while holding the lock.
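
If I follow you, that merge would give the tqual.c call site something
with roughly this shape (purely a sketch of the interface I think you're
describing; the function name and the out-parameter are hypothetical,
not anything in your patch):

    /*
     * Hypothetical merged check: report whether xid is still in progress,
     * and, during the same ProcArrayLock'd scan, fill in the speculative
     * insertion token (if any) for the given rel/tid, so that
     * HeapTupleSatisfiesDirty() only pays for one proc array scan.
     */
    extern bool TransactionIdIsInProgressSpeculative(TransactionId xid,
                                                     RelFileNode rel,
                                                     ItemPointer tid,
                                                     uint32 *speculativeToken);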

Currently in your patch all insertions do
SpeculativeInsertionLockAcquire(GetCurrentTransactionId()) -
presumably this is not something you intend to keep. Also, you should
not do this for regular insertion:

    if (options & HEAP_INSERT_SPECULATIVE)
        SetSpeculativeInsertion(relation->rd_node, &heaptup->t_self);

Can you explain the following, please?

+ /*
+  * Returns a speculative insertion token for waiting for the insertion to
+  * finish.
+  */
+ uint32
+ SpeculativeInsertionIsInProgress(TransactionId xid, RelFileNode rel,
+                                  ItemPointer tid)
+ {
+     uint32          result = 0;
+     ProcArrayStruct *arrayP = procArray;
+     int             index;

Why is this optimization correct? Presently it allows your patch to
avoid getting a shared ProcArrayLock from HeapTupleSatisfiesDirty().

+     if (TransactionIdPrecedes(xid, TransactionXmin))
+         return false;

So from HeapTupleSatisfiesDirty(), you're checking whether "xid" (the
passed tuple's xmin) precedes TransactionXmin, which is the xmin of the
last MVCC snapshot taken by GetSnapshotData(). But here we're dealing
with a dirty snapshot, which has no xmin of its own, so TransactionXmin
pertains to our MVCC snapshot, not the snapshot we're actually using.

It isn't really true that TransactionIdIsInProgress() takes the same
shared ProcArrayLock and does a full linear search in a comparable way;
I think its various fast-paths make that far less likely than it is for
SpeculativeInsertionIsInProgress() (or, perhaps, than it should be).
Here is what that other routine does at around the same point:

    /*
     * Don't bother checking a transaction older than RecentXmin; it could not
     * possibly still be running.  (Note: in particular, this guarantees that
     * we reject InvalidTransactionId, FrozenTransactionId, etc as not
     * running.)
     */
    if (TransactionIdPrecedes(xid, RecentXmin))
    {
        xc_by_recent_xmin_inc();
        return false;
    }

This extant code checks against RecentXmin, *not* TransactionXmin.
TransactionIdIsInProgress() also caches things quite effectively, but
that caching isn't very useful to you here, and it checks
latestCompletedXid before doing a linear search through the proc array,
too.
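
For reference, the latestCompletedXid check in that routine looks
roughly like this (quoting from memory of procarray.c, so treat the
exact details as approximate):

    /*
     * Now that we have the lock, we can check latestCompletedXid; if the
     * target Xid is after that, it's surely still running.
     */
    if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid, xid))
    {
        LWLockRelease(ProcArrayLock);
        xc_by_latest_xid_inc();
        return true;
    }

So the full linear search is only reached for an xid that is neither
clearly too old nor newer than anything that has already completed.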

--
Peter Geoghegan
