From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2014-01-02 19:58:54
Message-ID: CAM3SWZTiepSWdVN4-qVSWaE7tZ1+frprYZo+FPhG8SmKWpW1zQ@mail.gmail.com
Lists: pgsql-hackers

I decided to make at least a cursory attempt to measure or
characterize the performance of each of our approaches to value
locking. Being fair here is non-trivial, because upserts can behave
quite differently depending on the need to insert or update, lock
contention and so on. I also knew that anything I came up with would
not be comparing like with like: as things stand, the btree locking
code is more or less correct, while the alternative, exclusion
constraint based implementation is more or less incorrect (of course,
you may yet describe a way to fix the unprincipled deadlocking
previously revealed by my testcase [1], but it is far from clear what
impact such a fix would have on performance). Still, something is
better than nothing.

This was run on a Linux server ("Linux 3.8.0-31-generic
#46~precise1-Ubuntu") with these specifications:
https://www.hetzner.de/en/hosting/produkte_rootserver/ex40 .
Everything fits in shared_buffers, but the I/O system is probably the
weakest link here.

To be 100% clear, I am comparing
btreelock_insert_on_dup.v5.2013_12_28.patch.gz [2] with
exclusion_insert_on_dup.2013_12_19.patch.gz [3]. I'm also testing a
third approach, which avoids exclusive buffer locks and heavyweight
locks for upserts during the first phase of speculative insertion.
That patch is unposted, but shows a modest improvement over [2].

I ran this against the table foo:

pgbench=# \d+ foo
                         Table "public.foo"
 Column |  Type   | Modifiers | Storage  | Stats target | Description
--------+---------+-----------+----------+--------------+-------------
 a      | integer | not null  | plain    |              |
 b      | integer |           | plain    |              |
 c      | text    |           | extended |              |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (a)
Has OIDs: no
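
For anyone wishing to reproduce this, that definition corresponds to:

create table foo(a integer primary key, b integer, c text);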

My custom script was:

\setrandom rec 1 :scale
with rej as (insert into foo(a, b, c) values(:rec, :rec, 'insert') on duplicate key lock for update returning rejects *) update foo set c = 'update' from rej where foo.a = rej.a;

I specified that each pgbench client in each run should perform 200k
upserts (with 100k possible distinct key values), rather than run for
some number of seconds. The idea is that there is a reasonable mix of
inserts and updates initially, at lower client counts, but exactly
the same number of queries is run for each patch, so as to normalize
the effects of contention across both runs (this sure is hand-wavy,
but likely better than nothing). I'm just looking for approximate
numbers here, and I'm sure that you could find more than one way to
benchmark this feature, with varying degrees of sympathy towards each
of our two approaches to value locking. This benchmark isn't
sympathetic to btree locking at all, because there is a huge amount
of contention at the higher client counts, with 100% of possible rows
updated by the time we're done at 16 clients, for example.
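
An invocation along these lines matches the parameters described (the
script file name is mine, and 8 clients are shown purely as an
example; with a custom -f script, -s just sets the :scale variable
consumed by \setrandom):

pgbench -n -f upsert.sql -s 100000 -c 8 -j 8 -t 200000 pgbench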

To compensate somewhat for the relatively short duration of each run,
I take an average-of-5, rather than an average-of-3, as
representative for each client count/patch combination.

A full report of the results is here:
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/upsert-cmp/

My executive summary is that the exclusion patch performs about the
same as the btree locking patch at lower client counts. By 8 clients,
the exclusion patch does noticeably better, presumably because it
avoids the additional window of btree lock contention, but it's a
fairly modest improvement.

Forgive me if I'm belaboring the point, but even though I'm
benchmarking the simplest possible upsert statements, had I chosen
small pgbench scale factors (e.g. scales that would see 100 - 1000
possible distinct key values in total), the btree locking
implementation would surely win very convincingly, just because the
alternative implementation would spend almost all of its time
deadlocked, waiting for the deadlock detector to free clients in
one-second deadlock_timeout cycles. My goal here was just to put a
rough number on how these two approaches compare, while trying to be
as fair as possible.

I have to wonder about the extent to which the exclusion approach
benefits from holding old value locks. Even if the unprincipled
deadlocking issue can be fixed without much additional cost, it may
be that simply holding those pseudo "value locks" (i.e. old dead rows
from previous iterations on the same tuple slot) indefinitely helps
performance, and that losing that property alone will hurt
performance, even though losing it is necessary for correctness.

For those who wonder what the effect of multiple unique indexes would
be, that isn't really all that relevant, since contention on multiple
unique indexes isn't expected with idiomatic usage (though I suppose
an upsert's non-HOT update would have to compete).
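
To illustrate, a hypothetical table with two unique indexes (not
something I benchmarked) would look like:

create table bar(
    a integer primary key, -- first unique index, as in foo
    b integer unique       -- second unique index; an upsert would
                           -- have to lock a value in both
);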

[1] http://www.postgresql.org/message-id/CAM3SWZShbE29KpoD44cVc3vpZJGmDer6k_6FGHiSzeOZGmTFSQ@mail.gmail.com

[2] http://www.postgresql.org/message-id/CAM3SWZRpnkuVrENDV3zM=BNTCv8-X3PYXt76pohGyAuP1iq-ug@mail.gmail.com

[3] http://www.postgresql.org/message-id/CAM3SWZShbE29KpoD44cVc3vpZJGmDer6k_6FGHiSzeOZGmTFSQ@mail.gmail.com

--
Peter Geoghegan
