From: | Peter Geoghegan <pg(at)heroku(dot)com> |
---|---|
To: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Stephen Frost <sfrost(at)snowman(dot)net> |
Subject: | Re: Performance optimization of btree binary search |
Date: | 2013-12-05 10:19:24 |
Message-ID: | CAM3SWZTyOLZEMZtEQkizNC7V_tugTdCx3K+iQnxRfkPymheB8g@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Dec 4, 2013 at 5:28 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> I'm also curious about the impact on insertion into primary key
> indexes. Presently, we hold an exclusive buffer lock for the duration
> of a couple of operations when checkUnique != UNIQUE_CHECK_NO.
> _bt_binsrch() is one such operation. The other one there,
> _bt_check_unique(), is likely to be a lot cheaper than _bt_binsrch()
> on average, I think, so I'm cautiously optimistic that it'll be
> noticeable. I better go and check it out.
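For readers less familiar with the nbtree code, the operation in question is essentially a lower-bound binary search over the sorted items on a btree page. Here is a minimal, hypothetical C sketch of that pattern; the names, the flat `int` key array, and the layout are simplifying assumptions for illustration, not the actual `_bt_binsrch()` implementation (which compares index tuples via the scan key machinery while holding the buffer lock):

```c
/* Simplified sketch of the binary-search pattern in _bt_binsrch().
 * Assumptions: keys are plain ints in a sorted array, rather than
 * index tuples addressed through a line-pointer array. */
#include <stddef.h>

/* Return the first slot whose key is >= the search key (lower bound),
 * analogous to how a btree descent chooses its downlink or a unique
 * check chooses where to start scanning. */
static size_t
page_lower_bound(const int *keys, size_t nkeys, int key)
{
    size_t low = 0;
    size_t high = nkeys;

    while (low < high)
    {
        /* Each probe can land on a different cache line, which is why
         * this loop is dominated by memory stalls on a cold page. */
        size_t mid = low + (high - low) / 2;

        if (keys[mid] < key)
            low = mid + 1;
        else
            high = mid;
    }
    return low;
}
```

The point of the surrounding discussion is that this whole loop runs under an exclusive buffer lock when `checkUnique != UNIQUE_CHECK_NO`, so any per-probe cache miss directly lengthens the lock hold time.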
I ran a relatively short variant 'insert' pgbench-tools benchmark,
with a serial primary key index created on the pgbench_history table.
Results:
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/insert/
Note that in the interest of increasing the signal-to-noise ratio, I
used unlogged tables. These are considerably shorter, two-minute runs
of inserts/writes, but even so it's interesting to compare them to my
original 'select' benchmark. Checkpoint settings were very high.
This looks to be about a wash. I'm not that surprised, because this is
all about memory bandwidth, not lock contention. Even with the lock
contention on the rightmost btree page, we top out at about the same
TPS level as the 'select' benchmark (at least relative to Postgres
master). However, we top out at that level much sooner here (growth is
much steeper), because the contention is mostly on the same one or two
btree leaf pages, which are well cached in the CPU's L1 cache. Writes
are actually quite a bit faster than uniformly-distributed reads, even
with the exclusive buffer locking (and lock contention) necessitated
by writing.
You might even say that my patch makes the first benchmark look more
like this one. (A less awkward comparison could probably be made
between what I've done for uniform-distribution read workloads and a
read workload with far more uneven data access, which will naturally
have fewer cache misses and less TLB contention.) I strongly suspect
that if I were to instrument this workload, I wouldn't see a
significant number of cache misses when binary searching on btree
pages.
It might still be worth pursuing the SERIAL hinting mechanism, but I
don't think it's going to be nearly as big a win as what I've done for
reads. It might also be worth looking at a workload with uniformly
distributed btree index tuple insertions, where I'd expect similar
advantages to those originally shown for reads.
--
Peter Geoghegan