Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [WIP] Zipfian distribution in pgbench)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alik Khilazhev <a(dot)khilazhev(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [WIP] Zipfian distribution in pgbench)
Date: 2017-08-04 22:30:38
Message-ID: CAH2-Wzm89Byt95e3f6pVdzRB0zja3VVGUDpbxzm=eWyMkutWWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 31, 2017 at 10:54 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Let's wait to see what difference it makes if Alik's zipfian
> distribution pgbench test case uses unlogged tables. That may gives us a
> good sense of the problem for cases with contention/concurrency.

Yura Sokolov of Postgres Pro performed this benchmark at my request.
He took the 9.5 commit immediately proceeding 2ed5b87f9 as a baseline.
In all cases, logged tables were used. Note that this is not the most
effective benchmark for showing the regression, because he didn't
replace the PK with a non-unique index, though that is planned as
follow-up; I wanted to stick with Alik's Zipfian test case for the
time being.

(Using a unique index is not the most effective thing for showing the
regression because unique indexes have most LP_DEAD setting done in
_bt_check_unique(), so only SELECTs will do less LP_DEAD cleanup
there; SELECTs are 50% of all queries.)

His results with 10 minute pgbench runs:

Logged
clients | 8217fb14 | 2ed5b87f | master | hashsnap | hashsnap_lwlock
--------+----------+----------+--------+----------+----------------
10 | 201569 | 204154 | 201095 | 201793 | 206111
20 | 328180 | 333358 | 334858 | 336721 | 370769
40 | 196352 | 194692 | 232413 | 231968 | 393947
70 | 121083 | 116881 | 148066 | 148443 | 224162
110 | 77989 | 73414 | 99305 | 111404 | 161900
160 | 48958 | 45830 | 65830 | 82340 | 115678
230 | 27699 | 25510 | 38186 | 57617 | 80575
310 | 16369 | 15137 | 21571 | 39745 | 56819
400 | 10327 | 9486 | 13157 | 27318 | 40859
500 | 6920 | 6496 | 8638 | 18677 | 29429
650 | 4315 | 3971 | 5196 | 11762 | 17883
800 | 2850 | 2772 | 3523 | 7733 | 10977

Note that you also see numbers from various patches from Yura, and the
master branch mixed in here, but 8217fb14 (before) and 2ed5b87f
(after) are the interesting numbers as far as this regression goes.

There is an appreciable reduction in TPS here, though this workload is
not as bad by that measure as first thought. There is a roughly 5%
regression here past 40 clients or so. The regression in the
*consistency* of transaction *throughput* is far more interesting,
though. I've been doing analysis of this by drilling down to
individual test cases with vimdiff, as follows:

$ vimdiff test_8217fb14_logged_1_pgbench_40.out
test_2ed5b87f_logged_1_pgbench_40.out

(I attach these two files as a single example. I can provide the full
results to those that ask for them privately; it's too much data to
attach to an e-mail to the list.)

You can see in this example that for most 5 second slices of the 10
minute benchmark, commit 2ed5b87f actually increases TPS somewhat,
which is good. But there are also occasional *big* drops in TPS,
sometimes by as much as 50% over a single 5 second period (when
ANALYZE runs, adding random I/O during holding an exclusive buffer
lock [1]?). When this slowdown happens, latency can be over 3 times
higher, too.

We see much more consistent performance without the B-Tree buffer pin
VACUUM optimization, where there is no discernible pattern of
performance dropping. The headline regression of 4% or 5% is not the
main problem here, it seems.

In summary, commit 2ed5b87f makes the workload have increased
throughput most of the time, but occasionally sharply reduces
throughput, which averages out to TPS being 4% or 5% lower overall. I
think we'll find that there are bigger problems TPS-wise with
non-unique indexes when that other test is performed by Yura; let's
wait for those results to come back.

Finally, I find it interesting that when Yura did the same benchmark,
but with 5% SELECTs + 95% UPDATEs, rather than 50% SELECTs + 50%
UPDATEs as above, the overall impact was surprisingly similar. His
results:

clients | 8217fb14 | 2ed5b87f | master | hashsnap | hashsnap_lwlock
--------+----------+----------+--------+----------+----------------
20 | 187697 | 187335 | 217558 | 215059 | 266894
50 | 81272 | 78784 | 97948 | 97659 | 157710
85 | 49446 | 47683 | 64597 | 70814 | 107748
130 | 32358 | 30393 | 42216 | 50531 | 75001
190 | 19403 | 17569 | 25704 | 35506 | 51292
270 | 10803 | 9878 | 14166 | 23851 | 35257
370 | 6108 | 5645 | 7684 | 15390 | 23659
500 | 3649 | 3297 | 4225 | 9172 | 14643
650 | 2239 | 2049 | 2635 | 5887 | 8588
800 | 1475 | 1424 | 1804 | 4035 | 5611

If nothing else, this shows how generally reliant these kinds of
workload can be on LP_DEAD setting. And, there is one difference: The
regression is seen here at *all* client counts, even with only 20
clients, This is presumably because with only 5% SELECTs it's more
important that those few remaining SELECTs be able to perform LP_DEAD
setting.

[1] https://wiki.postgresql.org/wiki/Key_normalization#Avoiding_unnecessary_unique_index_enforcement
--
Peter Geoghegan

Attachment Content-Type Size
test_8217fb14_logged_1_pgbench_40.out application/octet-stream 7.6 KB
test_2ed5b87f_logged_1_pgbench_40.out application/octet-stream 7.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-08-04 22:41:12 Draft release notes up for review
Previous Message Tom Lane 2017-08-04 20:54:05 Re: PostgreSQL not setting OpenSSL session id context?