From: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: fix lock contention for HASHHDR.mutex
Date: 2015-12-22 15:39:53
Message-ID: 20151222183953.771cb58b@fujitsu
Lists: pgsql-hackers
> > Actually, I'd like to improve all partitioned hashes instead of
> > improve only one case.
>
> Yeah. I'm not sure that should be an LWLock rather than a spinlock,
> but we can benchmark it both ways.
I would like to share some preliminary results. I tested four
implementations (sketched in the code below):
- no locks and no element stealing from other partitions;
- a single LWLock per partitioned table;
- a single spinlock per partitioned table;
- NUM_LOCK_PARTITIONS spinlocks per partitioned table.
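To make the four variants concrete, here is a minimal C sketch of where
the lock lives in each case. This is not the attached patch code: the
names (HashElement, NoLockHdr, etc.) are hypothetical, and
pthread_mutex_t merely stands in for PostgreSQL's slock_t / LWLock.

#include <pthread.h>

typedef struct HashElement
{
    struct HashElement *next;
} HashElement;

#define NUM_PARTITIONS 16           /* i.e. NUM_LOCK_PARTITIONS */

/* (1) "no locks": one private freelist per partition. The caller
 * already holds the partition's lock, so the freelist needs no lock
 * of its own, and there is no stealing between partitions. */
typedef struct
{
    HashElement *freelist[NUM_PARTITIONS];
} NoLockHdr;

/* (2) single LWLock and (3) single spinlock: one shared freelist
 * guarded by one table-wide lock; the two variants differ only in
 * the locking primitive used. */
typedef struct
{
    pthread_mutex_t lock;           /* stands in for LWLock / slock_t */
    HashElement *freelist;
} SingleLockHdr;

/* (4) spinlock array: NUM_LOCK_PARTITIONS freelists, each guarded by
 * its own spinlock; a partition whose freelist is empty may steal
 * elements from a neighbour. */
typedef struct
{
    pthread_mutex_t lock[NUM_PARTITIONS];
    HashElement *freelist[NUM_PARTITIONS];
} SpinlockArrayHdr;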
Interestingly, the "Shared Buffer Lookup Table" (see buf_table.c) has
128 partitions: the constant NUM_BUFFER_PARTITIONS was increased from
16 to 128 in commit 3acc10c9.
Obviously, after splitting a freelist into NUM_LOCK_PARTITIONS
partitions (and assuming that all necessary locking/unlocking is done
on the calling side), a table can't have more than NUM_LOCK_PARTITIONS
partitions, because that would cause race conditions. For this reason I
had to define NUM_BUFFER_PARTITIONS as NUM_LOCK_PARTITIONS and compare
the behaviour of PostgreSQL for different values of
NUM_LOCK_PARTITIONS.
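For reference, these are the relevant definitions in
src/include/storage/lwlock.h on master, plus (roughly, from memory, not
the exact patch) the redefinition the benchmark builds used instead:

/* src/include/storage/lwlock.h */
#define NUM_BUFFER_PARTITIONS  128

#define LOG2_NUM_LOCK_PARTITIONS  4
#define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)

/* roughly what the benchmark builds did instead, with
 * LOG2_NUM_LOCK_PARTITIONS varied from 4 to 7: */
#define NUM_BUFFER_PARTITIONS  NUM_LOCK_PARTITIONS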
So here are the results:
Core i7, pgbench -j 8 -c 8 -T 30 pgbench
(3 tests, TPS excluding connections establishing)
NUM_LOCK_ | master | no locks | lwlock | spinlock | spinlock
PARTITIONS | (99ccb2) | | | | array
-----------|----------|----------|----------|----------|----------
| 295.4 | 297.4 | 299.4 | 285.6 | 302.7
(1 << 4) | 286.1 | 300.5 | 283.4 | 300.9 | 300.4
| 300.0 | 300.0 | 302.1 | 300.7 | 300.3
-----------|----------|----------|----------|----------|----------
| | 296.7 | 299.9 | 298.8 | 298.3
(1 << 5) | ---- | 301.9 | 302.2 | 305.7 | 306.3
| | 287.7 | 301.0 | 303.0 | 304.5
-----------|----------|----------|----------|----------|----------
| | 296.4 | 300.5 | 302.9 | 304.6
(1 << 6) | ---- | 301.7 | 305.6 | 306.4 | 302.3
| | 299.6 | 304.5 | 306.6 | 300.4
-----------|----------|----------|----------|----------|----------
| | 295.9 | 298.7 | 295.3 | 305.0
(1 << 7) | ---- | 299.5 | 300.5 | 299.0 | 310.2
| | 287.8 | 285.9 | 300.2 | 302.2
Core i7, pgbench -j 8 -c 8 -f big_table.sql -T 30 my_database
(3 tests, TPS excluding connections establishing)
NUM_LOCK_ | master | no locks | lwlock | spinlock | spinlock
PARTITIONS | (99ccb2) | | | | array
-----------|----------|----------|----------|----------|----------
| 505.1 | 521.3 | 511.1 | 524.4 | 501.6
(1 << 4) | 452.4 | 467.4 | 509.2 | 472.3 | 453.7
| 435.2 | 462.4 | 445.8 | 467.9 | 467.0
-----------|----------|----------|----------|----------|----------
| | 514.8 | 476.3 | 507.9 | 510.6
(1 << 5) | ---- | 457.5 | 491.2 | 464.6 | 431.7
| | 442.2 | 457.0 | 495.5 | 448.2
-----------|----------|----------|----------|----------|----------
| | 516.4 | 502.5 | 468.0 | 521.3
(1 << 6) | ---- | 463.6 | 438.7 | 488.8 | 455.4
| | 434.2 | 468.1 | 484.7 | 433.5
-----------|----------|----------|----------|----------|----------
| | 513.6 | 459.4 | 519.6 | 510.3
(1 << 7) | ---- | 470.1 | 454.6 | 445.5 | 415.9
| | 459.4 | 489.7 | 457.1 | 452.8
60-core server, pgbench -j 64 -c 64 -T 30 pgbench
(3 tests, TPS excluding connections establishing)
NUM_LOCK_ | master | no locks | lwlock | spinlock | spinlock
PARTITIONS | (99ccb2) | | | | array
-----------|----------|----------|----------|----------|----------
| 3156.2 | 3157.9 | 3542.0 | 3444.3 | 3472.4
(1 << 4) | 3268.5 | 3444.7 | 3485.7 | 3486.0 | 3500.5
| 3251.2 | 3482.3 | 3398.7 | 3587.1 | 3557.7
-----------|----------|----------|----------|----------|----------
| | 3352.7 | 3556.0 | 3543.3 | 3526.8
(1 << 5) | ---- | 3465.0 | 3475.2 | 3486.9 | 3528.4
| | 3410.0 | 3482.0 | 3493.7 | 3444.9
-----------|----------|----------|----------|----------|----------
| | 3437.8 | 3413.1 | 3445.8 | 3481.6
(1 << 6) | ---- | 3470.1 | 3478.4 | 3538.5 | 3579.9
| | 3450.8 | 3431.1 | 3509.0 | 3512.5
-----------|----------|----------|----------|----------|----------
| | 3425.4 | 3534.6 | 3414.7 | 3517.1
(1 << 7) | ---- | 3436.5 | 3430.0 | 3428.0 | 3536.4
| | 3455.6 | 3479.7 | 3573.4 | 3543.0
60-core server, pgbench -j 64 -c 64 -f big_table.sql -T 30 my_database
(3 tests, TPS excluding connections establishing)
NUM_LOCK_ | master | no locks | lwlock | spinlock | spinlock
PARTITIONS | (99ccb2) | | | | array
-----------|----------|----------|----------|----------|----------
| 661.1 | 4639.6 | 1435.2 | 445.9 | 1589.6
(1 << 4) | 642.9 | 4566.7 | 1410.3 | 457.1 | 1601.7
| 643.9 | 4621.8 | 1404.8 | 489.0 | 1592.6
-----------|----------|----------|----------|----------|----------
| | 4721.9 | 1543.1 | 499.1 | 1596.9
(1 << 5) | ---- | 4506.8 | 1513.0 | 528.3 | 1594.7
| | 4744.7 | 1540.3 | 524.0 | 1593.0
-----------|----------|----------|----------|----------|----------
| | 4649.1 | 1564.5 | 475.9 | 1580.1
(1 << 6) | ---- | 4671.0 | 1560.5 | 485.6 | 1589.1
| | 4751.0 | 1557.4 | 505.1 | 1580.3
-----------|----------|----------|----------|----------|----------
| | 4657.7 | 1551.8 | 534.7 | 1585.1
(1 << 7) | ---- | 4616.8 | 1546.8 | 495.8 | 1623.4
| | 4779.2 | 1538.5 | 537.4 | 1588.5
All four implementations (W.I.P. quality --- dirty code, no comments,
etc.) are attached to this message. The schema of my_database and the
big_table.sql file are attached to the first message of this thread.
The large spread of TPS on the Core i7 is due to the fact that it's
actually my laptop, with other applications running besides PostgreSQL.
Still, we see that all solutions are equally good on this CPU and there
is no performance degradation.
Now regarding the 60-core server:
- one spinlock per hash table doesn't scale; I personally was expecting
this;
- LWLocks and an array of spinlocks do scale on NUMA up to a certain
point;
- the best results are shown by "no locks".
I believe that the "no locks" implementation is the best one, since it
is at least 3 times faster on NUMA than any other implementation. It is
also simpler and doesn't have the stealing-from-other-freelists logic,
which executes rarely and is therefore a likely source of bugs.
Regarding the ~16 freelist elements which in some corner cases could be
used but wouldn't be (a partition whose own freelist is exhausted can't
borrow elements sitting idle on another partition's freelist) --- as I
mentioned before, I believe it's not such a big problem. Also, it's a
small price to pay for 3 times more TPS.
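To illustrate the simplification, here is a hypothetical allocation
path for the "no locks" variant, reusing the NoLockHdr sketch above.
Compare it with get_hash_entry() in dynahash.c on master, which has to
take HASHHDR.mutex and is where the contention comes from.

/* Hypothetical sketch, not the attached patch. The caller already
 * holds the LWLock for `partition`, so popping from that partition's
 * private freelist is race-free without HASHHDR.mutex and without any
 * loop that steals elements from other partitions. */
static HashElement *
get_free_element(NoLockHdr *hdr, int partition)
{
    HashElement *elem = hdr->freelist[partition];

    if (elem != NULL)
        hdr->freelist[partition] = elem->next;

    /* On NULL the caller must allocate a fresh chunk for this
     * partition; it never borrows from another partition's freelist,
     * which is why a handful of free elements may sit unused
     * elsewhere. */
    return elem;
}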
Regarding NUM_LOCK_PARTITIONS (and NUM_BUFFER_PARTITIONS) I have some
doubts. For sure Robert had a good reason for committing 3acc10c9.
Unfortunately, I'm not familiar with the story behind this commit. What
do you think?
Attachment           | Content-Type | Size
---------------------|--------------|--------
lwlock.patch         | text/x-patch | 13.2 KB
no-locks.patch       | text/x-patch | 11.6 KB
spinlock.patch       | text/x-patch | 13.3 KB
spinlock-array.patch | text/x-patch | 14.2 KB