Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

From: Saurabh Nanda <saurabhnanda(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?
Date: 2019-01-28 04:52:17
Message-ID: CAPz=2oHb7W5QzBSt-aAkmFBwC7EAJGxV_0x5OKwA_QeMwZNtbQ@mail.gmail.com
Lists: pgsql-performance

>
> It is usually not acceptable to run applications with
> synchronous_commit=off, so once you have identified that the bottleneck is
> in implementing synchronous_commit=on, you probably need to take a deep
> dive into your hardware to figure out why it isn't performing the way you
> need/want/expect it to. Tuning the server under synchronous_commit=off
> when you don't intend to run your production server with that setting is
> unlikely to be fruitful.
>

I do not intend to run the server with synchronous_commit=off, but based on
my limited knowledge, I'm wondering if all these observations are somehow
related and are caused by the same underlying bottleneck (or
misconfiguration):

1) At higher concurrency levels, TPS for synchronous_commit=off is lower
for optimised settings when compared to default settings
2) At ALL concurrency levels, TPS for synchronous_commit=on is lower for
optimised settings (irrespective of shared_buffers value), compared to
default settings
3) At higher concurrency levels, optimised + synchronous_commit=on +
shared_buffers=2G has HIGHER TPS than optimised + synchronous_commit=off +
shared_buffers=8G

Here are the (completely counter-intuitive) numbers for these observations:

+--------+-----------------------------------------------------------------+------------------------+
|        |                      synchronous_commit=on                      | synchronous_commit=off |
+--------+---------------------+---------------------+---------------------+------------------------+
| client | Mostly defaults [1] | Optimised [2]       | Optimised [2]       | Optimised [2]          |
|        |                     | + shared_buffers=2G | + shared_buffers=8G | + shared_buffers=8G    |
+--------+---------------------+---------------------+---------------------+------------------------+
|      1 |               80-86 |               74-77 |               75-75 |                169-180 |
+--------+---------------------+---------------------+---------------------+------------------------+
|      6 |             350-376 |             301-304 |             295-300 |              1265-1397 |
+--------+---------------------+---------------------+---------------------+------------------------+
|     12 |             603-619 |             476-488 |             485-493 |              1746-2352 |
+--------+---------------------+---------------------+---------------------+------------------------+
|     24 |            947-1015 |             678-739 |             723-770 |              1869-2518 |
+--------+---------------------+---------------------+---------------------+------------------------+
|     48 |           1435-1512 |            950-1043 |           1029-1086 |              1912-2818 |
+--------+---------------------+---------------------+---------------------+------------------------+
|     96 |           1769-1811 |           3337-3459 |           1302-1346 |              1546-1753 |
+--------+---------------------+---------------------+---------------------+------------------------+
|    192 |           1857-1992 |           3613-3715 |           1269-1345 |              1332-1508 |
+--------+---------------------+---------------------+---------------------+------------------------+
|    384 |           1667-1793 |           3180-3300 |           1262-1364 |              1356-1450 |
+--------+---------------------+---------------------+---------------------+------------------------+

>
> In case you do intend to run with synchronous_commit=off, or if you are
> just curious: running with a very high number of active connections often
> reveals subtle bottlenecks and interactions, and is very dependent on your
> hardware. Unless you actually intend to run your server with
> synchronous_commit=off and with a large number of active connections, it is
> probably not worth investigating this.
>

Please see the table above. I'm digging deeper into this because of
observation (2) above, i.e. I have been unable to come up with any optimised
settings that perform better than the default settings at the
concurrency levels that I care about (100-150).

> I'm more interested in the low end, you should do much better than those
> reported numbers when clients=1 and synchronous_commit=off with the data on
> SSD. I think you said that pgbench is running on a different machine than
> the database, so perhaps it is just network overhead that is keeping this
> value down. What happens if you run them on the same machine?
>

I'm currently running this, but the early numbers are surprising. For
client=1, the numbers for optimised settings + shared_buffers=2G are:

-- pgbench run over a 1Gbps network: 74-77 tps
-- pgbench run on the same machine: 152-153 tps (is this absolute number
good enough given my hardware?)
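As a rough sanity check on the network hypothesis: at client=1, TPS is just the reciprocal of per-transaction latency, and pgbench's default TPC-B-like script sends about seven commands per transaction. The back-of-envelope arithmetic below (using the midpoint-ish figures 74 and 152 tps from above) suggests the network adds roughly 7 ms per transaction, i.e. on the order of 1 ms per statement round trip:

```shell
# Implied per-transaction latency at client=1, from the TPS numbers above
awk 'BEGIN {
  net = 1000 / 74;    # ms/txn when pgbench runs over the 1 Gbps network
  loc = 1000 / 152;   # ms/txn when pgbench runs on the same machine
  printf "network: %.1f ms/txn, local: %.1f ms/txn, delta: %.1f ms/txn\n",
         net, loc, net - loc
}'
```

A ~1 ms round trip per statement would be high for a quiet 1 Gbps LAN (typically ~0.1-0.5 ms), so it may be worth also checking raw latency with `ping` between the two hosts.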

Is the 1 Gbps network the bottleneck? Does it explain the three observations
given above? I'll wait for the current set of benchmarks to finish, then
re-run them over the network while monitoring network utilisation.
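For the re-run, something along these lines should isolate the network from everything else (hostnames, socket path, database name and duration here are placeholders, not my actual setup):

```shell
# Same workload, two transports -- only the connection path differs:
pgbench -h db-server -p 5432 -c 1 -j 1 -T 60 bench    # over the 1 Gbps link
pgbench -h /var/run/postgresql -c 1 -j 1 -T 60 bench  # local Unix socket

# Meanwhile, on the client, watch per-interface throughput (sysstat package):
sar -n DEV 1
```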

[1] "Mostly default" settings are whatever ships with Ubuntu 18.04 + PG 11.
A snippet of the relevant setts are given below:

max_connections=400
work_mem=4MB
maintenance_work_mem=64MB
shared_buffers=128MB
temp_buffers=8MB
effective_cache_size=4GB
wal_buffers=-1
wal_sync_method=fsync
max_wal_size=1GB
autovacuum=off                    # Auto-vacuuming was disabled
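(To rule out a config not actually taking effect between runs, a quick check I use is to ask the server itself rather than trusting the conf file; the hostname below is a placeholder:)

```shell
# Confirm which settings are actually in effect, and where they came from:
psql -h db-server -At -c "SELECT name, setting, source
                          FROM pg_settings
                          WHERE name IN ('shared_buffers', 'synchronous_commit',
                                         'wal_sync_method', 'autovacuum');"
```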

[2] Optimized settings

max_connections = 400
shared_buffers = 8GB              # or 2GB, depending upon which scenario was being evaluated
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 3495kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 12
max_parallel_workers_per_gather = 6
max_parallel_workers = 12
autovacuum=off                    # Auto-vacuuming was disabled
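On the "deep dive into your hardware" suggestion for the synchronous_commit=on case: I understand pg_test_fsync (shipped with PostgreSQL) reports how many WAL flushes per second the disk can sustain under each wal_sync_method, which should put a ceiling on single-client TPS with synchronous commit. Something like the following (the pg_wal path is the Ubuntu PG 11 default and is an assumption on my part):

```shell
# Measure raw flush throughput on the same filesystem as pg_wal:
pg_test_fsync -f /var/lib/postgresql/11/main/pg_wal/pg_test_fsync.tmp
```

Comparing the fsync vs fdatasync ops/sec it prints against the client=1 TPS above should show whether the WAL device, rather than the network, is the limiter.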
