Postgresql TPS Bottleneck

From: <wakandavision(at)outlook(dot)com>
To: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Postgresql TPS Bottleneck
Date: 2022-03-31 11:50:34
Message-ID: VI1PR04MB31338E811E0896D00D85B69B97E19@VI1PR04MB3133.eurprd04.prod.outlook.com
Lists: pgsql-performance

Hi everyone,

I am a bachelor's student writing my thesis about the scaling and
performance of an application. The application uses PostgreSQL as its
database, but we currently cannot scale any further because Postgres
seems to be the limiting factor.

With the application, as well as with pgbench, we don't get more than
about 70k TPS on Postgres, but the server's resources are not fully
utilized (more below).

I've tried many different configurations, but none of them had any
major performance impact (except setting fsync and synchronous_commit
to off).

This is the (custom) configuration I am using:

shared_buffers=65551953kB
effective_cache_size=147491895kB
huge_pages=on
min_wal_size=20GB
max_wal_size=200GB
wal_buffers=1GB
max_wal_senders=0
archive_mode=off
wal_level=minimal
work_mem=2GB
maintenance_work_mem=4GB
checkpoint_completion_target=0.9
checkpoint_timeout = 30min
random_page_cost=1.1
bgwriter_flush_after = 2MB
effective_io_concurrency = 200
# Disabled just for performance experiments
fsync = off
synchronous_commit = off
full_page_writes = on
max_worker_processes=64
max_parallel_workers=64
max_parallel_workers_per_gather=10
max_parallel_maintenance_workers=12
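
As a sanity check, the values actually in effect on the running server
can be read back from pg_settings; the query below is only a sketch and
the list of setting names is arbitrary:

select name, setting, unit, source  -- values the running server is using
from pg_settings
where name in ('shared_buffers', 'wal_buffers', 'huge_pages',
               'fsync', 'synchronous_commit', 'max_wal_size');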

The system is as follows:

* 64 Cores (Intel Xeon Gold 6130 (Skylake, 2.10GHz, 2 CPUs/node, 16
cores/CPU))
* 192 GiB RAM (12 * 16GiB DIMM DDR4 Synchronous Registered (Buffered)
2666 MHz (0.4 ns))
* 2 * SSD SATA Samsung MZ7KM240HMHQ0D3 (one is used for the WAL and the
other for the data)
* 10 Gbps network link
* OS: Debian 11
* Postgres 13 from apt

(I've also written a stackoverflow post about it -
https://stackoverflow.com/questions/71631348/postgresql-bottleneck-neither-cpu-network-nor-i-o )

Below is just one example of the pgbench runs I did:

pgbench -i -s 50 -U postgres -h <DB_HOST> -d <DB_NAME>
pgbench -c 64 -j 32 -t 100000 -h <DB_HOST> -U postgres <DB_NAME>

starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 64
number of threads: 32
number of transactions per client: 100000
number of transactions actually processed: 6400000/6400000
latency average = 0.976 ms
tps = 65584.664360 (including connections establishing)
tps = 65594.330678 (excluding connections establishing)

For comparison (average TPS of three pgbench runs as above):

 clients | default config | custom config (above)
---------+----------------+-----------------------
      10 |          11336 |                  16848
      20 |          19528 |                  30187
      30 |          25769 |                  39430
      40 |          29792 |                  50150
      50 |          31096 |                  60133
      60 |          33900 |                  64916
      70 |          34986 |                  64308
      80 |          34170 |                  63075
      90 |          35108 |                  59910
     100 |          34864 |                  58320
     120 |          35124 |                  55414
     140 |          33403 |                  53610

(with fsync=off alone I already get almost the TPS of the right-hand column)

For `-S -M prepared` the TPS is ~700k and for `-S` it is ~500k, but as
the application is very write-heavy this is not really useful for me.
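
(Roughly, such a read-only run looks like the following; the client and
thread counts here are simply copied from the write test above:)

pgbench -S -M prepared -c 64 -j 32 -t 100000 -h <DB_HOST> -U postgres <DB_NAME>
pgbench -S -c 64 -j 32 -t 100000 -h <DB_HOST> -U postgres <DB_NAME>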

With the app the CPU is only at about 25% load and the disks are not a
problem either. For pgbench it's about 75% CPU, but still no disk
bottleneck (about 5%).

There are also Grafana snapshots I created for the system (node-exporter)
and for Postgres (prometheus-postgres-exporter) while running with our
application (same configuration as above). Neither shows any kind of
bottleneck (apart from a high number of context switches and pages in/out):

node: https://147.87.255.221:3000/dashboard/snapshot/3eXe1sS3QDL6cbvI7HkPjYnjrVLCNOOF
postgres: https://147.87.255.221:3000/dashboard/snapshot/wHkRphdr3D4k5kRckhn57Pc6ZD3st1x7

I have also looked at PostgreSQL's lock views while running the above
experiment, but nothing there seemed strange to me. There are about 300
locks, but all of them are granted (select * from pg_locks).
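
A compact way to summarize that, instead of scanning the full pg_locks
output, is a query along these lines:

select locktype, mode, granted, count(*)  -- locks per type/mode and whether granted
from pg_locks
group by locktype, mode, granted
order by count(*) desc;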

Also, the following query:

select wait_event, count(*) from pg_stat_activity
where state = 'idle in transaction' group by wait_event;

did not show any contention; the output always looks similar to this
(80 clients):

wait_event | count
--------------------------+-------
ClientRead | 2
SerializableFinishedList | 1
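
(For completeness, a summary over all backends - not only those idle in
a transaction - can be obtained with something like:)

select state, wait_event_type, wait_event, count(*)
from pg_stat_activity
where pid <> pg_backend_pid()  -- exclude this monitoring session itself
group by state, wait_event_type, wait_event
order by count(*) desc;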

Via the Slack channel I got a link to an EDB benchmark that used a more
powerful server and also achieved about 70k TPS, but without setting
fsync=off. So maybe they were limited by disk I/O (just guessing, as
unfortunately this is not pointed out in the post):

https://www.enterprisedb.com/blog/pgbench-performance-benchmark-postgresql-12-and-edb-advanced-server-12

So my question is whether anyone knows what the bottleneck could be, or
whether it is even possible to get more TPS with this write-heavy load.

(dmesg also does not contain any error messages that would point to a
kernel misconfiguration.)

Ideally I would like to fully utilize the CPU and get about 3-4 times
more TPS (if that is even possible).

Thanks already for everyone's time and help.
