How to reduce latency with fast short queries in Postgresql 15.3 on a NUMA server

From: Sergio Rus <geiros(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: How to reduce latency with fast short queries in Postgresql 15.3 on a NUMA server
Date: 2023-05-31 12:40:05
Message-ID: CAN8auqjwYcAFU393r6kG2CmfbUJaHkNorRFA6=uAEaTDS6+uCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi guys,

I've been configuring a new server and tuning Postgresql 15.3, but I'm
struggling with a latency I'm consistently seeing with this new server when
running fast short queries, compared to the other server.

We're running two different versions of Postgresql:

- Server A: Postgresql 9.3
- Server B: Postgresql 15.3

Server B is the new server and is way more powerful than server A:

- Server A: 1x Intel Xeon E3-1270 3.5GHz, 2x 8GB DDR3, RAID0
- Server B: 2x Intel Xeon Platinum 8260 2.4GHz, 4x 16GB DDR4, RAID1

We're running Linux Ubuntu 20.04 on server B and I've been tweaking some
settings in Linux and Postgresql 15.3. With the current setup, Postgresql
15.3 is able to process more than 1 million transactions per second running
pgbench:

# pgbench --username postgres --select-only --client 100 --jobs 10
--time 20 test
pgbench (15.3 (Ubuntu 15.3-1.pgdg20.04+1))
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 100
number of threads: 10
maximum number of tries: 1
duration: 20 s
number of transactions actually processed: 23039950
number of failed transactions: 0 (0.000%)
latency average = 0.087 ms
initial connection time = 62.536 ms
tps = 1155053.135317 (without initial connection time)

As shown in pgbench, the performance is great. Also when testing individual
queries, heavy queries (those taking a few ms) run faster on server B than
server A. Unfortunately when we run fast short SELECT queries (< 1 ms),
server A is consistently running faster than server B, even if the query
plans are the same:

Server A:

# EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT 1 AS "a" FROM "foobar"
WHERE ("foobar"."id" = 1) LIMIT 1;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..8.44 rows=1 width=0) (actual time=0.008..0.008
rows=1 loops=1)
Output: (1)
Buffers: shared hit=5
-> Index Only Scan using foobar_pkey on public.foobar
(cost=0.42..8.44 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=1)
Output: 1
Index Cond: (foobar.id = 1)
Heap Fetches: 1
Buffers: shared hit=5
Total runtime: 0.017 ms
(9 rows)

Time: 0.281 ms

Server B:

# EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT 1 AS "a" FROM "foobar"
WHERE ("foobar"."id" = 1) LIMIT 1;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1.11 rows=1 width=4) (actual time=0.019..0.021
rows=1 loops=1)
Output: 1
Buffers: shared hit=4
-> Index Only Scan using foobar_pkey on public.foobar
(cost=0.00..1.11 rows=1 width=4) (actual time=0.018..0.018 rows=1 loops=1)
Output: 1
Index Cond: (foobar.id = 1)
Heap Fetches: 0
Buffers: shared hit=4
Planning Time: 0.110 ms
Execution Time: 0.045 ms
(10 rows)

Time: 0.635 ms

RAID1 could add some latency on server B if it was reading from disk, but
I've confirmed that these queries are hitting the buffer/cache and
therefore reading data from memory and not from disk. I've checked the hit
rate with the following query:

SELECT 'cache hit rate' AS name, sum(heap_blks_hit) /
(sum(heap_blks_hit) + sum(heap_blks_read)) AS ratio FROM
pg_statio_user_tables;

The hit rate was over 95% and it increased as soon as I ran those queries.
Same thing with the index hit rate.

I've been playing with some parameters in Postgresql, decreasing/increasing
the number of workers, shared buffers, work_mem, JIT, cpu_*_cost variables,
etc, but nothing did help to reduce that latency.

Here are the settings I'm currently using with Postgresql 15.3 after a lot
of work experimenting with different values:

checkpoint_completion_target = 0.9
checkpoint_timeout = 900
cpu_index_tuple_cost = 0.00001
cpu_operator_cost = 0.00001
effective_cache_size = 12GB
effective_io_concurrency = 200
jit = off
listen_addresses = 'localhost'
maintenance_work_mem = 1GB
max_connections = 100
max_parallel_maintenance_workers = 4
max_parallel_workers = 12
max_parallel_workers_per_gather = 4
max_wal_size = 4GB
max_worker_processes = 12
min_wal_size = 1GB
random_page_cost = 1.1
shared_buffers = 4GB
ssl = off
timezone = 'UTC'
wal_buffers = 16MB
work_mem = 64MB

Some notes about those settings:

- We're running other services on this server, that's why I'm not using
more resources.
- Tweaking the cpu_*_cost parameters was crucial to improve the query
plan. With the default values Postgresql was consistently using a slower
query plan.

I've been looking at some settings in Linux as well:

- Swappiness is set to the lowest safe value: vm.swappiness = 1
- Huge Pages is not being used and Transparent Huge Pages (THP) is set
to 'madvise'. Postgresql 15.3 is using the default value for the
'huge_pages' parameter: 'try'.
- The memory overcommit policy is set to 1: vm.overcommit_memory = 1

I've been playing with Huge Pages, to try to force Postgresql using this
feature. I manually allocated the number of Huge Pages as shown in this
query:

SHOW shared_memory_size_in_huge_pages;

I confirmed Huge Pages were being used by Postgresql, but unfortunately I
didn't see any improvement regarding latency and performance. So I set this
back to the previous state.

Conclusion:

The latency is quite low on both servers, but when you're running dozens or
hundreds of fast short queries concurrently, on aggregate you see the
difference, with server A being 0.1-1.0 seconds faster than server B.

As you can see, server B has 2 CPUs and is using NUMA on Linux. And the
CPU clock is slower on server B than server A. Maybe any of those are
causing that latency?

Any suggestions or ideas where to look? I'd really appreciate your help.

Thank you

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kenneth Marshall 2023-05-31 13:47:12 Re: How to reduce latency with fast short queries in Postgresql 15.3 on a NUMA server
Previous Message Bob Jolliffe 2023-05-31 10:26:58 Re: Unaccounted regression from postgresql 11 in later versions