From: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru> |
---|---|
To: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Subject: | Re: Changing shared_buffers without restart |
Date: | 2025-04-17 11:21:07 |
Message-ID: | 6c3c55a0-001e-40b7-9ee2-e2f0cb12a70d@garret.ru |
Lists: | pgsql-hackers |
On 25/02/2025 11:52 am, Dmitry Dolgov wrote:
>> On Fri, Oct 18, 2024 at 09:21:19PM GMT, Dmitry Dolgov wrote:
>> TL;DR A PoC for changing shared_buffers without PostgreSQL restart, via
>> changing shared memory mapping layout. Any feedback is appreciated.
Hi Dmitry,
I am sorry that I have not participated in the discussion in this thread
from the very beginning, although I am also very interested in dynamic
shared buffer resizing and even proposed my own implementation of it:
https://github.com/knizhnik/postgres/pull/2 based on memory ballooning
and using `madvise`. And it really works (returns unused memory to the
system).
This PoC allowed me to understand the main drawbacks of this approach:
1. The performance of the Postgres CLOCK page eviction algorithm depends on
the number of shared buffers. My first naive attempt, which just marked unused
buffers as invalid, caused a significant performance degradation with
pgbench -c 32 -j 4 -T 100 -P1 -M prepared -S
(here `shared_buffers` is the maximal shared buffers size and
`available_buffers` is the part actually in use):
| shared_buffers | available_buffers | TPS  |
| -------------- | ----------------- | ---- |
| 128MB          | -1                | 280k |
| 1GB            | -1                | 324k |
| 2GB            | -1                | 358k |
| 32GB           | -1                | 350k |
| 2GB            | 128MB             | 130k |
| 2GB            | 1GB               | 311k |
| 32GB           | 128MB             | 13k  |
| 32GB           | 1GB               | 140k |
| 32GB           | 2GB               | 348k |
My first thought was to replace CLOCK with an LRU based on a doubly linked
list. Since there is no lockless doubly linked list implementation,
it needs some global lock, and this lock can become a bottleneck. The standard
solution is partitioning: use N LRU lists instead of one,
just like the partitioned hash table used by the buffer manager to look up buffers.
Actually, we can use the same partition locks to protect the LRU lists.
But it is not clear what to do with ring buffers (strategies). So I decided
not to perform such a revolution in bufmgr, but to optimize CLOCK to skip
reserved buffers more efficiently.
I just added a `skip_count` field to the buffer descriptor. And it helps! Now the
worst case, shared_buffers/available_buffers = 32GB/128MB,
shows the same performance (280k TPS) as shared_buffers=128MB without ballooning.
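A minimal sketch of how a per-descriptor skip count could let the clock hand jump over reserved buffers in one step. This is an illustration under stated assumptions, not the code from the PoC: the `SketchBufferDesc` struct, the helper names, and the exact meaning given to `skip_count` here (the number of consecutive reserved buffers starting at this slot) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor; this is NOT PostgreSQL's BufferDesc. */
typedef struct
{
    uint32_t usage_count;   /* CLOCK usage counter */
    uint32_t skip_count;    /* assumed: length of the reserved run starting
                             * at this slot, 0 if the buffer is available */
    bool     pinned;
} SketchBufferDesc;

/* Mark buffers [first, first + n) as reserved (ballooned out). Every slot
 * in the run stores the distance to the end of the run. */
static void
sketch_reserve_range(SketchBufferDesc *descs, size_t first, size_t n)
{
    for (size_t i = 0; i < n; i++)
        descs[first + i].skip_count = (uint32_t) (n - i);
}

/* One victim search. Returns the victim index, or -1 if nothing was found
 * within the step budget. *steps counts visited descriptors. */
static long
sketch_clock_sweep(SketchBufferDesc *descs, size_t nbuffers,
                   size_t *hand, size_t *steps)
{
    size_t budget = nbuffers * 8;   /* arbitrary bound for the sketch */

    while (budget-- > 0)
    {
        size_t            i = *hand;
        SketchBufferDesc *d = &descs[i];

        (*steps)++;
        if (d->skip_count > 0)
        {
            /* Jump over the whole reserved run instead of visiting
             * every reserved buffer one by one. */
            *hand = (i + d->skip_count) % nbuffers;
            continue;
        }
        *hand = (i + 1) % nbuffers;
        if (d->pinned)
            continue;
        if (d->usage_count == 0)
            return (long) i;
        d->usage_count--;
    }
    return -1;
}
```

With 1000 descriptors of which only the first 4 are available, a sweep in this sketch visits each available buffer plus a single reserved slot per lap, so the cost of a victim search tracks the available part rather than the maximal size.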
2. There are several data structures in Postgres whose size depends on
the number of buffers.
In my patch I used the dynamic shared buffer size in some cases, but if such a
structure has to be allocated in shared memory, then the maximal size still
has to be used. We have the buffers themselves (8 kB per buffer), then
the main BufferDescriptors array (64 B), the BufferIOCVArray (16 B),
checkpoint's CkptBufferIds (20 B), and the hashmap on the buffer cache
(24 B + 8 B/entry).
128 bytes per 8 kB buffer does not seem like a large overhead (~1.5%), but it may be
quite noticeable with size differences larger than 2 orders of magnitude.
E.g. to support scaling from 0.5GB to 128GB, with 128 bytes/buffer
we'd have ~2GiB of static overhead on only 0.5GiB of actual buffers.
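The arithmetic behind that estimate can be checked with a few lines of C. This is only a sketch: the function name and the `SKETCH_BLCKSZ` constant are made up for illustration, and the 128 bytes/buffer figure is the rounded per-buffer metadata cost quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BLCKSZ 8192u     /* 8 kB per buffer, as in the message */

/* Metadata that must be sized for the MAXIMAL number of buffers,
 * regardless of how many buffers are currently in use. */
static uint64_t
sketch_static_overhead(uint64_t max_shared_buffers_bytes,
                       uint64_t meta_bytes_per_buffer)
{
    uint64_t max_nbuffers = max_shared_buffers_bytes / SKETCH_BLCKSZ;

    return max_nbuffers * meta_bytes_per_buffer;
}
```

For a 128 GiB maximum this gives 2^24 buffers times 128 bytes, i.e. 2 GiB of metadata, which is four times the 0.5 GiB of buffers actually in use at the low end of the range.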
3. `madvise` is not portable.
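For readers unfamiliar with the call, here is a minimal sketch of returning pages to the OS with `madvise`. It is an assumption-laden illustration, not the PoC code: it uses a private anonymous mapping and `MADV_DONTNEED`, whereas Postgres shared buffers live in a shared mapping, and which advice flag the PoC actually passes is not stated in this message. The helper names are hypothetical.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS and madvise() on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Map an anonymous, private, read-write region; NULL on failure. */
static void *
sketch_map_region(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Tell the kernel the pages are no longer needed. The mapping stays
 * valid and can be touched again later. Returns 0 on success, -1 on
 * error. The exact semantics of MADV_DONTNEED differ between operating
 * systems, which is part of the portability problem. */
static int
sketch_release_pages(void *addr, size_t len)
{
    return madvise(addr, len, MADV_DONTNEED);
}
```

A caller would map a region, use it, call `sketch_release_pages` on the part being ballooned out, and simply touch the memory again when the buffers are needed back.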
Certainly you have moved much further in your proposal compared with my
PoC (including huge pages support).
But it is still not quite clear to me how you are going to solve the
problem of large memory overhead in the case of a ~100x variation in
shared buffers size.
I