From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Changing shared_buffers without restart |
Date: | 2024-12-17 14:10:11 |
Message-ID: | CAExHW5uyuHc3SkQhb8P_SKjMMPwY_Jp3=NCDneUpQWQuvq_ZUA@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Dec 3, 2024 at 8:01 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Dec 2, 2024 at 2:18 PM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> > I've asked about that in linux-mm [1]. To my surprise, the
> > recommendations were to stick to creating a large mapping in advance,
> > and slice smaller mappings out of that, which could be resized later.
> > The OOM score should not be affected, and hugetlb could be avoided using
> > MAP_NORESERVE flag for the initial mapping (I've experimented with that,
> > seems to be working just fine, even if the slices are not using
> > MAP_NORESERVE).
> >
> > I guess that would mean I'll try to experiment with this approach as
> > well. But what do others think? How much research do we need to do, to gain
> > some confidence about large shared mappings and make it realistically
> > acceptable?
>
> Personally, I like this approach. It seems to me that this opens up
> the possibility of a system where the virtual addresses of data
> structures in shared memory never change, which I think will avoid an
> absolutely massive amount of implementation complexity. It's obviously
> not ideal that we have to specify in advance an upper limit on the
> potential size of shared_buffers, but we can live with it. It's better
> than what we have today; and certainly cloud providers will have no
> issue with pre-setting that to a reasonable value. I don't know if we
> can port it to other operating systems, but it seems at least possible
> that they offer similar primitives, or will in the future; if not, we
> can disable the feature on those platforms.
>
> I still think the synchronization is going to be tricky. For example
> when you go to shrink a mapping, you need to make sure that it's free
> of buffers that anyone might touch; and when you grow a mapping, you
> need to make sure that nobody tries to touch that address space before
> they grow the mapping, which goes back to my earlier point about
> someone doing a lookup into the buffer mapping table and finding a
> buffer number that is beyond the end of what they've already mapped.
> But I think it may be doable with sufficient cleverness.
>
From the discussion so far, the protocol for each shared memory slot
(or segment, as suggested by Robert) seems to be the following.
1. At the start, create a memory mapping of the maximum size (maxsize)
using mmap() with PROT_READ | PROT_WRITE and MAP_NORESERVE to
reserve address space. Assume this is created at virtual address
maddr.
2. Resize it to the required size (size) using mremap(). This region
will be used to create shared memory objects.
3. Map a segment of length maxsize - size with PROT_NONE and
MAP_NORESERVE at maddr + size. This segment keeps any other mapping
from being placed in the reserved space, and PROT_NONE protects
against unintentional reads/writes in that space.
4. When resizing the segment, remove the mapping created in step 3 and
execute steps 2 and 3 again. The synchronization mentioned by Robert
should be carried out somewhere in this step.
Note that the addresses and sizes need to be aligned as per mmap() and
mremap() requirements.
Please correct me if I am wrong.
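To make the four steps concrete, here is a minimal sketch of how they
could be coded on Linux. It is not the attached mmap_exp.c. The helper
names segment_create() and segment_resize() are hypothetical, the use of
MAP_SHARED | MAP_ANONYMOUS for the segment and MAP_FIXED for the
PROT_NONE reservation are assumptions of this sketch, and no
synchronization between processes is attempted.

```c
#define _GNU_SOURCE             /* needed for mremap() on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper covering steps 4, 2 and 3: take the segment at maddr
 * from cursize to newsize, keeping the remainder of [maddr, maddr + maxsize)
 * reserved by a PROT_NONE mapping. Returns 0 on success, -1 on failure.
 */
static int
segment_resize(char *maddr, size_t maxsize, size_t cursize, size_t newsize)
{
    size_t pagesize = (size_t) sysconf(_SC_PAGESIZE);

    /* Sizes must be page aligned (huge pages would need a larger unit). */
    assert(cursize % pagesize == 0 && newsize % pagesize == 0);
    assert(cursize > 0 && newsize > 0);
    assert(cursize <= maxsize && newsize <= maxsize);

    /* Step 4: remove the PROT_NONE reservation that follows the segment. */
    if (cursize < maxsize && munmap(maddr + cursize, maxsize - cursize) != 0)
        return -1;

    /*
     * Step 2: resize in place. flags = 0 (no MREMAP_MAYMOVE) makes mremap()
     * fail rather than relocate the mapping, so maddr stays valid.
     */
    if (newsize != cursize && mremap(maddr, cursize, newsize, 0) == MAP_FAILED)
        return -1;

    /*
     * Step 3: reserve the rest of the address range again. MAP_FIXED replaces
     * whatever is mapped there, so another thread mapping into the gap
     * between the two calls would be clobbered; this sketch ignores that.
     */
    if (newsize < maxsize &&
        mmap(maddr + newsize, maxsize - newsize, PROT_NONE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
             -1, 0) == MAP_FAILED)
        return -1;

    return 0;
}

/*
 * Hypothetical helper for step 1 followed by the first resize. Returns the
 * segment address (maddr), or NULL on failure.
 */
static char *
segment_create(size_t maxsize, size_t size)
{
    /* Step 1: reserve maxsize bytes of address space without commit charge. */
    char *maddr = mmap(NULL, maxsize, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    if (maddr == MAP_FAILED)
        return NULL;

    if (segment_resize(maddr, maxsize, maxsize, size) != 0)
    {
        munmap(maddr, maxsize);
        return NULL;
    }
    return maddr;
}
```

A caller would do something like segment_create(3MB, 1MB) once and then
segment_resize(maddr, 3MB, 1MB, 2MB) to grow. Since mremap() only changes
the address space of the calling process, every process attached to the
segment would have to run the resize itself.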
I wrote the attached simple program simulating this protocol. It seems
to work as expected. However, mmap'ing with MAP_FIXED would still be
able to dislodge the reserved memory. But that's true with any mapped
segment; not just with reserved memory.
A bit about the program: It reserves a 3MB memory segment and resizes
it to 1MB, 2MB and back to 3MB, thus exercising both shrinking and
enlarging the memory. It forks a child process after resizing the
memory segment for the first time. At every step it makes sure that the parent
and child programs can write and read at the boundaries of the resized
memory segment. The program waits for getchar() at these steps. So in
case the program seems to be stuck, try pressing Enter once or twice.
I could verify the memory mappings, their sizes etc. by looking at
/proc/PID/maps and /proc/PID/status but I did not find a way to verify
the amount of memory actually allocated and verify that it's actually
shrinking and expanding. Please let me know how to verify that.
--
Best Wishes,
Ashutosh Bapat
Attachment | Content-Type | Size |
---|---|---|
mmap_exp.c | text/x-csrc | 7.4 KB |