Re: Changing shared_buffers without restart

From: Ni Ku <jakkuniku(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Changing shared_buffers without restart
Date: 2025-03-20 08:55:47
Message-ID: CAPuPUJz4EB5NU1ah3NH9HjBaq-dCfJMAgmou7YY5sofQ-xBbQQ@mail.gmail.com
Lists: pgsql-hackers

Dmitry / Ashutosh,
Thanks for the patch set. I've been doing some testing with it, in particular
to see whether this solution would work with a hugepage-backed buffer pool.

I ran some simple tests (outside of PG) on Linux kernel v6.1, which has
this commit that added some hugepage support to mremap (
https://patchwork.kernel.org/project/linux-mm/patch/20211013195825(dot)3058275-1-almasrymina(at)google(dot)com/
).

From reading the kernel code and from testing, it seems that for a
hugepage-backed mapping mremap supports only shrinking, not growing.
Further, when shrinking, what I observed is that the hugepage memory is not
released back to the OS after mremap returns; rather, it is released when
the fd is closed (or, for a mapping created with MAP_ANONYMOUS, when the
memory is unmapped). I'm not sure whether this behavior is expected, but
being able to release memory back to the OS immediately after mremap would
be important for use cases such as supporting "serverless" PG instances in
the cloud.

I'm no expert in the Linux kernel, so I could be missing something. It
would be great if you or somebody else could comment on these observations
and on whether this mremap-based solution would work with a hugepage-backed
buffer pool.

I've also attached the test program in case someone can spot something I
did wrong.

Regards,

Jack Ng

On Tue, Mar 18, 2025 at 11:02 AM Ashutosh Bapat <
ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:

> On Tue, Feb 25, 2025 at 3:22 PM Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
> wrote:
> >
> > > On Fri, Oct 18, 2024 at 09:21:19PM GMT, Dmitry Dolgov wrote:
> > > TL;DR A PoC for changing shared_buffers without PostgreSQL restart, via
> > > changing shared memory mapping layout. Any feedback is appreciated.
> >
> > Hi,
> >
> > Here is a new version of the patch, which contains a proposal about how
> > to coordinate shared memory resizing between backends. The rest is more
> > or less the same; feedback about the coordination is appreciated. It's a
> > lot to read, but the main difference is about:
>
> Thanks Dmitry for the summary.
>
> >
> > 1. Allowing a GUC value change to be decoupled from actually applying
> > it, sort of a "pending" change. The idea is to let custom logic be
> > triggered in an assign hook, which then takes responsibility for what
> > happens later and how the change gets applied. This allows the regular
> > GUC infrastructure to be used in cases where a value change requires
> > some complicated processing. I was trying to make the change not too
> > invasive; it's still missing GUC reporting.
> >
> > 2. The shared memory resizing patch became more complicated due to the
> > coordination between backends. The current implementation was chosen
> > from a few more or less equal alternatives, which evolve along the
> > following lines:
> >
> > * There should be one "coordinator" process overseeing the change.
> > Having the postmaster fulfill this role, as in this patch, seems like a
> > natural idea, but it poses certain challenges since the postmaster
> > doesn't have locking infrastructure. Another option would be to elect a
> > single backend as coordinator, which would handle the postmaster as a
> > special case. If there is ever a "coordinator" worker in Postgres, it
> > would be useful here.
> >
> > * The coordinator uses EmitProcSignalBarrier to reach all other
> > backends and trigger the resize process. Backends join a Barrier to
> > synchronize and wait until everyone is finished.
> >
> > * There is some resizing state stored in shared memory, which is there
> > to handle backends that were late for some reason or didn't receive the
> > signal. What to store there is open for discussion.
> >
> > * Since we want to make sure all processes share the same understanding
> > of what the NBuffers value is, any failure is mostly a hard stop:
> > rolling back the change would need coordination as well, which sounds
> > too complicated for now.
> >
>
> I think we should add a way to monitor the progress of resizing; at
> least whether resizing is complete and whether the new GUC value is in
> effect.
>
> > We've tested this change manually for now, although it might be useful
> > to try out injection points. The testing strategy, which has caught
> > plenty of bugs, was simply to run a pgbench workload against a running
> > instance and change shared_buffers on the fly. Some more subtle cases
> > were verified by manually injecting delays to trigger the expected
> > scenarios.
>
> I have shared a script with my changes, but it's far from full testing.
> We will need to use injection points to test specific scenarios.
>
> --
> Best Wishes,
> Ashutosh Bapat
>

Attachment Content-Type Size
hugepage_remap.c application/octet-stream 1.1 KB
