Re: Changing shared_buffers without restart

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Changing shared_buffers without restart
Date: 2024-11-25 19:33:48
Message-ID: CA+TgmoZFfn0E+EkUAjnv_QM_00eUJPkgCJKzm3n1G4itJKMSsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 18, 2024 at 3:21 PM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> TL;DR A PoC for changing shared_buffers without PostgreSQL restart, via
> changing shared memory mapping layout. Any feedback is appreciated.

A lot of people would like to have this feature, so I hope this
proposal works out. Thanks for working on it.

I think the idea of having multiple shared memory segments is
interesting and makes sense, but I would prefer to see them called
"segments" rather than "slots" just as do we do for DSMs. The name
"slot" is somewhat overused, and invites confusion with replication
slots, inter alia. I think it's possible that having multiple fixed
shared memory segments will spell trouble on Windows, where we already
need to use a retry loop to try to get the main shared memory segment
mapped at the correct address. If there are multiple segments and we
need whatever ASLR stuff happens on Windows to not place anything else
overlapping with any of them, that means there's more chances for
stuff to fail than if we just need one address range to be free.
Granted, the individual ranges are smaller, so maybe it's fine? But I
don't know.

The big thing that worries me is synchronization, and while I've only
looked at the patch set briefly, it doesn't look to me as though
there's enough machinery here to make that work correctly. Suppose
that shared_buffers=8GB (a million buffers) and I change it to
shared_buffers=16GB (2 million buffers). As soon as any one backend
has seen that changed and expanded shared_buffers, there's a
possibility that some other backend which has not yet seen the change
might see a buffer number greater than a million. If it tries to use
that buffer number before it absorbs the change, something bad will
happen. The most obvious way for it to see such a buffer number - and
possibly the only one - is to do a lookup in the buffer mapping table
and find a buffer ID there that was inserted by some other backend
that has already seen the change.

Fixing this seems tricky. My understanding is that BufferGetBlock() is
extremely performance-critical, so having to do a bounds check there
to make sure that a given buffer number is in range would probably be
bad for performance. Also, even if the overhead weren't prohibitive, I
don't think we can safely stick code that unmaps and remaps shared
memory segments into a function that currently just does math, because
we've probably got places where we assume this operation can't fail --
as well as places where we assume that if we call BufferGetBlock(i)
and then BufferGetBlock(j), the second call won't change the answer to
the first.

It seems to me that it's probably only safe to swap out a backend's
notion of where shared_buffers is located when the backend holds on
buffer pins, and maybe not even all such places, because it would be a
problem if a backend looks up the address of a buffer before actually
pinning it, on the assumption that the answer can't change. I don't
know if that ever happens, but it would be a legal coding pattern
today. Doing it between statements seems safe as long as there are no
cursors holding pins. Doing it in the middle of a statement is
probably possible if we can verify that we're at a "safe" point in the
code, but I'm not sure exactly which points are safe. If we have no
code anywhere that assumes the address of an unpinned buffer can't
change before we pin it, then I guess the check for pins is the only
thing we need, but I don't know that to be the case.

I guess I would have imagined that a change like this would have to be
done in phases. In phase 1, we'd tell all of the backends that
shared_buffers had expanded to some new, larger value; but the new
buffers wouldn't be usable for anything yet. Then, once we confirmed
that everyone had the memo, we'd tell all the backends that those
buffers are now available for use. If shared_buffers were contracted,
phase 1 would tell all of the backends that shared_buffers had
contracted to some new, smaller value. Once a particular backend
learns about that, they will refuse to put any new pages into those
high-numbered buffers, but the existing contents would still be valid.
Once everyone has been told about this, we can go through and evict
all of those buffers, and then let everyone know that's done. Then
they shrink their mappings.

It looks to me like the patch doesn't expand the buffer mapping table,
which seems essential. But maybe I missed that.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michail Nikolaev 2024-11-25 19:38:00 Strange assertion in procarray.c
Previous Message Nathan Bossart 2024-11-25 19:29:31 Re: Large expressions in indexes can't be stored (non-TOASTable)