From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, tender wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Subject: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
Date: 2023-12-18 17:04:07
Message-ID: CA+TgmobAnXFzdA7qXTNxRqQyvzGPQz6e-mvRcDtALBNj+ir93A@mail.gmail.com
Lists: pgsql-hackers
On Tue, Dec 12, 2023 at 8:29 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> The problem I see is that the group update mechanism is designed around
> contention of the global xact-SLRU control lock; it uses atomics to
> coordinate a single queue when the lock is contended. So if we split up
> the global SLRU control lock using banks, then multiple processes using
> different bank locks might not contend. OK, this is fine, but what
> happens if two separate groups of processes encounter contention on two
> different bank locks? I think they will both try to update the same
> queue, and coordinate access to that *using different bank locks*. I
> don't see how can this work correctly.
>
> I suspect the first part of that algorithm, where atomics are used to
> create the list without a lock, might work fine. But will each "leader"
> process, each of which is potentially using a different bank lock,
> coordinate correctly? Maybe this works correctly because only one
> process will find the queue head not empty? If this is what happens,
> then there needs to be comments about it. Without any explanation,
> this seems broken and potentially dangerous, as some transaction commit
> bits might become lost given high enough concurrency and bad luck.
I don't want to be dismissive of this concern, but I took a look at
this part of the patch set and I don't see a correctness problem. I
think the idea of the mechanism is that we have a single linked list
in shared memory that can accumulate those waiters. At some point a
process pops the entire list of waiters, which allows a new group of
waiters to begin accumulating. The process that pops the list must
perform the updates for every process in the just-popped list without
failing, else updates would be lost. In theory, there can be any
number of lists that some process has popped and is currently working
its way through at the same time, although in practice I bet it's
quite rare for there to be more than one. The only correctness problem
is if it's possible for a process that popped the list to error out
before it finishes doing the updates that it "promised" to do by
popping the list.
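
To make the shape of that algorithm concrete, here is a minimal,
self-contained sketch of the accumulate-then-pop-everything pattern,
written with C11 atomics rather than the actual pg_atomic_* calls, and
with invented names (Waiter, group_head, join_group, process_group);
it is only an illustration of the idea, not the patch's code.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct Waiter
{
    int            xid;            /* stand-in for the promised update */
    struct Waiter *next;
} Waiter;

static _Atomic(Waiter *) group_head = NULL;

/*
 * Push ourselves onto the group list.  Returns true if the list was
 * empty when we joined, i.e. we become the "leader" who must later pop
 * the whole list and perform everyone's updates.
 */
static bool
join_group(Waiter *me)
{
    Waiter *old = atomic_load(&group_head);

    do
    {
        me->next = old;
    } while (!atomic_compare_exchange_weak(&group_head, &old, me));

    return old == NULL;
}

/*
 * Leader: detach the entire list with one atomic exchange, so a new
 * group can begin accumulating immediately, then perform the update
 * promised to every member of the just-popped list.
 */
static void
process_group(void)
{
    Waiter *w = atomic_exchange(&group_head, (Waiter *) NULL);

    while (w != NULL)
    {
        Waiter *next = w->next;     /* read before handing w back */

        printf("updating status of xid %d\n", w->xid);
        w = next;
    }
}

int
main(void)
{
    Waiter a = {.xid = 100};
    Waiter b = {.xid = 101};

    join_group(&a);                 /* list was empty: a is the leader */
    join_group(&b);                 /* b just queues itself and waits */
    process_group();                /* leader pops and updates both */
    return 0;
}

The important property is the single atomic exchange in
process_group(): once the leader has taken the list, it owns every
entry on it and must complete all of those updates, which is exactly
the "promise" described above.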
Having individual bank locks doesn't really change anything here.
That's just a matter of what lock has to be held in order to perform
the update that was promised, and the algorithm described in the
previous paragraph doesn't really care about that. Nor is there a
deadlock hazard here as long as processes only take one lock at a
time, which I believe is the case. The only potential issue that I see
here is with performance. I've heard some questions about whether this
machinery performs well even as it stands. Certainly, dividing the lock
into a bunch of bankwise locks should tend to make a mechanism like
this less valuable, because both mechanisms are trying to alleviate
contention, and so in a certain sense they are competing for the same
job. However, they do
aim to alleviate different TYPES of contention: the group XID update
stuff should be most valuable when lots of processes are trying to
update the same page, and the banks should be most valuable when there
is simultaneous access to a bunch of different pages. So I'm not
convinced that this patch is a reason to remove the group XID update
mechanism, but someone might argue otherwise.
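
To illustrate why the two mechanisms target different kinds of
contention, here is a trivial sketch of what bankwise lock selection
might look like; SLRU_NUM_BANKS and slru_bank_for_page are hypothetical
names for this example, not taken from the patch.

#include <stdint.h>

#define SLRU_NUM_BANKS 8            /* hypothetical number of banks */

/*
 * Every page maps to exactly one bank, so all updaters of the same
 * page still serialize on the same bank lock (where the group update
 * helps), while accesses to different pages usually land on different
 * bank locks (where the partitioning helps).
 */
static inline int
slru_bank_for_page(int64_t pageno)
{
    return (int) (pageno % SLRU_NUM_BANKS);
}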
A related concern is that, if by chance we do end up with multiple
updaters from different pages in the same group, it will now be more
expensive to sort that out because we'll have to potentially keep
switching locks. So that could make the group XID update mechanism
less performant than it is currently. I think that's probably not an
issue, because it should be a rare occurrence, as the comments
say. A bit more cost in a code path that is almost never taken won't
matter. But if that path is more commonly taken than I think, then
maybe making it more expensive could hurt. It might be worth adding
some debugging to see how often we actually go down that path in a
highly stressed system.
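
If someone wants to measure that, here is a sketch of the sort of
throwaway instrumentation I have in mind; the counter and function
names are invented for illustration and are not part of the patch.

#include <stdio.h>

/* Throwaway counters: how many groups we popped, and how many of them
 * spanned more than one page and therefore required switching locks. */
static long groups_popped = 0;
static long groups_multi_page = 0;

static void
record_group(int distinct_pages_in_group)
{
    groups_popped++;
    if (distinct_pages_in_group > 1)
        groups_multi_page++;

    /* dump the ratio every so often on a heavily stressed system */
    if (groups_popped % 100000 == 0)
        fprintf(stderr, "multi-page groups: %ld of %ld\n",
                groups_multi_page, groups_popped);
}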
BTW:
+ * as well. The main reason for now allowing requesters of different pages
now -> not
--
Robert Haas
EDB: http://www.enterprisedb.com