Quick Links

Re: MAX_BACKENDS size (comment accuracy)

From:	Jacob Brazeal <jacob(dot)brazeal(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: MAX_BACKENDS size (comment accuracy)
Date:	2025-02-09 20:41:58
Message-ID:	CA+COZaCx_yG+mqw-RxzNye9cX+3Kj0-vaycmg0kPp4O_o4m_bQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

> Halfing the size of LWLock and laying
> the ground work for making the wait-list lock-free imo would be very well
> worth the reduction in an unrealistic limit...

BTW, I spent a week or two researching the lock-free queue idea,
specifically looking at what it might look like to have a Michael-Scott
type lock-free queue adapted to the waitlist. In the end, it seems like a
fairly challenging project, and probably beyond my powers, but I'm very
curious if you had some ideas around it that I'm missing. Here is a summary
of my digging:

The proclist has several advantages. We can always store a waitlist entry
for a given process in the same spot in shared memory, so there's no memory
allocation or cleanup to worry about. It's a doubly-linked list and hence
easy to remove items from the middle of the list (given a lock is acquired).

A Michael-Scott style lock-free queue would require giving up these
advantages:
* When you pop a node from a queue, it is repurposed into a temporary
"dummy" node at the head of the queue. This means that if we want to add a
process to a new queue, we can't reuse the memory from its previous queue
node, because it might be in use as a dummy node. So we can't store
waitlist entries in PGPROCs anymore; we'd probably want some kind of
freelist. This also means that we need to eventually free nodes that have
been popped, and the next time they're used, it might be by a different
process.
* The queue only supports popping from the head. So we can't directly
support the current logic, which can skip some nodes in the waitlist to
wakeup others. Two possibles workarounds I see: 1) split the waitlist into
3 different queues, one for each type (shared, exclusive, wait for val) -
or have a lock-free "waitlist for the waitlist" that accumulates new
entries and then we take out a lock when we actually want to release nodes.

Finally, we'd need to set up some mechanism for safely freeing queue nodes
when we're done with them, like hazard pointers or maybe epoch-based
reclamation, which looks like a slightly better fit to me for our case, but
both mechanisms are fairly tricky and I think we'd have to seriously
consider finding some existing implementation that's well-tested and
compatibly licensed instead of writing our own. (Or, maybe writing and
testing our own version is the actual heart of this project.)

So, as I say, in the end this feels beyond my powers, but it was very
interesting to look into and I'm curious if you had some notes lying around
on it.

Regards,
Jacob

In response to

Re: MAX_BACKENDS size (comment accuracy) at 2025-01-26 21:08:19 from Jacob Brazeal

Responses

Re: MAX_BACKENDS size (comment accuracy) at 2025-02-12 15:28:42 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Pavel Stehule	2025-02-09 20:56:54	Re: Re: proposal: schema variables
Previous Message	Jacob Brazeal	2025-02-09 20:17:27	Re: Confusing variable naming in LWLockRelease