Re: Issue with the PRNG used by Postgres

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Parag Paul <parag(dot)paul(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Issue with the PRNG used by Postgres
Date: 2024-04-10 17:03:05
Message-ID: 4084246.1712768585@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> I don't think it's correct to re-initialize the SpinDelayStatus each
> time around the outer loop. That state should persist through the
> entire acquire operation, as it does in a regular spinlock acquire.
> As this stands, it resets the delay to minimum each time around the
> outer loop, and I bet it is that behavior not the RNG that's to blame
> for what he's seeing.

After thinking about this some more, it is fairly clear that that *is*
a mistake that can cause a thundering-herd problem. Assume we have
two or more backends waiting in perform_spin_delay, and for whatever
reason the scheduler wakes them up simultaneously. They see the
LW_FLAG_LOCKED bit clear (because whoever had the lock when they went
to sleep is long gone) and iterate the outer loop. One gets the lock;
the rest go back to sleep. And they will all sleep exactly
MIN_DELAY_USEC, because they all reset their SpinDelayStatus.
Lather, rinse, repeat. If the s_lock code were being used as
intended, they would soon acquire different cur_delay settings;
but that never happens. That is not the RNG's fault.

So I think we need something like the attached.

regards, tom lane

Attachment Content-Type Size
fix-improper-spin-delay-logic.patch text/x-diff 2.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2024-04-10 17:03:43 Re: Issue with the PRNG used by Postgres
Previous Message Andres Freund 2024-04-10 16:52:36 Re: Table AM Interface Enhancements