From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Parag Paul <parag(dot)paul(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Issue with the PRNG used by Postgres |
Date: | 2024-04-12 05:05:05 |
Message-ID: | a0d9a842-b301-c200-6ebb-b058d7e84c9e@gmail.com |
Lists: | pgsql-hackers |
Hi Andres,
12.04.2024 07:41, Andres Freund wrote:
>
> FWIW, I just reproduced the scenario with signals. I added tracking of the
> total time actually slept and lost to SpinDelayStatus, and added a function to
> trigger a wait on a spinlock.
>
> To wait less, I set max_standby_streaming_delay=0.1, but that's just for
> easier testing in isolation. In reality that could have been reached before
> the spinlock is even acquired.
>
> On a standby, while a recovery conflict is happening:
> PANIC: XX000: stuck spinlock detected at crashme, path/to/file:line, after 4.38s, lost 127.96s
>
>
> So right now it's really not hard to trigger the stuck-spinlock logic
> completely spuriously. This doesn't just happen with hot standby, there are
> plenty of other sources of lots of signals being sent.
I managed to trigger that logic when trying to construct a reproducer
for bug #18426.
With the following delays added:
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1776,6 +1776,7 @@ retry:
*/
if (BUF_STATE_GET_REFCOUNT(buf_state) != 0)
{
+pg_usleep(300000L);
UnlockBufHdr(buf, buf_state);
LWLockRelease(oldPartitionLock);
/* safety check: should definitely not be our *own* pin */
@@ -5549,6 +5550,7 @@ TerminateBufferIO(BufferDesc *buf, bool clear_dirty, uint32 set_flag_bits,
Assert(buf_state & BM_IO_IN_PROGRESS);
+pg_usleep(300);
buf_state &= ~(BM_IO_IN_PROGRESS | BM_IO_ERROR);
if (clear_dirty && !(buf_state & BM_JUST_DIRTIED))
buf_state &= ~(BM_DIRTY | BM_CHECKPOINT_NEEDED);
and the following /tmp/temp.config:
bgwriter_delay = 10
TEMP_CONFIG=/tmp/temp.config make -s check -C src/test/recovery PROVE_TESTS="t/032*"
run repeatedly, this fails for me on iterations 22, 23, and 37:
2024-04-12 05:00:17.981 UTC [762336] PANIC: stuck spinlock detected at WaitBufHdrUnlocked, bufmgr.c:5726
I haven't investigated this case yet.
Best regards,
Alexander