From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Parag Paul <parag(dot)paul(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Issue with the PRNG used by Postgres |
Date: | 2024-04-10 17:12:21 |
Message-ID: | 4085126.1712769141@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Parag Paul <parag(dot)paul(at)gmail(dot)com> writes:
> Yes, the probability of this happening is astronomical, but in production
> with 128 core servers with 7000 max_connections, with petabyte scale data,
> this did repro 2 times in the last month. We had to move to a local
> approach to manage our rate-limiting counters.
> This is not reproducible very easily. I feel that we should at least shield
> ourselves with the following change, so that we at least increase the delay
> by 1000us every time. We will follow a linear back off, but better than no
> backoff.
I still say you are proposing to band-aid the wrong thing. Moreover:
* the proposed patch will cause the first few cur_delay values to grow
much faster than before, with direct performance impact to everyone,
whether they are on 128-core servers or not;
* if we are in a regime where xoroshiro repeatedly returns zero
across multiple backends, your patch doesn't improve the situation
AFAICS, because the backends will still choose the same series
of cur_delay values and thus continue to exhibit thundering-herd
behavior. Indeed, as coded I think the patch makes it *more*
likely that the same series of cur_delay values would be chosen
by multiple backends.
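To make the two objections above concrete, here is a minimal sketch rather than the actual code or the actual patch. It assumes the existing step grows cur_delay by a random fraction between 0 and 1 of itself and wraps back to a minimum once a maximum is exceeded, and it assumes the proposed change amounts to forcing each increment to be at least 1000us. The function names, the constants, and the exact form of the patched step are illustrative assumptions; the random draw r is passed in explicitly so the behavior can be checked by hand.

```c
#include <assert.h>

/* Assumed bounds for the sketch, in microseconds. */
#define MIN_DELAY_USEC 1000L
#define MAX_DELAY_USEC 1000000L

/*
 * Assumed shape of the existing step: grow cur_delay by a random
 * fraction r in [0, 1) of itself, wrapping to the minimum on overflow.
 */
static long
next_delay_current(long cur_delay, double r)
{
    assert(r >= 0.0 && r < 1.0);
    cur_delay += (long) (cur_delay * r + 0.5);
    if (cur_delay > MAX_DELAY_USEC)
        cur_delay = MIN_DELAY_USEC;
    return cur_delay;
}

/*
 * HYPOTHETICAL form of the proposed change: same step, but the
 * increment is never allowed to be smaller than MIN_DELAY_USEC.
 */
static long
next_delay_floor(long cur_delay, double r)
{
    long        inc;

    assert(r >= 0.0 && r < 1.0);
    inc = (long) (cur_delay * r + 0.5);
    if (inc < MIN_DELAY_USEC)
        inc = MIN_DELAY_USEC;
    cur_delay += inc;
    if (cur_delay > MAX_DELAY_USEC)
        cur_delay = MIN_DELAY_USEC;
    return cur_delay;
}
```

Under these assumptions, a draw of zero leaves next_delay_current(1000, 0.0) stuck at 1000, which is the reported symptom. With the floor, the first step from 1000 is 2000 for every possible r, since 1000 * r is always below the floor, and the step from 2000 is 3000 for any r below 0.5. So the early delays grow faster for everyone, and backends converge on the same series even when their draws differ.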
regards, tom lane