Re: Issue with the PRNG used by Postgres

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Parag Paul <parag(dot)paul(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Issue with the PRNG used by Postgres
Date: 2024-04-11 21:30:23
Message-ID: 20240411213023.zd2776ahyz5xzyhw@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2024-04-11 16:46:23 -0400, Robert Haas wrote:
> On Thu, Apr 11, 2024 at 3:52 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > My suspicion is that most of the false positives are caused by lots of signals
> > interrupting the pg_usleep()s. Because we measure the number of delays, not
> > the actual time since we've been waiting for the spinlock, signals
> > interrupting pg_usleep() trigger can very significantly shorten the amount of
> > time until we consider a spinlock stuck. We should fix that.
>
> I mean, go nuts. But <dons asbestos underpants, asbestos regular
> pants, 2 pair of asbestos socks, 3 asbestos shirts, 2 asbestos
> jackets, and then hides inside of a flame-proof capsule at the bottom
> of the Pacific ocean> this is just another thing like query hints,
> where everybody says "oh, the right thing to do is fix X or Y or Z and
> then you won't need it". But of course it never actually gets fixed
> well enough that people stop having problems in the real world. And
> eventually we look like a developer community that cares more about
> our own opinion about what is right than what the experience of real
> users actually is.

I don't think that's a particularly apt comparison. If you have spinlocks that
cannot be acquired within tens of seconds, you're in a really bad situation,
regardless of whether you crash-restart or not.

Whereas with hints, you might actually be operating perfectly normally when
using hints. Never using the wrong plan is also just an order of magnitude
harder and fuzzier problem than ensuring we don't wait for spinlocks for a
long time.

> In all seriousness, I'd really like to understand what experience
> you've had that makes this check seem useful. Because I think all of
> my experiences with it have been bad. If they weren't, the last good
> one was a very long time ago.

By far the most of the stuck spinlocks I've seen were due to bugs in
out-of-core extensions. Absurdly enough, the next common thing probably is due
to people using gdb to make an uninterruptible process break out of some code,
without a crash-restart, accidentally doing so while a spinlock is held.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2024-04-11 21:48:12 Re: post-freeze damage control
Previous Message Cary Huang 2024-04-11 21:24:00 Re: [Patch] add multiple client certificate selection feature