Re: the s_lock_stuck on perform_spin_delay

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <tmunro(at)postgresql(dot)org>
Subject: Re: the s_lock_stuck on perform_spin_delay
Date: 2024-01-04 10:23:48
Message-ID: CAEze2Wj6TKjmF0Qscoo3j5ANpDSKa86ELU6Bv4T_x16xKrzHxQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 4 Jan 2024 at 08:09, Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
>
> My question is if someone doesn't obey the rule by mistake (everyone
> can make mistake), shall we PANIC on a production environment? IMO I
> think it can be a WARNING on a production environment and be a stuck
> when 'ifdef USE_ASSERT_CHECKING'.
> [...]
> I think a experienced engineer like Thomas can make this mistake and the
> patch was reviewed by 3 peoples, the bug is still there. It is not easy
> to say just don't do it.
>
> the attached code show the prototype in my mind. Any feedback is welcome.

While I understand your point and could maybe agree with the change
itself (a crash isn't great), I don't think it's an appropriate fix
for the problem of holding a spinlock while waiting for a LwLock, as
spin.h specifically mentions the following (and you quoted the same):

"""
Keep in mind the coding rule that spinlocks must not be held for more
than a few instructions.
"""

I suspect that we'd be better off with stronger protections against
waiting for LwLocks while we hold any spin lock. More specifically,
I'm thinking about something like tracking how many spin locks we
hold, and Assert()-ing that we don't hold any such locks when we start
to wait for an LwLock or run CHECK_FOR_INTERRUPTS-related code (with
potential manual contextual overrides in specific areas of code where
specific care has been taken to make it safe to hold spin locks while
doing those operations - although I consider their existence unlikely
I can't rule them out as I've yet to go through all lock-touching
code). This would probably work in a similar manner as
START_CRIT_SECTION/END_CRIT_SECTION.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-01-04 11:04:13 Re: speed up a logical replica setup
Previous Message vignesh C 2024-01-04 10:09:38 Re: pg_upgrade and logical replication