From: | Andy Fan <zhihuifan1213(at)163(dot)com> |
---|---|
To: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <tmunro(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: the s_lock_stuck on perform_spin_delay |
Date: | 2024-01-04 14:24:50 |
Message-ID: | 871qaxp3ly.fsf@163.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Matthias and Robert,
Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> writes:
> On Thu, 4 Jan 2024 at 08:09, Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
>>
>> My question is if someone doesn't obey the rule by mistake (everyone
>> can make mistake), shall we PANIC on a production environment? IMO I
>> think it can be a WARNING on a production environment and be a stuck
>> when 'ifdef USE_ASSERT_CHECKING'.
>> [...]
>> I think a experienced engineer like Thomas can make this mistake and the
>> patch was reviewed by 3 peoples, the bug is still there. It is not easy
>> to say just don't do it.
>>
>> the attached code show the prototype in my mind. Any feedback is welcome.
>
> While I understand your point and could maybe agree with the change
> itself (a crash isn't great),
It's great that both of you agree that the crash is not great.
> I don't think it's an appropriate fix
> for the problem of holding a spinlock while waiting for a LwLock, as
> spin.h specifically mentions the following (and you quoted the same):
>
> """
> Keep in mind the coding rule that spinlocks must not be held for more
> than a few instructions.
> """
Yes, I agree that the known [LW]LockAcquire after holding a Spin lock
should be fixed at the first chance rather than pander to it with my
previous patch. My previous patch just take care of the *unknown*
cases (and I cced thomas in the hope that he knows the bug). I also
agree that the special case about [LW]LockAcquire should be detected
more effective as you suggested below. So v2 comes and commit 2 is for
this suggestion.
>
> I suspect that we'd be better off with stronger protections against
> waiting for LwLocks while we hold any spin lock. More specifically,
> I'm thinking about something like tracking how many spin locks we
> hold, and Assert()-ing that we don't hold any such locks when we start
> to wait for an LwLock or run CHECK_FOR_INTERRUPTS-related code (with
> potential manual contextual overrides in specific areas of code where
> specific care has been taken to make it safe to hold spin locks while
> doing those operations - although I consider their existence unlikely
> I can't rule them out as I've yet to go through all lock-touching
> code). This would probably work in a similar manner as
> START_CRIT_SECTION/END_CRIT_SECTION.
>
> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech)
--
Best Regards
Andy Fan
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Don-t-call-s_lock_stuck-in-production-environment.patch | text/x-diff | 1.7 KB |
v2-0002-Disallow-LW-LockAcquire-when-holding-a-spin-lock-.patch | text/x-diff | 3.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2024-01-04 14:55:01 | Re: index prefetching |
Previous Message | Bertrand Drouvot | 2024-01-04 13:54:44 | Re: Synchronizing slots from primary to standby |