From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
---|---|
To: | Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, ashutosh(dot)bapat(at)enterprisedb(dot)com |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] LWLock self-deadlock detection |
Date: | 2020-11-25 13:23:47 |
Message-ID: | CAExHW5uB41T3QUg7o486R5o9GqVjsUaZR58PJqsTs0NqKMwvSQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Nov 25, 2020 at 11:47 AM Craig Ringer
<craig(dot)ringer(at)enterprisedb(dot)com> wrote:
>> I am also seeing a pattern
>> Assert(!LWLockHeldByMe());
>> LWLockAcquire()
>>
>> at some places. Should we change LWLockAcquire to do
>> Assert(!LWLockHeldByMe()) always to detect such occurrences?
>
>
> I'm inclined not to, at least not without benchmarking it, because that'd do the check before we attempt the fast-path. cassert builds are still supposed to perform decently and be suitable for day to day development and I'd rather not risk a slowdown.
>
> I'd prefer to make the lock self deadlock check run for production builds, not just cassert builds. It'd print a normal LOG (with backtrace if supported) then Assert(). So on an assert build we'd get a crash and core, and on a non-assert build we'd carry on and self-deadlock anyway.
>
> That's probably the safest thing to do. We can't expect to safely ERROR out of the middle of an LWLockAcquire(), not without introducing a new and really hard to test code path that'll also be surprising to callers. We probably don't want to PANIC the whole server for non-assert builds since it might enter a panic-loop. So it's probably better to self-deadlock. We could HINT that a -m immediate shutdown will be necessary to recover though.
I agree that it will be helpful to print something in the logs
indicating the reason for this hang in case the hang happens in a
production build. In your patch you have used ereport(PANIC, ) which
may simply be replaced by an Assert() in an assert enabled build. We
already have Assert(!LWLockHeldByMe()) so that should be safe. It will
be good to have -m immediate hint in LOG message. But it might just be
better to kill -9 that process to get rid of it. That will cause the
server to restart and not just shutdown.
--
Best Wishes,
Ashutosh Bapat
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2020-11-25 13:33:32 | Re: Online checksums patch - once again |
Previous Message | Daniel Gustafsson | 2020-11-25 13:20:42 | Re: Online checksums patch - once again |