Re: [PATCH] LWLock self-deadlock detection

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] LWLock self-deadlock detection
Date: 2020-11-24 14:11:26
Message-ID: CAExHW5sMC5heAJ_r_jBgwp48tgq-MPZg0tKLgzHO3yV6gt-jeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This looks useful. LWLockCheckSelfDeadlock() could use LWLockHeldByMe
variant instead of copying that code with possibly a change in that
function to return the required information.

I am also seeing a pattern
Assert(LWLockHeldByMe*())
LWLockAcquire()

at some places. Should we change LWLockAcquire to do
Assert(LWLockHelpByMe()) always to detect such occurrences? Enhance
that pattern to print the information that your patch prints?

It looks weird that we can detect a self deadlock but not handle it.
But handling it requires remembering multiple LWLocks held and then
release them those many times. That may not leave LWLocks LW anymore.

On Thu, Nov 19, 2020 at 4:02 PM Craig Ringer
<craig(dot)ringer(at)enterprisedb(dot)com> wrote:
>
> Hi all
>
> Here's a patch I wrote a while ago to detect and report when a LWLockAcquire() results in a simple self-deadlock due to the caller already holding the LWLock.
>
> To avoid affecting hot-path performance, it only fires the check on the first iteration through the retry loops in LWLockAcquire() and LWLockWaitForVar(), and just before we sleep, once the fast-path has been missed.
>
> I wrote an earlier version of this when I was chasing down some hairy issues with background workers deadlocking on some exit paths because ereport(ERROR) or elog(ERROR) calls fired when a LWLock was held would cause a before_shmem_exit or on_shmem_exit cleanup function to deadlock when it tried to acquire the same lock.
>
> But it's an easy enough mistake to make and a seriously annoying one to track down, so I figured I'd post it for consideration. Maybe someone else will get some use out of it even if nobody likes the idea of merging it.
>
> As written the check runs only for --enable-cassert builds or when LOCK_DEBUG is defined.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2020-11-24 14:12:19 Re: Prevent printing "next step instructions" in initdb and pg_upgrade
Previous Message Bruce Momjian 2020-11-24 14:09:06 Re: Migration Oracle multitenant database to PostgreSQL ?