
Re: PANIC during exit on behalf of FATAL semop error

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Dave Vitek <dvitek(at)grammatech(dot)com>
Cc:	"pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject:	Re: PANIC during exit on behalf of FATAL semop error
Date:	2017-09-15 21:42:19
Message-ID:	20170915214219.qijlxmzjxuhdheic@alap3.anarazel.de
Lists:	pgsql-bugs

Hi,

On 2017-09-15 17:30:51 -0400, Dave Vitek wrote:
> We have an x86_64 linux machine running postgresql 9.6.2. Our application
> uses LISTEN/NOTIFY. We recently made a change so that our testing
> infrastructure would notice postgres crashes and out popped this crash:

Oh. That's curious.

> elog(PANIC, "queueing for lock while waiting on another one");
>
> in this code:
>
> /*
> * Add ourselves to the end of the queue.
> *
> * NB: Mode can be LW_WAIT_UNTIL_FREE here!
> */
> static void
> LWLockQueueSelf(LWLock *lock, LWLockMode mode)
> {
>         /*
>          * If we don't have a PGPROC structure, there's no way to wait. This
>          * should never occur, since MyProc should only be null during shared
>          * memory initialization.
>          */
>         if (MyProc == NULL)
>                 elog(PANIC, "cannot wait without a PGPROC structure");
>
>         if (MyProc->lwWaiting)
> ----->       elog(PANIC, "queueing for lock while waiting on another one"); <-------------------

> Here's a stack trace and a more verbose stack trace.
>
> #0 0x00007ff27bb50c37 in __GI_raise (sig=sig(at)entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1 0x00007ff27bb54028 in __GI_abort () at abort.c:89
> #2 0x0000000000978119 in errfinish (dummy=0) at elog.c:557
> #3 0x000000000097a671 in elog_finish (elevel=22, fmt=0xc5a000 "queueing for lock while waiting on another one") at elog.c:1378
> #4 0x000000000080eb73 in LWLockQueueSelf (lock=0x7ff272b06480, mode=LW_EXCLUSIVE) at lwlock.c:1035
> #5 0x000000000080ee30 in LWLockAcquire (lock=0x7ff272b06480, mode=LW_EXCLUSIVE) at lwlock.c:1250
> #6 0x00000000007fe416 in CleanupInvalidationState (status=1, arg=140679430974720) at sinvaladt.c:344
> #7 0x00000000007f5132 in shmem_exit (code=1) at ipc.c:261
> #8 0x00000000007f4f63 in proc_exit_prepare (code=1) at ipc.c:185
> #9 0x00000000007f4eb3 in proc_exit (code=1) at ipc.c:102
> #10 0x00000000009780e7 in errfinish (dummy=0) at elog.c:543
> #11 0x000000000097a671 in elog_finish (elevel=21, fmt=0xc45b4a "semop(id=%d) failed: %m") at elog.c:1378
> #12 0x00000000007881d9 in PGSemaphoreLock (sema=0x7ff27b7d6740) at pg_sema.c:391
> #13 0x000000000080ee74 in LWLockAcquire (lock=0x7ff272b06f00, mode=LW_SHARED) at lwlock.c:1287
> #14 0x0000000000604177 in asyncQueueReadAllNotifications () at async.c:1877
> #15 0x000000000060464c in ProcessIncomingNotify () at async.c:2058
> #16 0x0000000000603f04 in ProcessNotifyInterrupt () at async.c:1732
> #17 0x000000000081da4d in ProcessClientReadInterrupt (blocked=1 '\001') at postgres.c:537
> #18 0x00000000006d7a8d in secure_read (port=0x2bb90b0, ptr=0xfe8da0 <PqRecvBuffer>, len=8192) at be-secure.c:177
> #19 0x00000000006e3f82 in pq_recvbuf () at pqcomm.c:921
> #20 0x00000000006e4022 in pq_getbyte () at pqcomm.c:964
> #21 0x000000000081d495 in SocketBackend (inBuf=0x7ffe148694a0) at postgres.c:334
> #22 0x000000000081d9db in ReadCommand (inBuf=0x7ffe148694a0) at postgres.c:507
> #23 0x00000000008228ff in PostgresMain (argc=1, argv=0x2bbb308, dbname=0x2bbb2f0 "cshub", username=0x2bbb2d0 "cshubuser") at postgres.c:4021
> #24 0x000000000079f903 in BackendRun (port=0x2bb90b0) at postmaster.c:4272
> #25 0x000000000079ef9d in BackendStartup (port=0x2bb90b0) at postmaster.c:3946
> #26 0x000000000079b645 in ServerLoop () at postmaster.c:1701
> #27 0x000000000079ab97 in PostmasterMain (argc=3, argv=0x2b918b0) at postmaster.c:1309
> #28 0x00000000006e8f8c in main (argc=3, argv=0x2b918b0) at main.c:228
>
>
> So the PANIC is really happening because of the first problem:
>
> /*
> * PGSemaphoreLock
> *
> * Lock a semaphore (decrement count), blocking if count would be < 0
> */
> void
> PGSemaphoreLock(PGSemaphore sema)
> {

>
> if (errStatus < 0)
> ------> elog(FATAL, "semop(id=%d) failed: %m", sema->semId); <--------------
> }
>
> Should this one use PANIC instead of FATAL given that the FATAL exit path
> causes a PANIC in some cases? Is there an opportunity to repair the state
> of things enough that a FATAL exit is possible here?

I'm right now more curious to discover how this happened. Are you by any
chance running this with systemd/logind in the mix? Its RemoveIPC=
setting can cause such things...

Greetings,

Andres Freund

In response to

PANIC during exit on behalf of FATAL semop error at 2017-09-15 21:30:51 from Dave Vitek

Responses

Re: PANIC during exit on behalf of FATAL semop error at 2017-09-15 22:57:30 from Dave Vitek
