Re: backends stuck in "startup"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: backends stuck in "startup"
Date: 2017-11-23 00:43:50
Message-ID: 14525.1511397830@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> For starters, I found that PID 27427 has:

> (gdb) p proc->lwWaiting
> $1 = 0 '\000'
> (gdb) p proc->lwWaitMode
> $2 = 1 '\001'

To confirm, this is LWLockAcquire's "proc", equal to MyProc?
If so, and if LWLockAcquire is blocked at PGSemaphoreLock,
that sure seems like a smoking gun.

> Note: I've compiled locally PG 10.1 with PREFERRED_SEMAPHORES=SYSV to keep the
> service up (and to the degree that serves to verify that avoids the issue,
> great).

Good idea, I was going to suggest that. It will be very interesting
to see if that makes the problem go away.

> Would you suggest how I can maximize the likelyhood/speed of triggering that ?
> Five years ago, with a report of similar symptoms, you said "You need to hack
> pgbench to suppress the single initialization connection it normally likes to
> make, else the test degenerates to the one-incoming-connection case"
> https://www.postgresql.org/message-id/8896.1337998337%40sss.pgh.pa.us

I don't think that case was related at all.

My theory suggests that any contended use of an LWLock is at risk,
in which case just running pgbench with about as many sessions as
you have in the live server ought to be able to trigger it. However,
that doesn't really account for your having observed the problem
only during session startup, so there may be some other factor
involved. I wonder if it only happens during the first wait for
an LWLock ... and if so, how could that be?

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2017-11-23 00:57:36 Re: query causes connection termination
Previous Message Tomas Vondra 2017-11-23 00:32:20 Re: query causes connection termination