From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: backends stuck in "startup" |
Date: | 2017-11-22 18:27:12 |
Message-ID: | 27867.1511375232@sss.pgh.pa.us |
Lists: | pgsql-general |

Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
>> Could you try stracing next time?
> I straced all the "startup" PIDs, which were all in futex, without exception:

If you've got debug symbols installed, could you investigate the states
of the LWLocks the processes are stuck on?

My hypothesis about a missed memory barrier would imply that there's (at
least) one process that's waiting but is not in the lock's wait queue and
has MyProc->lwWaiting == false, while the rest are in the wait queue and
have MyProc->lwWaiting == true. Actually chasing through the list
pointers would be slightly tedious, but checking MyProc->lwWaiting,
and maybe MyProc->lwWaitMode, in each process shouldn't be too hard.
Also verify that they're all waiting for the same LWLock (by address).
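
[ Editorial sketch, not part of the original message. ] One way to gather these values from every stuck backend at once might look like the following. It assumes gdb and PostgreSQL debug symbols are installed, that the caller is allowed to attach to the backends (same OS user or root), and that the backtrace shows a frame of the form `LWLockAcquire (lock=0x..., ...)`; that frame format and the helper names `gdb_command`, `parse_gdb_output`, and `summarize` are illustrative assumptions. Only `MyProc->lwWaiting` and `MyProc->lwWaitMode` come from the message itself.

```python
import re
import subprocess
import sys

# PGPROC fields named in the message; both are printed as plain integers.
FIELDS = ["lwWaiting", "lwWaitMode"]


def gdb_command(pid):
    """Build a batch-mode gdb call that attaches, prints the fields, and detaches."""
    cmd = ["gdb", "-p", str(pid), "-batch"]
    for field in FIELDS:
        # gdb's own printf command gives labelled, easily parsed output.
        cmd += ["-ex", f'printf "{field}=%d\\n", (int) MyProc->{field}']
    cmd += ["-ex", "bt"]  # the backtrace is used to find the lock address
    return cmd


def parse_gdb_output(text):
    """Pull the field values and the LWLock address out of gdb's output."""
    result = {}
    for field in FIELDS:
        m = re.search(rf"^{field}=(-?\d+)$", text, flags=re.MULTILINE)
        result[field] = int(m.group(1)) if m else None
    # Assumed frame format; without debug symbols the argument is not shown.
    m = re.search(r"LWLockAcquire \(lock=(0x[0-9a-fA-F]+)", text)
    result["lock"] = m.group(1) if m else None
    return result


def summarize(observations):
    """Group PIDs by (lock address, lwWaiting) so an odd one out stands out."""
    groups = {}
    for pid, obs in observations.items():
        groups.setdefault((obs["lock"], obs["lwWaiting"]), []).append(pid)
    return {key: sorted(pids) for key, pids in groups.items()}


def inspect(pid):
    """Attach to one backend and return its parsed state."""
    proc = subprocess.run(gdb_command(pid), capture_output=True, text=True)
    return parse_gdb_output(proc.stdout)


if __name__ == "__main__":
    # Usage: python lwlock_check.py PID [PID ...]
    observations = {int(arg): inspect(int(arg)) for arg in sys.argv[1:]}
    for (lock, waiting), pids in sorted(summarize(observations).items(), key=str):
        print(f"lock={lock} lwWaiting={waiting} pids={pids}")
```

If the hypothesis holds, the output would be expected to show a single lock address with most PIDs under `lwWaiting=1` and at least one under `lwWaiting=0`. More than one lock address, or no `lwWaiting=0` group, would argue against it.
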
I recognize Andres' point that on x86 lock-prefixed instructions should
be full memory barriers, and at least on my Linux machines, there do
seem to be lock-prefixed instructions in the fast paths through sem_wait
and sem_post. But the theory fits the reported evidence awfully well,
and we have no other theory that fits at all.

[ in an earlier post: ]
> BTW this is a VM run on a hypervisor managed by our customer:
> DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 06/22/2012

Hmm. Can't avoid the suspicion that that's relevant somehow.

regards, tom lane