Re: Lots of stuck queries after upgrade to 9.4

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Spiros Ioannou <sivann(at)inaccess(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Lots of stuck queries after upgrade to 9.4
Date: 2015-07-20 12:21:18
Message-ID: 55ACE7BE.7010804@iki.fi
Lists: pgsql-general

On 07/20/2015 03:01 PM, Andres Freund wrote:
> Heikki,
>
> On 2015-07-20 13:27:12 +0200, Andres Freund wrote:
>> On 2015-07-20 13:22:42 +0200, Andres Freund wrote:
>>> Hm. The problem seems to be the WaitXLogInsertionsToFinish() call in
>>> XLogFlush().
>>
>> These are the relevant stack traces:
>> db9lock/debuglog-commit.txt
>> #2 0x00007f7405bd44f4 in LWLockWaitForVar (l=0x7f70f2ab6680, valptr=0x7f70f2ab66a0, oldval=<optimized out>, newval=0xffffffffffffffff) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:1011
>> #3 0x00007f7405a0d3e6 in WaitXLogInsertionsToFinish (upto=121713318915952) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1755
>> #4 0x00007f7405a0e1d3 in XLogFlush (record=121713318911056) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2849
>>
>> db9lock/debuglog-insert-8276.txt
>> #1 0x00007f7405b77d91 in PGSemaphoreLock (sema=0x7f73ff6531d0, interruptOK=0 '\000') at pg_sema.c:421
>> #2 0x00007f7405bd4849 in LWLockAcquireCommon (val=<optimized out>, valptr=<optimized out>, mode=<optimized out>, l=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:626
>> #3 LWLockAcquire (l=0x7f70ecaaa1a0, mode=LW_EXCLUSIVE) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:467
>> #4 0x00007f7405a0dcca in AdvanceXLInsertBuffer (upto=<optimized out>, opportunistic=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2161
>> #5 0x00007f7405a0e301 in GetXLogBuffer (ptr=121713318928384) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1848
>> #6 0x00007f7405a0e9c9 in CopyXLogRecordToWAL (EndPos=<optimized out>, StartPos=<optimized out>, rdata=0x7ffff1c21b90, isLogSwitch=<optimized out>, write_len=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1494
>> #7 XLogInsert (rmid=<optimized out>, info=<optimized out>, rdata=<optimized out>) at /tmp/buildd/postgre
>
>
> XLogFlush() has the following comment:
> /*
> * Re-check how far we can now flush the WAL. It's generally not
> * safe to call WaitXLogInsertionsToFinish while holding
> * WALWriteLock, because an in-progress insertion might need to
> * also grab WALWriteLock to make progress. But we know that all
> * the insertions up to insertpos have already finished, because
> * that's what the earlier WaitXLogInsertionsToFinish() returned.
> * We're only calling it again to allow insertpos to be moved
> * further forward, not to actually wait for anyone.
> */
> insertpos = WaitXLogInsertionsToFinish(insertpos);
>
> but I don't think that's valid reasoning. WaitXLogInsertionsToFinish()
> calls LWLockWaitForVar(oldval = InvalidXLogRecPtr), which will block if
> there's an exclusive locker and some backend hasn't yet set
> initializedUpto. Which seems like a possible state?

A backend always updates its insert position before sleeping or acquiring
another lock, by calling WALInsertLockUpdateInsertingAt. So even though
another backend might indeed be in the
initializedUpto == InvalidXLogRecPtr state, it will get out of that state,
either by releasing the lock or by updating initializedUpto, before it
does anything that might deadlock.

Clearly there's *something* wrong here, though, given the bug report...

- Heikki
