Re: Lots of stuck queries after upgrade to 9.4

From: Spiros Ioannou <sivann(at)inaccess(dot)com>
To: hlinnaka(at)iki(dot)fi
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Lots of stuck queries after upgrade to 9.4
Date: 2015-07-20 12:58:33
Message-ID: CACKh8C8zdYoGSv_qougL7Xng7TS=a-2ombNm-wGkoEJ2ow-hfA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Happened again, another backtrace from a COMMIT process. I changed the
commit_delay to 0 (it was 4000 to help with our storage) and will report
back.

Already logging to debuglog.txt.
#0 0x00007f47b0789ec7 in semop () from /lib/libc.so.6
#1 0x00007f47b2513d91 in PGSemaphoreLock (sema=0x7f47abfe7ed0,
interruptOK=0 '\000') at pg_sema.c:421
#2 0x00007f47b25704f4 in LWLockWaitForVar (l=0x7f449f452680,
valptr=0x7f449f4526a0, oldval=<optimized out>, newval=0xffffffffffffffff)
at
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:1011
#3 0x00007f47b23a93e6 in WaitXLogInsertionsToFinish (upto=121780827906336)
at
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1755
#4 0x00007f47b23aa1d3 in XLogFlush (record=121780827898328) at
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2849
#5 0x00007f47b2398f17 in RecordTransactionCommit () at
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xact.c:1196
#6 0x00007f47b239921b in CommitTransaction () at
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xact.c:1953
#7 0x00007f47b239a8e5 in CommitTransactionCommand () at
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xact.c:2732
#8 0x00007f47b257c105 in finish_xact_command () at
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/tcop/postgres.c:2437
#9 0x00007f47b2580a10 in exec_execute_message (max_rows=<optimized out>,
portal_name=<optimized out>) at
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/tcop/postgres.c:1974
#10 PostgresMain (argc=<optimized out>, argv=<optimized out>,
dbname=0x7f47b49b5730 "ifms_db", username=<optimized out>) at
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/tcop/postgres.c:4137
#11 0x00007f47b25280f5 in BackendRun (port=<optimized out>) at
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/postmaster/postmaster.c:4164
#12 BackendStartup (port=<optimized out>) at
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/postmaster/postmaster.c:3829
#13 ServerLoop () at
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/postmaster/postmaster.c:1597
#14 PostmasterMain (argc=<optimized out>, argv=<optimized out>) at
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/postmaster/postmaster.c:1244
#15 0x00007f47b24ba474 in main (argc=5, argv=0x7f47b49b4920) at
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/main/main.c:228
Detaching from program: /usr/lib/postgresql/9.4/bin/postgres, process 15601

*Spiros Ioannou IT Manager, inAccesswww.inaccess.com
<http://www.inaccess.com>M: +30 6973-903808T: +30 210-6802-358*

On 20 July 2015 at 15:21, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 07/20/2015 03:01 PM, Andres Freund wrote:
>
>> Heikki,
>>
>> On 2015-07-20 13:27:12 +0200, Andres Freund wrote:
>>
>>> On 2015-07-20 13:22:42 +0200, Andres Freund wrote:
>>>
>>>> Hm. The problem seems to be the WaitXLogInsertionsToFinish() call in
>>>> XLogFlush().
>>>>
>>>
>>> These are the relevant stack traces:
>>> db9lock/debuglog-commit.txt
>>> #2 0x00007f7405bd44f4 in LWLockWaitForVar (l=0x7f70f2ab6680,
>>> valptr=0x7f70f2ab66a0, oldval=<optimized out>, newval=0xffffffffffffffff)
>>> at
>>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:1011
>>> #3 0x00007f7405a0d3e6 in WaitXLogInsertionsToFinish
>>> (upto=121713318915952) at
>>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1755
>>> #4 0x00007f7405a0e1d3 in XLogFlush (record=121713318911056) at
>>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2849
>>>
>>> db9lock/debuglog-insert-8276.txt
>>> #1 0x00007f7405b77d91 in PGSemaphoreLock (sema=0x7f73ff6531d0,
>>> interruptOK=0 '\000') at pg_sema.c:421
>>> #2 0x00007f7405bd4849 in LWLockAcquireCommon (val=<optimized out>,
>>> valptr=<optimized out>, mode=<optimized out>, l=<optimized out>) at
>>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:626
>>> #3 LWLockAcquire (l=0x7f70ecaaa1a0, mode=LW_EXCLUSIVE) at
>>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:467
>>> #4 0x00007f7405a0dcca in AdvanceXLInsertBuffer (upto=<optimized out>,
>>> opportunistic=<optimized out>) at
>>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2161
>>> #5 0x00007f7405a0e301 in GetXLogBuffer (ptr=121713318928384) at
>>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1848
>>> #6 0x00007f7405a0e9c9 in CopyXLogRecordToWAL (EndPos=<optimized out>,
>>> StartPos=<optimized out>, rdata=0x7ffff1c21b90, isLogSwitch=<optimized
>>> out>, write_len=<optimized out>) at
>>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1494
>>> #7 XLogInsert (rmid=<optimized out>, info=<optimized out>,
>>> rdata=<optimized out>) at /tmp/buildd/postgre
>>>
>>
>>
>> XLogFlush() has the following comment:
>> /*
>> * Re-check how far we can now flush the WAL.
>> It's generally not
>> * safe to call WaitXLogInsertionsToFinish while
>> holding
>> * WALWriteLock, because an in-progress insertion
>> might need to
>> * also grab WALWriteLock to make progress. But
>> we know that all
>> * the insertions up to insertpos have already
>> finished, because
>> * that's what the earlier
>> WaitXLogInsertionsToFinish() returned.
>> * We're only calling it again to allow insertpos
>> to be moved
>> * further forward, not to actually wait for
>> anyone.
>> */
>> insertpos = WaitXLogInsertionsToFinish(insertpos);
>>
>> but I don't think that's valid reasoning. WaitXLogInsertionsToFinish()
>> calls LWLockWaitForVar(oldval = InvalidXLogRecPtr), which will block if
>> there's a exlusive locker and some backend doesn't yet have set
>> initializedUpto. Which seems like a ossible state?
>>
>
> A backend always updates its insert position before sleeping/acquiring
> another lock, by calling WALInsertLockUpdateInsertingAt. So even though
> another backend might indeed be in the initializedUpto==InvalidXlogRecPtr
> state, it will get out of that state before either by releasing the lock or
> updating initializedUpto, before it will in turn do anything that might
> deadlock.
>
> Clearly there's *something* wrong here, though, given the bug report...
>
> - Heikki
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andres Freund 2015-07-20 13:04:13 Re: Lots of stuck queries after upgrade to 9.4
Previous Message Michael Paquier 2015-07-20 12:53:26 Re: Postgres Recovery