Re: issue with gininsert under very high load

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: issue with gininsert under very high load
Date: 2014-02-12 20:39:17
Message-ID: 52FBDBF5.2060800@dunslane.net
Lists: pgsql-hackers


On 02/12/2014 02:50 PM, Andres Freund wrote:
> On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:
>> On investigation I found that a number of processes were locked waiting for
>> one wedged process to end its transaction, which never happened (this
>> transaction should normally take milliseconds). oprofile revealed that
>> postgres was spending 87% of its time in s_lock(), and strace on the wedged
>> process revealed that it was in a tight loop constantly calling select(). It
>> did not respond to a SIGTERM. I attached a debugger to the wedged process
>> and got this backtrace:
>>
>> #0 0x000000386a2eaf37 in semop () from /lib64/libc.so.6
>> #1 0x00000000006021a7 in PGSemaphoreLock ()
>> #2 0x00000000006463a1 in LWLockAcquire ()
>> #3 0x0000000000631997 in ReadBuffer_common ()
>> #4 0x00000000006322ee in ReadBufferExtended ()
>> #5 0x0000000000467475 in ginPrepareFindLeafPage ()
>> #6 0x000000000046663f in ginInsertItemPointers ()
>> #7 0x0000000000462adc in ginEntryInsert ()
>> #8 0x000000000046cad0 in ginInsertCleanup ()
>> #9 0x000000000046d7c6 in ginHeapTupleFastInsert ()
>> #10 0x0000000000462cfa in gininsert ()
>> #11 0x0000000000722b33 in FunctionCall6Coll ()
>> #12 0x000000000048bdff in index_insert ()
>> #13 0x0000000000587595 in ExecInsertIndexTuples ()
>> #14 0x00000000005948e1 in ExecModifyTable ()
>> #15 0x000000000057e718 in ExecProcNode ()
>> #16 0x000000000057d512 in standard_ExecutorRun ()
>> [...]
>>
>>
>>
>> Armed with this, the client identified a single GIN index (on an array of
>> text) that could have been involved. On removing the index the problem
>> simply went away, and they have now been live on 9.3 for 36 hours without
>> mishap.
>>
>> The client is attempting to create a self-contained reproducible test, but
>> is understandably not going to re-add the index to their production system.
>>
>> I'm not terribly familiar with the gin code, so I'm reporting this in the
>> hope that other people who are more familiar with it than I am might know
>> where to look for a potential race condition or other bug that might have
>> caused this.
> That's a deficiency of the gin fastupdate cache: a) it bases its size
> on work_mem, which usually makes it *far* too big; b) it doesn't perform the
> cleanup in one go even if it can get a suitable lock, but does independent
> locking for each entry. That usually leads to absolutely horrific
> performance under concurrency.
>
> The fix is usually to just turn FASTUPDATE off for all indexes. If it's
> done after the initial creation, the table should be vacuumed afterwards
> to flush it.
>
> There's previously been talk about changing the limits to something more
> reasonable but it got stalled in bikeshedding IIRC.
>

So this doesn't work just when it might be most useful?
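
For anyone finding this thread later, the workaround Andres describes would presumably look something like this (the index and table names here are invented for illustration):

```sql
-- Disable the fastupdate pending list on a hypothetical GIN index
ALTER INDEX my_gin_idx SET (fastupdate = off);

-- If the index already existed with fastupdate on, vacuum the table
-- afterwards to flush any entries still sitting in the pending list
VACUUM my_table;
```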

cheers

andrew
