Re: BUG #13657: Some kind of undetected deadlock between query and "startup process" on replica.

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13657: Some kind of undetected deadlock between query and "startup process" on replica.
Date: 2015-10-02 08:52:23
Message-ID: CAK-MWwR9q1EKh5=R7oSPqHgqu-uOcVRfjpxm4eFM_Jao7N9s4Q@mail.gmail.com
Lists: pgsql-bugs

On Fri, Oct 2, 2015 at 4:58 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

>
>
> On Fri, Oct 2, 2015 at 2:14 PM, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> wrote:
>
>> This backtrace is not indicating that this process is waiting on a
>> relation lock, it is resolving a recovery conflict while removing tuples,
>> killing the virtual transaction depending on if max_standby_streaming_delay
>> or max_standby_archive_delay are set if the conflict gets longer. Did you
>> change the default of those parameters, which is 30s, to -1? This would
>> mean that the standby waits indefinitely.
>>
>>
>> The problem is that the startup process has a conflict with a query that
>> is itself blocked on the startup process (the query cannot proceed
>> because it is waiting for a lock held by the startup process, and the
>> startup process is waiting for this query to finish). So it's an undetected
>> deadlock condition here (as I understand the situation).
>>
>> PS: there was no other activity on the database during the problem
>> except the blocked query.
>>
>
> Don't you have other queries running in parallel of the one you are
> defining as "stuck" on the standby that prevent replay to move on? Like a
> long-running transaction working on the relation involved? Are you sure
> that you did not set up
> max_standby_streaming_delay to -1?
> --
> Michael
>

During the problem period, only one query was running on the database (the
one listed in the initial report) and nothing else, and that query was in a
waiting state according to pg_stat_activity.
pg_locks shows that the query is waiting for an AccessShareLock on relation
17987, while at the same time the startup process holds an
AccessExclusiveLock on the same relation and is waiting for something. There
is no other activity on the replica.
And yes, max_standby_streaming_delay is set to -1; as a result, the
replication process was stuck on a query from an external monitoring tool
forever until I killed that query, but the situation repeated a few hours
later.
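
To make the lock picture above concrete, here is a minimal Python sketch of
how the waiter/holder pairing could be read out of pg_locks. The query text
uses real pg_locks columns (pid, relation, mode, granted), but the helper
name find_blocked_pairs, the sample rows, and the PIDs 1001 and 2002 are
hypothetical and made up for illustration; only relation 17987 and the two
lock modes come from the report. As a simplification, it treats only
AccessExclusiveLock holders as blockers rather than applying the full lock
conflict table.

```python
from collections import defaultdict

# Query one could run on the replica to collect relation-level locks.
# It is shown for context only; the helper below works on plain dicts.
PG_LOCKS_QUERY = """
SELECT pid, relation, mode, granted
FROM pg_locks
WHERE locktype = 'relation'
"""


def find_blocked_pairs(rows):
    """Return (waiting_pid, holding_pid, relation) tuples.

    Simplification (an assumption of this sketch): only granted
    AccessExclusiveLock entries count as blockers, because that mode
    conflicts with every other lock mode.
    """
    holders = defaultdict(list)
    for row in rows:
        if row["granted"] and row["mode"] == "AccessExclusiveLock":
            holders[row["relation"]].append(row["pid"])

    pairs = []
    for row in rows:
        if row["granted"]:
            continue  # only ungranted requests are waiting
        for holder_pid in holders.get(row["relation"], []):
            if holder_pid != row["pid"]:
                pairs.append((row["pid"], holder_pid, row["relation"]))
    return pairs


# Hypothetical rows mirroring the reported state: PID 1001 stands in for
# the startup process, PID 2002 for the monitoring query.
SAMPLE_ROWS = [
    {"pid": 1001, "relation": 17987, "mode": "AccessExclusiveLock", "granted": True},
    {"pid": 2002, "relation": 17987, "mode": "AccessShareLock", "granted": False},
]

if __name__ == "__main__":
    print(find_blocked_pairs(SAMPLE_ROWS))  # [(2002, 1001, 17987)]
```

This only shows the query-waits-on-startup half of the cycle. The other half
(the startup process waiting on the query) does not appear as an ungranted
row in pg_locks, which fits the description above of the startup process
"waiting for something".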

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ <http://www.postgresql-consulting.com/>

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim(dot)boguk(at)gmail(dot)com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
