From: | Bruno Richard <bruno(dot)richard(at)broadsign(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Pg Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: PostgreSQL hot standby Hangs due to AccessExclusiveLock on pg_attribute or pg_type tables |
Date: | 2017-07-11 20:48:11 |
Message-ID: | CAB-EU3St=O+FoXt-i8OkCp3A36Xft6TTOX6DO1okp=0Ub47n5w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Thanks for the replies.
Jeff, there were no long-lived transactions in pg_stat_activity while
pg_attribute was locked. We were able to query it from another database on
the same instance.
I can't say if it would correct after 20 minutes since it causes us major
issues in our application.
Five days ago, we tried the hot_standy_feedback=on setting on one of our
hot standbys and we don't get locks anymore. We even don't get locks from
hot standbys not having the setting enabled. We will enable it on those
later this week.
Thanks!
On Tue, Jul 11, 2017 at 12:39 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Tue, Jul 11, 2017 at 3:01 AM, Kyotaro HORIGUCHI <
> horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
>> Hello.
>>
>> In conclusion, this doesn't seem to be a bug but just a standby
>> conflict.
>>
>> At Mon, 19 Jun 2017 13:29:34 -0400, Bruno Richard <
>> bruno(dot)richard(at)broadsign(dot)com> wrote in <CAB-EU3RawZx8-OzMfvswFf6z+Y7G
>> OZf03TZ=bez+PbQX+A4M7Q(at)mail(dot)gmail(dot)com>
>> > *Summary:*
>> > PostgreSQL hot standby Hangs due to AccessExclusiveLock on pg_attribute
>> or
>> > pg_type tables
>> >
>> > *Platform information (all instances)*
>> > PostgreSQL 9.6.2, ltree extension enabled.
>> > Ubuntu 12.04 64bits
>> >
>> > *Description:*
>> > We have an issue on our Database system since we migrated from
>> PostgreSQL
>> > 9.3.2 to 9.6.2.
>> > The issue is that one (or many) of our hot standby instances does an
>> > AccessExclusiveLock on the pg_attribute or pg_type system tables that
>> never
>> > gets unlocked, causing hangs on our systems. We have to restart the
>> > instance to get out of the lock. It happened ~30 times since end of
>> April.
>> >
>> > *Steps to reproduce:*
>> > Looks like a race condition, happens randomly in production only. I am
>> > trying to replicate in house.
>> >
>> > *Description:*
>> > This is the output from the pg_locks table when the issue occurs:
>> >
>> > -[ RECORD 129 ]----+--------------------
>> > locktype | relation
>> > database | 16385
>> > relation | 1249
>> > page |
>> > tuple |
>> > virtualxid |
>> > transactionid |
>> > classid |
>> > objid |
>> > objsubid |
>> > virtualtransaction | 1/0
>> > pid | 19018
>> > mode | AccessExclusiveLock
>> > granted | t
>> > fastpath | f
>>
>> I'm not sure why it didn't happen on 9.3,
>
>
> In 9.3, the AccessExclusiveLock is obtained by polling. If it can never
> get the lock, it won't stop other processes from getting an
> AccessShareLock, so it won't block their ability to log on. (Once
> max_standby_streaming_delay expires, it will then slaughter everything
> and take its lock.) In 9.6, it is obtained in the more conventional way,
> by waiting while blocking newcomers who want a conflicting version. I
> don't understand exactly why this change leads to the reported behavior,
> but it is probably related.
>
>
> but it is very likely
>> to be the consequence of a standby conflict. You are setting
>> max_standby_streaming_delay to 1200 seconds so the lock will be
>> released after the same duration at maximum.
>>
>> The lock shown above almost certainly is a vacuum-truncation
>> lock. Usually the lock is released soon by a following
>> commit. However, if some other conflicting xlogs (for example
>> vaccum cleanup records from a concurrent vacuum) comes before the
>> commit the truncation lock won't be released until the conflict
>> is resolved. This would be what you are looking.
>>
>
> Since no new sessions can connect once the pg_attributes table is locked,
> this would either be a self-deadlock (which I think is a bug) or it would
> require there to be long-lived connections to the standy-by which are
> holding the locks which are blocking the vacuum cleanup .
>
> Bruno, is this correct that you have long-lived transactions to the
> standby while this is occurring with pg_attribute? Does it correct after
> 20 minutes, if you are willing to wait that long?
>
> Cheers,
>
> Jeff
>
--
Bruno Richard | Software Architect
BroadSign International, LLC. | http://broadsign.com/
T: +1.514.399.1184 | F: +1.514.399.1187
From | Date | Subject | |
---|---|---|---|
Next Message | David G. Johnston | 2017-07-11 22:02:59 | Re: Deferrable constraint execution not respecting "initially immediate"? |
Previous Message | Tom Lane | 2017-07-11 19:39:14 | Re: BUG #14738: ALTER SERVER for foregin servers not working |