Re: Reindex "locked" standby database

From: Martín Fernández <fmartin91(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Reindex "locked" standby database
Date: 2021-12-15 04:10:45
Message-ID: 8A21C35C-EE30-445A-933A-354603824D54@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Micheal,

Thanks for much for the quick response.

> On 15 Dec 2021, at 00:37, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Wed, Dec 15, 2021 at 12:15:27AM -0300, Martín Fernández wrote:
>> The reindex went fine in the primary database and in one of our
>> standby. The other standby that we also operate for some reason
>> ended up in a state where all transactions were locked by the WAL
>> process and the WAL process was not able to make any progress. In
>> order to solve this issue we had to move traffic from the “bad”
>> standby to the healthy one and then kill all transactions that were
>> running in the “bad” standby. After that, replication was able to
>> resume successfully.
>
> You are referring to the startup process that replays WAL, right?
That is correct, I’m talking about the startup process that replays the WAL files.

> Without having an idea about the type of workload your primary and/or
> standbys are facing, as well as an idea of the configuration you are
> using on both (hot_standby_feedback for one), I have no direct idea,

Primary handle IOT data ingestion. The table that we had to REINDEX gets updated every time a new message arrives in the system so updated are happening very often on that table, thus, the index/table bloat. The standby at any point in time would be receiving queries that would take advantage of the indexes that were being re indexed. hot_standby_feedback is currently turned OFF on the standbys.

> but that could be a conflict caused by a concurrent vacuum.

>
> Seeing where things got stuck could also be useful, perhaps with a
> backtrace of the area where it happens and some information around
> it.
>
>> I’m just trying to understand what could have caused this issue. I
>> was not able to identify any queries in the standby that would be
>> locking the WAL process. Any insight would be more than welcome!
>
> That's not going to be easy without more information, I am afraid.
> --
> Michael

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Martín Fernández 2021-12-15 04:20:51 Re: Reindex "locked" standby database
Previous Message Mladen Gogala 2021-12-15 03:52:36 Re: Reindex "locked" standby database