Re: Reindex "locked" standby database

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Martín Fernández <fmartin91(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Reindex "locked" standby database
Date: 2021-12-15 03:37:35
Message-ID: Ybli/z1eOBwmomgV@paquier.xyz
Lists: pgsql-general

On Wed, Dec 15, 2021 at 12:15:27AM -0300, Martín Fernández wrote:
> The reindex went fine in the primary database and in one of our
> standbys. The other standby that we also operate for some reason
> ended up in a state where all transactions were locked by the WAL
> process and the WAL process was not able to make any progress. In
> order to solve this issue we had to move traffic from the “bad”
> standby to the healthy one and then kill all transactions that were
> running in the “bad” standby. After that, replication was able to
> resume successfully.

You are referring to the startup process that replays WAL, right?
Without knowing the type of workload your primary and standbys are
facing, or the configuration you are using on both
(hot_standby_feedback for one), I cannot say for sure, but that could
be a conflict caused by a concurrent vacuum.
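
As a rough starting point for gathering that information, here is a
minimal sketch of what could be collected on the standby. The two
queries read the stock pg_stat_database_conflicts and pg_settings
views. The helper dominant_conflict() and the sample row are
hypothetical and only illustrate how the counters might be read; none
of this comes from the original report.

```python
# Diagnostic queries to run on the standby (for example via psql).
# pg_stat_database_conflicts counts queries canceled by recovery conflicts.
CONFLICTS_SQL = """
SELECT datname, confl_tablespace, confl_lock, confl_snapshot,
       confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts
"""

# Settings that influence how replay and standby queries interact.
SETTINGS_SQL = """
SELECT name, setting
FROM pg_settings
WHERE name IN ('hot_standby_feedback',
               'max_standby_streaming_delay',
               'max_standby_archive_delay')
"""


def dominant_conflict(row):
    """Return (column, count) for the largest confl_* counter in a row,
    or None when every counter is zero or no counter is present.

    `row` is assumed to be a dict mapping column names to values, as a
    dict-style database cursor would return (hypothetical input shape).
    """
    counters = {k: v for k, v in row.items() if k.startswith("confl_")}
    if not counters or max(counters.values()) == 0:
        return None
    name = max(counters, key=counters.get)
    return name, counters[name]


# Illustrative row only, not data from the thread.
sample = {
    "datname": "app",
    "confl_tablespace": 0,
    "confl_lock": 2,
    "confl_snapshot": 7,
    "confl_bufferpin": 0,
    "confl_deadlock": 0,
}
print(dominant_conflict(sample))
```

Note that these counters only grow when standby queries are canceled.
If max_standby_streaming_delay is set to -1, replay waits indefinitely
for conflicting queries instead of canceling them, so the counters may
stay at zero even while the startup process is stuck.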

Seeing where things got stuck could also be useful, perhaps with a
backtrace of the area where it happens and some information around
it.

> I’m just trying to understand what could have caused this issue. I
> was not able to identify any queries in the standby that would be
> locking the WAL process. Any insight would be more than welcome!

That's not going to be easy without more information, I am afraid.
--
Michael
