From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Martín Fernández <fmartin91(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Reindex "locked" standby database |
Date: | 2021-12-15 03:37:35 |
Message-ID: | Ybli/z1eOBwmomgV@paquier.xyz |
Lists: | pgsql-general |
On Wed, Dec 15, 2021 at 12:15:27AM -0300, Martín Fernández wrote:
> The reindex went fine in the primary database and in one of our
> standby. The other standby that we also operate for some reason
> ended up in a state where all transactions were locked by the WAL
> process and the WAL process was not able to make any progress. In
> order to solve this issue we had to move traffic from the “bad”
> standby to the healthy one and then kill all transactions that were
> running in the “bad” standby. After that, replication was able to
> resume successfully.
You are referring to the startup process that replays WAL, right?
Without knowing the type of workload your primary and standbys are
facing, or the configuration you are using on both
(hot_standby_feedback, for one), I cannot say for sure, but this could
be a recovery conflict caused by a concurrent vacuum. Seeing where
things got stuck would also be useful, perhaps with a backtrace of the
stuck process at the point where it hangs and some information around
it.
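
As a rough illustration of the information asked for above, here is a minimal sketch of what could be collected on the affected standby. The three queries use standard PostgreSQL views (pg_settings, pg_stat_activity, pg_stat_database_conflicts); the backend_type column assumes PostgreSQL 10 or later. The helper dominant_conflict and the constant names are hypothetical, introduced only for this example, and the sketch leaves out how the queries are executed (any client or driver would do).

```python
# Diagnostic queries to run on the stuck standby. The views and columns are
# standard PostgreSQL; backend_type requires PostgreSQL 10 or later.
STANDBY_SETTINGS_SQL = """
SELECT name, setting
FROM pg_settings
WHERE name IN ('hot_standby_feedback',
               'max_standby_streaming_delay',
               'max_standby_archive_delay');
"""

# What the WAL-replaying startup process is currently waiting on, if anything.
STARTUP_WAIT_SQL = """
SELECT pid, wait_event_type, wait_event
FROM pg_stat_activity
WHERE backend_type = 'startup';
"""

# Per-database counters of standby queries canceled by recovery conflicts.
CONFLICTS_SQL = """
SELECT datname, confl_tablespace, confl_lock, confl_snapshot,
       confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts;
"""

CONFLICT_COLUMNS = (
    "confl_tablespace",
    "confl_lock",
    "confl_snapshot",
    "confl_bufferpin",
    "confl_deadlock",
)


def dominant_conflict(rows):
    """Hypothetical helper: return (column, total) for the most frequent
    conflict type across all databases, or None if every counter is zero.

    `rows` is a list of dicts shaped like the output of CONFLICTS_SQL.
    """
    totals = {col: 0 for col in CONFLICT_COLUMNS}
    for row in rows:
        for col in CONFLICT_COLUMNS:
            totals[col] += int(row.get(col, 0) or 0)
    column, total = max(totals.items(), key=lambda kv: kv[1])
    return (column, total) if total > 0 else None
```

A high confl_snapshot total would be consistent with conflicts from vacuum cleanup, while confl_lock points at replayed exclusive locks. These counters only record queries that were canceled, so they may stay at zero if replay simply waited without canceling anything.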
> I’m just trying to understand what could have caused this issue. I
> was not able to identify any queries in the standby that would be
> locking the WAL process. Any insight would be more than welcome!
That's not going to be easy without more information, I am afraid.
--
Michael