Re: pg_rewind after promote

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Emond Papegaaij <emond(dot)papegaaij(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: pg_rewind after promote
Date: 2024-03-28 15:21:55
Message-ID: c552270a77ccf35bef172b5d031955981dd3613c.camel@cybertec.at
Lists: pgsql-general

On Thu, 2024-03-28 at 15:52 +0100, Emond Papegaaij wrote:

>  * we detach the primary database backend, forcing a failover
>  * pgpool selects a new primary database and promotes it
>  * the other 2 nodes (the old primary and the other standby) are rewound
> and streaming is resumed from the new primary
>  * the node that needed to be taken out of the cluster (the old primary)
> is shut down and rebooted
>
> This works fine most of the time, but sometimes we see this message on one of the nodes:
> pg_rewind: source and target cluster are on the same timeline
> pg_rewind: no rewind required
> This message seems timing related, as the first node might report that,
> while the second reports something like:
> pg_rewind: servers diverged at WAL location 5/F28AB1A8 on timeline 21
> pg_rewind: rewinding from last common checkpoint at 5/F27FCA98 on timeline 21
> pg_rewind: Done!
>
> If we ignore the response from pg_rewind, streaming will break on the node that reported
> no rewind was required. On the new primary, we do observe the database moving from timeline
> 21 to 22, but it seems this takes some time to materialize to be observable by pg_rewind.
>
> 1. Is my observation about the starting of a new timeline correct?
> 2. If yes, is there anything we can do during the promotion process to block until the new
> timeline has fully materialized, either by waiting or preferably forcing the new
> timeline to be started?

This must be the problem addressed by commit 009eeee746 [1].

You'd have to upgrade to PostgreSQL v16, which would be a good idea anyway, given
that you are running v12.

A temporary workaround could be to explicitly trigger a checkpoint right after
promotion.
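
A minimal sketch of that workaround, assuming psql is on the PATH and the
connection string belongs to a superuser on the new primary. The function
names, the connection string and the polling parameters are made up for
illustration; only pg_is_in_recovery(), CHECKPOINT and pg_control_checkpoint()
are PostgreSQL's own. It waits until the server has left recovery, forces a
checkpoint, and returns the timeline recorded in the control file:

```python
import subprocess
import time


def psql(conninfo, sql, run=subprocess.run):
    """Run one SQL statement through psql and return its trimmed output."""
    result = run(
        # -X: ignore psqlrc, -A: unaligned output, -t: tuples only
        ["psql", "-X", "-At", "-d", conninfo, "-c", sql],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def checkpoint_after_promotion(conninfo, timeout=60.0, poll=1.0,
                               run=subprocess.run, sleep=time.sleep):
    """Wait for promotion to finish, force a checkpoint, and return the
    timeline ID stored in the control file (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    # pg_is_in_recovery() prints 'f' once the server has left recovery.
    while psql(conninfo, "SELECT pg_is_in_recovery()", run) != "f":
        if time.monotonic() >= deadline:
            raise TimeoutError("server is still in recovery")
        sleep(poll)
    # The explicit checkpoint is the workaround suggested above.
    psql(conninfo, "CHECKPOINT", run)
    # Report the timeline of the latest checkpoint from the control file.
    return int(psql(conninfo,
                    "SELECT timeline_id FROM pg_control_checkpoint()", run))
```

You would call this against the new primary before starting pg_rewind on the
other nodes, and check that the returned timeline is the one you expect (22 in
your example) before going on.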

Yours,
Laurenz Albe

[1]. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=009eeee746825090ec7194321a3db4b298d6571e
