Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails

From: Mariya Rampurawala <Mariya(dot)Rampurawala(at)veritas(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails
Date: 2020-05-12 09:40:18
Message-ID: 8BD51BB9-8695-4F10-8E9A-144D3F97059C@veritas.com
Lists: pgsql-general

Hi,

Thank you for the response.

> but if the target cluster ran for a long time after the divergence,
> the old WAL files might no longer be present. In that case, they can
> be manually copied from the WAL archive to the pg_wal directory, or
> fetched on startup by configuring primary_conninfo or restore_command.

I hit this issue every time I follow the aforementioned steps, manually as well as with scripts.
How long is "long time after divergence"? Is there a way I can make some configuration changes so that I don’t hit this issue?
Is there anything I must change in my restore command?

===================================
primary_conninfo = 'user=replicator host=10.209.57.16 port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
restore_command = 'scp root@10.209.56.88:/pg_backup/%f %p'
===================================
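
To make the restore_command above concrete, here is a minimal sketch of how its placeholders expand for a single WAL segment. It assumes the usual meaning of the placeholders (%f is the file name to fetch, %p is the destination path, %% is a literal percent sign) and leaves out %r. The helper name expand_restore_command and the destination path are hypothetical, chosen only for illustration.

```python
# Sketch: show the command line that would be run for one WAL segment.
# expand_restore_command is a hypothetical helper, not a PostgreSQL API.
import re


def expand_restore_command(template: str, filename: str, path: str) -> str:
    """Substitute %f (file to fetch), %p (destination path) and %% (literal %)."""
    replacements = {"%f": filename, "%p": path, "%%": "%"}
    # A single pass over the template, so "%%f" stays "%f" instead of expanding twice.
    return re.sub(r"%[fp%]", lambda m: replacements[m.group(0)], template)


if __name__ == "__main__":
    template = "scp root@10.209.56.88:/pg_backup/%f %p"
    # Segment name taken from the pg_rewind error in this thread;
    # the destination path is an assumed example value.
    print(expand_restore_command(template, "0000003500000006000000B9", "pg_wal/RECOVERYXLOG"))
```

Running the expanded command by hand is one way to check that the archive host really holds the segment under that exact name. The documentation quoted above describes restore_command as fetching files on startup, so the segment that pg_rewind asks for may still have to be copied into pg_wal manually before pg_rewind is run.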

Regards,
Mariya

On 12/05/20, 2:15 PM, "Kyotaro Horiguchi" <horikyota(dot)ntt(at)gmail(dot)com> wrote:

Hello.

At Tue, 12 May 2020 06:32:30 +0000, Mariya Rampurawala <Mariya(dot)Rampurawala(at)veritas(dot)com> wrote in
> I am working on providing HA for replication, using automation scripts.
> My setup consists of two nodes, Master and Slave. When the master fails, the slave is promoted to master. But when I try to re-register the old master as a slave, the pg_rewind command fails. Details below.
...
> 1. Rewind again:
>    -bash-4.2$ /usr/pgsql-12/bin/pg_rewind -D /pg_mnt/pg-12/data --source-server="host=10.209.57.17 port=5432 user=postgres dbname=postgres"
>
> pg_rewind: servers diverged at WAL location 6/B9FFFFD8 on timeline 53
>
> pg_rewind: error: could not open file "/pg_mnt/pg-12/data/pg_wal/0000003500000006000000B9": No such file or directory
>
> pg_rewind: fatal: could not find previous WAL record at 6/B9FFFFD8
>
>
> I have tried this multiple times but always face the same error. Can someone help me resolve this?

As the error message says, the required WAL file has been removed from
the old master. This is normal behavior and is described in the
documentation.

https://www.postgresql.org/docs/12/app-pgrewind.html

> but if the target cluster ran for a long time after the divergence,
> the old WAL files might no longer be present. In that case, they can
> be manually copied from the WAL archive to the pg_wal directory, or
> fetched on startup by configuring primary_conninfo or restore_command.

So it seems you need to restore the required WAL files from the archive or
the current master.
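
To identify which file to look for in the archive, the segment name in the error can be derived from the divergence point that pg_rewind reports. The sketch below assumes the default 16 MB WAL segment size (a cluster initialized with a different segment size would need a different value), and the function name wal_segment_name is hypothetical.

```python
# Sketch: derive the WAL segment file name that holds a given LSN.
# Assumes the default 16 MB WAL segment size; wal_segment_name is a
# hypothetical helper written for this example.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default segment size in bytes


def wal_segment_name(timeline: int, lsn: str, segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the 24-hex-digit WAL file name containing `lsn` on `timeline`."""
    high, low = (int(part, 16) for part in lsn.split("/"))
    position = (high << 32) | low              # LSN as a 64-bit byte offset
    segment_no = position // segment_size      # index of the segment holding that byte
    segments_per_id = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_no // segments_per_id,
        segment_no % segments_per_id,
    )


if __name__ == "__main__":
    # Timeline and WAL location taken from the pg_rewind output quoted above.
    print(wal_segment_name(53, "6/B9FFFFD8"))  # prints 0000003500000006000000B9
```

Timeline 53 is 0x35 in hexadecimal, so the result matches the file named in the "could not open file" error. That is the segment that would have to be present in the old master's pg_wal directory for pg_rewind to read the record at the divergence point.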

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
