Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails

From: Mariya Rampurawala <Mariya(dot)Rampurawala(at)veritas(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "laurenz(dot)albe(at)cybertec(dot)at" <laurenz(dot)albe(at)cybertec(dot)at>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails
Date: 2020-05-13 04:58:15
Message-ID: A2A0EAE4-4953-40A8-89C2-C4576D28C92E@veritas.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thank you Kyotaro and Laurenz for your quick responses.
This helped me get my setup working.

Regards,
Mariya

On 13/05/20, 6:51 AM, "Kyotaro Horiguchi" <horikyota(dot)ntt(at)gmail(dot)com> wrote:

At Tue, 12 May 2020 17:29:50 +0200, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote in
> On Tue, 2020-05-12 at 09:40 +0000, Mariya Rampurawala wrote:
> > > but if the target cluster ran for a long time after the divergence,
> > > the old WAL files might no longer be present. In that case, they can
> > > be manually copied from the WAL archive to the pg_wal directory, or
> > > fetched on startup by configuring primary_conninfo or restore_command.
> >
> > I hit this issue every time I follow the aforementioned steps, manually as well as with scripts.
> > How long is "long time after divergence"? Is there a way I can make some
> > configuration changes so that I donʼt hit this issue?
> > Is there anything I must change in my restore command?

As mentioned in the documentation, pg_rewind uses the WAL records
startng from the last checkpoint just before the divergence point. The
divergence point is shown as the follows in your log messages.

> pg_rewind: servers diverged at WAL location 6/B9FFFFD8 on timeline 53

pg_rewind scans backward starting from that location to find a
checkpoint record, which is the oldest WAL record pg_rewind needs. As
you see it is not a matter of calculation. There's no other way than
actually restoring WAL segment files to read and try finding.

> What you can do is to use a higher value for "wal_keep_segments".
> Then PostgreSQL will keep around that number of old WAL segments,
> which increases the chance for "pg_rewind" to succeed.

So this is one effective way to reduce the chance to lose required WAL
(segment) files.

On PG12, an easy way to automatically restore all required WAL files
would be restoring the WAL file every time pg_rewind complains that it
is missing.

Or, you could use pg_waldump to find a checkpoint record.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Matthias Apitz 2020-05-13 05:27:31 Re: ESQL/C: a ROLLBACK rolls back a COMMITED transaction
Previous Message Kyotaro Horiguchi 2020-05-13 01:21:30 Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails