Re: DETAIL: pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1

From: "Kumar, Devesh" <devesh(dot)kumar(at)cmegroup(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: DETAIL: pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
Date: 2024-04-29 10:25:42
Message-ID: CACMEH=4UG9_VGefOiwizOqrmhrNaSipNwiAQKcvh-5if5BmQGg@mail.gmail.com
Lists: pgsql-bugs

Hello Laurenz

Thanks for the response. The details are below:

Primary repmgr.conf Details
[image: image.png]

Secondary repmgr.conf Details

[image: image.png]

Failover steps:

We stopped the PostgreSQL service on the primary server; repmgrd automatically
failed over to the standby and promoted it to the new primary.

The status after failover is shown below:

[image: image.png]

Failback steps:

1. We executed a checkpoint on the new primary (originally the standby).
2. We ran the node rejoin command below with --dry-run:

repmgr node rejoin -f /opt/postgresql/15.6/bin/repmgr.conf -d
'host=10.29.97.241 port=5432 user=repmgr dbname=repmgr' --force-rewind
--config-files=postgresql.conf,postgresql.local.conf,pg_hba.conf -v
--dry-run   # check whether the original primary is eligible to rejoin

NOTICE: rejoin target is node "d-dba-pg-rnh9" (ID: 2)
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 7360952088605465701
NOTICE: pg_rewind execution required for this node to attach to rejoin
target node 2
DETAIL: rejoin target server's timeline 2 forked off current database
system timeline 1 before current recovery point 0/9000028
INFO: prerequisites for using pg_rewind are met
INFO: file "postgresql.conf" would be copied to
"/tmp/repmgr-config-archive-d-dba-pg-0ptt/postgresql.conf"
WARNING: specified file "/pgresdata101/data/postgresql.local.conf" not
found, skipping
INFO: file "pg_hba.conf" would be copied to
"/tmp/repmgr-config-archive-d-dba-pg-0ptt/pg_hba.conf"
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is:
/opt/postgresql/pg/bin/pg_rewind -D '/pgresdata101/data'
--source-server='host=10.29.97.241 port=5432 user=repmgr dbname=repmgr
connect_timeout=2'
INFO: prerequisites for executing NODE REJOIN are met

3. We executed the node rejoin command:

repmgr node rejoin -f /opt/postgresql/15.6/bin/repmgr.conf -d
'host=10.29.97.241 port=5432 user=repmgr dbname=repmgr' --force-rewind
--config-files=postgresql.conf,postgresql.local.conf,pg_hba.conf -v
NOTICE: using provided configuration file
"/opt/postgresql/15.6/bin/repmgr.conf"
DEBUG: server version number is: 150000
DEBUG: set_config():
SET synchronous_commit TO 'local'
DEBUG: get_primary_node_id():
SELECT node_id FROM repmgr.nodes WHERE type =
'primary' AND active IS TRUE
DEBUG: get_node_record():
SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo,
n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file,
'' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE
n.node_id = 2
NOTICE: rejoin target is node "d-dba-pg-rnh9" (ID: 2)
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr
host=10.29.97.241 port=5432 fallback_application_name=repmgr
options=-csearch_path="
DEBUG: set_config():
SET synchronous_commit TO 'local'
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
DEBUG: get_node_record():
SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo,
n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file,
'' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE
n.node_id = 1
DEBUG: local timeline: 1; rejoin target timeline: 2
DEBUG: get_timeline_history():
TIMELINE_HISTORY 2
DEBUG: local tli: 1; local_xlogpos: 0/9000028; follow_target_history->tli:
1; follow_target_history->end: 0/9000000
NOTICE: pg_rewind execution required for this node to attach to rejoin
target node 2
DETAIL: rejoin target server's timeline 2 forked off current database
system timeline 1 before current recovery point 0/9000028
DEBUG: guc_set():
SELECT true FROM pg_catalog.pg_settings WHERE name = 'full_page_writes'
AND setting = 'off'
DEBUG: guc_set():
SELECT true FROM pg_catalog.pg_settings WHERE name = 'wal_log_hints' AND
setting = 'on'
INFO: prerequisites for using pg_rewind are met
DEBUG: using archive directory "/tmp/repmgr-config-archive-d-dba-pg-0ptt"
DEBUG: copying "postgresql.conf" to
"/tmp/repmgr-config-archive-d-dba-pg-0ptt/postgresql.conf"
WARNING: specified file "/pgresdata101/data/postgresql.local.conf" not
found, skipping
DEBUG: copying "pg_hba.conf" to
"/tmp/repmgr-config-archive-d-dba-pg-0ptt/pg_hba.conf"
INFO: 2 files copied to "/tmp/repmgr-config-archive-d-dba-pg-0ptt"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/opt/postgresql/pg/bin/pg_rewind -D
'/pgresdata101/data' --source-server='host=10.29.97.241 port=5432
user=repmgr dbname=repmgr connect_timeout=2'"
DEBUG: executing:
/opt/postgresql/pg/bin/pg_rewind -D '/pgresdata101/data'
--source-server='host=10.29.97.241 port=5432 user=repmgr dbname=repmgr
connect_timeout=2' 2>/tmp/repmgr_command.wgVGPS
DEBUG: result of command was 1 (256)
DEBUG: local_command(): output returned was:
pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
pg_rewind: error: could not open file
"/pgresdata101/data/pg_wal/000000010000000000000008": No such file or
directory
pg_rewind: error: could not find previous WAL record at 0/802B668

ERROR: pg_rewind execution failed
DETAIL: pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
pg_rewind: error: could not open file
"/pgresdata101/data/pg_wal/000000010000000000000008": No such file or
directory
pg_rewind: error: could not find previous WAL record at 0/802B668
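
As an aside on reading the output above: the missing file name can be derived from the LSN in the last error line, and the "pg_rewind execution required" notice corresponds to the local position being past the fork point. The following is only a minimal sketch in Python, assuming the default 16 MB WAL segment size (wal_segment_size is not shown anywhere in this thread). The function names are illustrative and are not part of PostgreSQL or repmgr, and the comparison is a simplified reading of the DEBUG lines, not repmgr's actual code.

```python
# Assumption: default WAL segment size of 16 MB; not confirmed by the thread.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '0/802B668' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the name of the WAL segment file that contains the given LSN."""
    segno = parse_lsn(lsn) // seg_size
    # Number of segments that fit in one 4 GB "xlogid" range.
    segs_per_xlogid = 0x100000000 // seg_size
    return (
        f"{timeline:08X}"
        f"{segno // segs_per_xlogid:08X}"
        f"{segno % segs_per_xlogid:08X}"
    )


def local_is_past_fork(local_lsn: str, fork_lsn: str) -> bool:
    """True if the local server wrote WAL beyond the point where the
    target's timeline forked off (hypothetical helper for illustration)."""
    return parse_lsn(local_lsn) > parse_lsn(fork_lsn)


if __name__ == "__main__":
    # Values taken from the log output above.
    print(wal_file_name(1, "0/802B668"))                 # segment pg_rewind tried to open
    print(wal_file_name(1, "0/9000000"))                 # segment holding the divergence point
    print(local_is_past_fork("0/9000028", "0/9000000"))  # local_xlogpos vs. history end
```

Under that assumption, 0/802B668 on timeline 1 maps to 000000010000000000000008, the file pg_rewind reports as missing from /pgresdata101/data/pg_wal, while the divergence point 0/9000000 lies in the following segment.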


On Mon, Apr 29, 2024 at 3:37 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
wrote:

> On Sat, 2024-04-27 at 00:36 +0530, Kumar, Devesh wrote:
> > Currently we are working on setting up replication and testing failover scenarios
> > and failback. During our testing, failover is getting successful. During Failback,
> > when we are reverting the original primary instance as the new standby, we are
> > getting pg_rewind errors. Kindly can someone check and let us know.
> >
> > pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
> > pg_rewind: error: could not open file "/pgresdata101/data/pg_wal/000000010000000000000008": No such file or directory
> > pg_rewind: error: could not find previous WAL record at 0/802B668
>
> You should show the exact commands used for failover and failback.
>
> Yours,
> Laurenz Albe
>
