PostgreSQL Timeline Issue After Switchover with Pacemaker

From: <di(dot)liu(at)japannext(dot)co(dot)jp>
To: <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: PostgreSQL Timeline Issue After Switchover with Pacemaker
Date: 2025-01-31 00:09:50
Message-ID: 05b001db7374$73a9ccb0$5afd6610$@japannext.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hello,

We have multiple two-node primary/standby PostgreSQL clusters managed by
Pacemaker.

Yesterday, we performed an OS upgrade following these steps:

1. Put the standby node into standby mode using:

pcs node standby [node_name]

2. The Unix team upgraded the OS.
3. Upgraded packages (pgBackRest, Pacemaker, pcs, Corosync).
4. Unstandby the node using:

pcs node unstandby [node_name]

5. Repeated the process for the primary node.

However, in one cluster, the PostgreSQL instance did not start when the
previous primary became the new standby.

Logs from the failed node:

2025-01-30 15:53:57.106 JST,,,12779,,679b2205.31eb,1,,2025-01-30 15:53:57
JST,,0,FATAL,55000,"highest timeline 7 of the primary is behind recovery
timeline 8",,,,,,,,,"","walreceiver",,0

2025-01-30 15:53:57.106 JST,,,9197,,679b212d.23ed,49,,2025-01-30 15:50:21
JST,1/0,0,LOG,00000,"waiting for WAL to become available at
88FE/890000B8",,,,,,,,,"","startup",,0

2025-01-30 15:54:02.109 JST,,,12782,,679b220a.31ee,1,,2025-01-30 15:54:02
JST,,0,FATAL,55000,"highest timeline 7 of the primary is behind recovery
timeline 8",,,,,,,,,"","walreceiver",,0

2025-01-30 15:54:02.109 JST,,,9197,,679b212d.23ed,50,,2025-01-30 15:50:21
JST,1/0,0,LOG,00000,"waiting for WAL to become available at
88FE/890000B8",,,,,,,,,"","startup",,0

Steps Taken to Fix It:

I took an incremental backup from the primary database using pgBackRest and
performed a delta restore:

pgbackrest --stanza=xxxx --delta --type=standby --log-level-console=detail
restore

However, after the restoration, PostgreSQL still wouldn't start, and I found
the following error logs:

2025-01-30 16:16:30.917 JST,,,33337,,679b274e.8239,1,,2025-01-30 16:16:30
JST,,0,LOG,00000,"database system was interrupted; last known up at
2025-01-30 16:11:27 JST",,,,,,,,,"","startup",,0

2025-01-30 16:16:31.087 JST,,,33337,,679b274e.8239,2,,2025-01-30 16:16:30
JST,,0,LOG,00000,"restored log file ""00000008.history"" from
archive",,,,,,,,,"","startup",,0

2025-01-30 16:16:31.094 JST,,,33337,,679b274e.8239,3,,2025-01-30 16:16:30
JST,,0,LOG,00000,"entering standby mode",,,,,,,,,"","startup",,0

2025-01-30 16:16:31.094 JST,,,33337,,679b274e.8239,4,,2025-01-30 16:16:30
JST,,0,LOG,00000,"starting backup recovery with redo LSN 88FE/8B000028,
checkpoint LSN 88FE/8B000098, on timeline ID 7",,,,,,,,,"","startup",,0

2025-01-30 16:16:31.102 JST,,,33337,,679b274e.8239,5,,2025-01-30 16:16:30
JST,,0,LOG,00000,"restored log file ""00000008.history"" from
archive",,,,,,,,,"","startup",,0

2025-01-30 16:16:31.146 JST,,,33337,,679b274e.8239,6,,2025-01-30 16:16:30
JST,,0,LOG,00000,"restored log file ""00000007000088FE0000008B"" from
archive",,,,,,,,,"","startup",,0

2025-01-30 16:16:31.155 JST,,,33337,,679b274e.8239,7,,2025-01-30 16:16:30
JST,,0,FATAL,XX000,"requested timeline 8 is not a child of this server's
history","Latest checkpoint is at 0/DEAD on timeline 7, but in the history
of the requested timeline, the server forked off from that timeline at
88FE/880000A0.",,,,,,,,"","startup",,0

After many attempts (including taking a full backup and performing a full
restore, but encountering the same error), I was able to bring up the node
using:

pgbackrest --stanza=xxxx --delta --type=standby --log-level-console=detail
restore --target-timeline=current

My Questions:

1. Why did this happen? Could you explain or provide keywords/links
that I can look into?
2. Why, after a full backup restore, did the standby still look for the
wrong timeline? Is there a file or setting that records the timeline
information?

Thank you,
Dean

--
_This correspondence (including any attachments) is for the intended
recipient(s) only. It may contain confidential or privileged information or
both. No confidentiality or privilege is waived or lost by any
mis-transmission. If you receive this correspondence by mistake, please
contact the sender immediately, delete this correspondence (and all
attachments) and destroy any hard copies. You must not use, disclose, copy,
distribute or rely on any part of this correspondence (including any
attachments) if you are not the intended
recipient(s).本メッセージに記載および添付されている情報(以下、総称して「本情報」といいます。)は、本来の受信者による使用のみを意図しています。誤送信等により本情報を取得された場合でも、本情報に係る秘密、または法律上の秘匿特権が失われるものではありません。本電子メールを受取られた方が、本来の受信者ではない場合には、本情報及びそのコピーすべてを削除・破棄し、本電子メールが誤って届いた旨を発信者宛てにご通知下さいますようお願いします。本情報の閲覧、発信または本情報に基づくいかなる行為も明確に禁止されていることをご了承ください。_

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Somnath Som 2025-01-31 05:12:18 Backup Issue for pg_basebackup
Previous Message Holger Jakobs 2025-01-30 22:09:14 Re: postgresql support for diacritical character sets?