RE: Postgresql error : PANIC: could not locate a valid checkpoint record

From: "Mahendrakar, Prabhakar - Dell Team" <Prabhakar(dot)Mahendraka(at)dellteam(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Mateusz Henicz <mateuszhenicz(at)gmail(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: RE: Postgresql error : PANIC: could not locate a valid checkpoint record
Date: 2022-07-05 12:56:48
Message-ID: BYAPR19MB28862239079318727E55421994819@BYAPR19MB2886.namprd19.prod.outlook.com
Lists: pgsql-general

Hi Team,

We are using the following command to perform the PostgreSQL upgrade:

'/opt/XXXX/YYYY/services/datastore/engine-new/bin/pg_upgrade' \
  -b '/opt/XXXX/YYYY/services/datastore/engine/bin' \
  -B '/opt/XXXX/YYYY/services/datastore/engine-new/bin' \
  -d '/opt/XXXX/YYYY/db/data' \
  -D '/opt/XXXX/YYYY/db/data-new' \
  -p 9003 \
  -P 9003 \
  -U apollosuperuser \
  -k -j 4 -v
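A minimal sketch, assuming bash and the (redacted) paths above: pg_upgrade's `--check` (`-c`) option only checks whether the two clusters are compatible and does not change any data, so running the same options with `--check` first may catch problems before link mode (`-k`) touches the old data directory. The function name `upgrade_with_precheck` is hypothetical, not part of PostgreSQL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the given pg_upgrade binary ($1) with the remaining
# options in --check mode first, and only do the real upgrade if that passes.
upgrade_with_precheck() {
  local pg_upgrade=$1
  shift
  # --check only verifies cluster compatibility; no data is modified.
  if ! "$pg_upgrade" "$@" --check; then
    echo "pg_upgrade --check failed; the real upgrade was not attempted" >&2
    return 1
  fi
  # Real run with exactly the same options (link mode if -k is passed).
  "$pg_upgrade" "$@"
}

# Example, using the options from the command above:
# upgrade_with_precheck /opt/XXXX/YYYY/services/datastore/engine-new/bin/pg_upgrade \
#   -b /opt/XXXX/YYYY/services/datastore/engine/bin \
#   -B /opt/XXXX/YYYY/services/datastore/engine-new/bin \
#   -d /opt/XXXX/YYYY/db/data -D /opt/XXXX/YYYY/db/data-new \
#   -p 9003 -P 9003 -U apollosuperuser -k -j 4 -v
```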

This is not a clustered setup; it is a standalone server.
Could you please let us know how to restore the database to its pre-upgrade state, or how to explicitly issue a checkpoint before we run the pg_upgrade command?
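A hedged sketch of both steps, assuming bash, English `pg_controldata` output (run with `LC_ALL=C`), a `postgres` database to connect to, and the paths from the command above. An explicit `CHECKPOINT` can be issued through psql, but pg_upgrade needs the old server stopped anyway, and a clean shutdown (`pg_ctl stop -m fast`) itself writes a shutdown checkpoint; `pg_controldata` then shows the cluster state and latest checkpoint location. For going back, the pg_upgrade documentation describes that in link mode, if the new cluster was never started, the old cluster is unchanged except that `global/pg_control` was renamed to `pg_control.old`, and removing that suffix makes it usable again; if the new cluster was started, the old one has to be restored from a backup. All three function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumes English pg_controldata output (run it with LC_ALL=C).

# Succeeds only if pg_controldata output on stdin reports "shut down",
# i.e. the old server stopped cleanly after writing a shutdown checkpoint.
cluster_cleanly_shut_down() {
  local state
  state=$(sed -n 's/^Database cluster state:[[:space:]]*//p')
  [ "$state" = "shut down" ]
}

# Prints the "Latest checkpoint location" LSN from pg_controldata output on stdin.
latest_checkpoint_location() {
  sed -n 's/^Latest checkpoint location:[[:space:]]*//p'
}

# Link mode (-k) only: undo pg_upgrade's rename of global/pg_control.
# ONLY safe if the NEW cluster was never started; otherwise restore from backup.
revert_link_mode_old_cluster() {
  local ctl="$1/global/pg_control"
  if [ -e "$ctl" ]; then
    echo "$ctl is present; nothing to revert" >&2
    return 0
  fi
  if [ ! -e "$ctl.old" ]; then
    echo "$ctl.old not found; restore the old cluster from backup" >&2
    return 1
  fi
  mv -- "$ctl.old" "$ctl"
}

# Example flow before running pg_upgrade:
# OLD_BIN=/opt/XXXX/YYYY/services/datastore/engine/bin
# OLD_DATA=/opt/XXXX/YYYY/db/data
# "$OLD_BIN/psql" -p 9003 -U apollosuperuser -d postgres -c 'CHECKPOINT;'
# "$OLD_BIN/pg_ctl" -D "$OLD_DATA" stop -m fast
# LC_ALL=C "$OLD_BIN/pg_controldata" "$OLD_DATA" > controldata.txt
# cluster_cleanly_shut_down < controldata.txt && latest_checkpoint_location < controldata.txt
```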

Thanks,
Prabhakar


-----Original Message-----
From: Michael Paquier <michael(at)paquier(dot)xyz>
Sent: Monday, June 27, 2022 5:34 AM
To: Mahendrakar, Prabhakar - Dell Team
Cc: Mateusz Henicz; pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Postgresql error : PANIC: could not locate a valid checkpoint record

On Fri, Jun 24, 2022 at 01:03:57PM +0000, Mahendrakar, Prabhakar - Dell Team wrote:
> Is it possible to explicitly issue a checkpoint before we move on to
> the pg_upgrade command, so that if the upgrade runs into issues (like
> "PANIC: could not locate a valid checkpoint record"), we still have
> this last explicit checkpoint available?
>
> Please let us know your thoughts on this.

Well, you have mentioned the use of pg_upgrade, but you are giving zero details about the command you used, how you handled the clusters before and after they were upgraded, or what kind of environment is being used. With this little detail, nobody will be able to guess what's happening. This issue could also be caused by the environment. For example, in some carelessly set-up environments, a flush can be issued, reported as completed by the OS, and treated as completed by Postgres, while a layer between the OS and the actual hardware (be it the OS, the file system, the disk, or something VM-related) never actually performed the flush, which would make this issue reachable.
--
Michael
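Michael's point is that a layer below PostgreSQL may report a flush as complete without persisting it. As a partial sketch only (assuming bash and psql access; the helper name `durability_settings_ok` is hypothetical), the PostgreSQL-side settings that would also break crash safety if disabled, `fsync` and `full_page_writes`, can be checked first. This cannot detect a misbehaving disk, controller, or VM layer; for that, the `pg_test_fsync` utility shipped with PostgreSQL may give a rough hint, since fsync rates far above what the hardware can physically sustain can indicate a volatile write cache.

```shell
#!/usr/bin/env bash
# Partial sanity check of PostgreSQL's own durability settings.
# Reads "name value" lines on stdin, e.g. the output of:
#   psql -At -F ' ' -c "SELECT name, setting FROM pg_settings
#                       WHERE name IN ('fsync', 'full_page_writes')"
# It cannot detect storage or VM layers that acknowledge unpersisted flushes.
durability_settings_ok() {
  local name value seen=0 bad=0
  while read -r name value; do
    case $name in
      fsync|full_page_writes)
        seen=$((seen + 1))
        if [ "$value" != "on" ]; then
          echo "WARNING: $name = $value" >&2
          bad=1
        fi
        ;;
    esac
  done
  # Both settings must be present and both must be "on".
  [ "$seen" -eq 2 ] && [ "$bad" -eq 0 ]
}
```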
