Re: Very URGENT REQUEST - Postgresql error : PANIC: could not locate a valid checkpoint record

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: "Silaparasetti, Ramesh" <Ramesh(dot)Silaparasetty(at)dell(dot)com>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, "Kishore, Nanda - Dell Team" <Nanda(dot)Kishore(at)dellteam(dot)com>, "Mahendrakar, Prabhakar - Dell Team" <Prabhakar(dot)Mahendraka(at)dellteam(dot)com>, "Agarwal, Pragati - Dell Team" <Pragati(dot)A(at)dellteam(dot)com>
Subject: Re: Very URGENT REQUEST - Postgresql error : PANIC: could not locate a valid checkpoint record
Date: 2022-02-10 14:44:50
Message-ID: 20220210144450.o5nd6gcdt7qlittt@jrouhaud
Lists: pgsql-bugs

On Thu, Feb 10, 2022 at 01:13:27PM +0000, Silaparasetti, Ramesh wrote:
>
> 1. Below is the output of the command : "<DPA_INSTALL_DIRECTORY>\services\datastore\engine\bin\pg_controldata.exe -D "<DPA_INSTALL_DIRECTORY>\services\datastore\data""
> C:\Program Files\EMC\DPA\services\datastore\engine\bin>pg_controldata.exe -D "F:\datastore\data\data"
> pg_control version number: 1300
> Catalog version number: 202007201
> Database system identifier: 7054941472659574120
> Database cluster state: shut down
> pg_control last modified: 07.02.2022 14:57:30
> Latest checkpoint location: 9/C80000D8
> Latest checkpoint's REDO location: 9/C80000D8
> Latest checkpoint's REDO WAL file: 0000000100000009000000C8
> [...]
> 2. As you suggested, we verified the value of Latest checkpoint's REDO WAL file: 0000000100000009000000C8.
>
> This WAL file does not exist at the pg_wal directory.
> We have enabled debug logging and below is the logging information from Postgres.
>
> 2022-02-10 11:38:05.675 CET [7916] LOG: starting PostgreSQL 13.1, compiled by Visual C++ build 1900, 64-bit
> 2022-02-10 11:38:05.679 CET [7916] LOG: listening on IPv4 address "127.0.0.1", port 9003
> 2022-02-10 11:38:05.681 CET [7916] LOG: listening on IPv4 address "10.91.198.36", port 9003
> 2022-02-10 11:38:06.756 CET [348] LOG: database system was shut down at 2022-02-07 14:57:30 CET
> 2022-02-10 11:38:06.756 CET [348] DEBUG: mapped win32 error code 2 to 2
> 2022-02-10 11:38:06.757 CET [348] DEBUG: mapped win32 error code 2 to 2
> 2022-02-10 11:38:06.757 CET [348] DEBUG: could not open file "pg_wal/0000000100000009000000C8": No such file or directory

So unless you can find the 0000000100000009000000C8 file (and all the files
after it), your instance is corrupted and you have lost data. If you have WAL
archiving or streaming to another location, you should be able to recover from
that, assuming that no other files are damaged. Otherwise, your best shot is
restoring from a backup.
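
If you want to check an archive or another copy of pg_wal for the needed
files, here is a minimal Python sketch of how the checkpoint's REDO location
maps to a WAL file name. It assumes the default 16 MB WAL segment size and
timeline 1 (the first 8 hex digits of the file name above). The function
names wal_file_name and segments_from are made up for this illustration and
are not part of PostgreSQL.

```python
import os
import re

# Assumption: the cluster uses the default WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_file_name(timeline, lsn, seg_size=WAL_SEGMENT_SIZE):
    """Return the WAL segment file name that contains the given LSN.

    lsn is the textual form printed by pg_controldata, e.g. "9/C80000D8".
    """
    hi, lo = (int(part, 16) for part in lsn.split("/"))
    segno = ((hi << 32) | lo) // seg_size
    segs_per_id = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_id, segno % segs_per_id)


def segments_from(wal_dir, first):
    """List segment files in wal_dir on the same timeline, at or after first."""
    pattern = re.compile(r"^[0-9A-F]{24}$")
    return sorted(
        name
        for name in os.listdir(wal_dir)
        if pattern.match(name) and name[:8] == first[:8] and name >= first
    )


if __name__ == "__main__":
    print(wal_file_name(1, "9/C80000D8"))  # prints 0000000100000009000000C8
```

Calling segments_from() on a candidate directory with that file name shows
whether the needed segment and its successors are there. If the first entry
is not the needed file, or there is a gap in the sequence, that copy is not
enough to recover.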

> 4. Is it ok to execute "pg_resetwal" to recover from this situation? Does it pose any data loss ?

pg_resetwal will make the situation worse. The server will start, but in a
totally inconsistent state. This should be your last resort, and you need to
understand that it will irreparably corrupt your system even more.

At this point you should probably consider hiring a company with PostgreSQL
expertise to:

- try to recover some data if possible (it depends on what the problem was,
which other WAL files you have, and the state of the rest of the files)
- understand what happened
- fix the root problem
- help you set up monitoring, alerting, archiving, backups, HA and anything
else you might need
