May data be corrupted after an interrupted, but afterwards sucessfully replayed recovery?

From: "Dietrich, Benjamin" <b(dot)dietrich(at)uni-tuebingen(dot)de>
To: "pgsql-admin(at)lists(dot)postgresql(dot)org" <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: May data be corrupted after an interrupted, but afterwards sucessfully replayed recovery?
Date: 2025-02-20 15:00:35
Message-ID: 236a0aa64b1c41dd8e305e64ca6d38b5@uni-tuebingen.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi experts,

we have a cluster that crashed because of a full 'pg_wal' disk.
When it automatically tried to restart after failure, it went into recovery,
crashed again - this time in recovery - and stopped (see logs below [1]).

The original problem is solved and the cluster now successfully started an
finished recovery.
But when I restarted the cluster it HINTed:

2025-02-20 09:26:57.615 CET [3982486] LOG: database system was interrupted
while in recovery at 2025-02-20 03:38:24 CET
2025-02-20 09:26:57.615 CET [3982486] HINT: This probably means that some
data is corrupted and you will have to use the last backup for recovery.

My question is now, if there is still a chance that the data is corrupted,
although the interrupted recovery was successfully redone later?
Should I recover the whole cluster from the last PITR backup to be safe? Or
can I be sure that the data is in a solid state, since recovery succeeded on
second try?

Regards,
Benjamin

----------------------
[1] logs:

2025-02-20 03:38:18.278 CET [3767151] PANIC: could not write to file
"pg_wal/xlogtemp.3767151": No space left on device
2025-02-20 03:38:18.315 CET [1909] LOG: server process (PID 3767151) was
terminated by signal 6: Aborted
2025-02-20 03:38:18.315 CET [1909] LOG: terminating any other active server
processes
2025-02-20 03:38:21.938 CET [1909] LOG: all server processes terminated;
reinitializing
2025-02-20 03:38:23.867 CET [3863518] LOG: database system was interrupted;
last known up at 2025-02-20 03:34:49 CET
2025-02-20 03:38:24.089 CET [3863518] LOG: database system was not properly
shut down; automatic recovery in progress
2025-02-20 03:38:24.102 CET [3863518] LOG: redo starts at E8A/EB648078
2025-02-20 03:38:24.119 CET [3863518] LOG: redo done at E8A/EC19B6B0 system
usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.02 s
2025-02-20 03:38:24.136 CET [3863518] FATAL: could not write to file
"pg_wal/xlogtemp.3863518": No space left on device
2025-02-20 03:38:24.139 CET [1909] LOG: startup process (PID 3863518)
exited with exit code 1
2025-02-20 03:38:24.139 CET [1909] LOG: terminating any other active server
processes
2025-02-20 03:38:24.140 CET [1909] LOG: shutting down due to startup
process failure
2025-02-20 03:38:24.213 CET [1909] LOG: database system is shut down
2025-02-20 09:26:57.597 CET [3982483] LOG: starting PostgreSQL 15.10
(Debian 15.10-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian
12.2.0-14) 12.2.0, 64-bit
2025-02-20 09:26:57.597 CET [3982483] LOG: listening on IPv4 address
"0.0.0.0", port 5432
2025-02-20 09:26:57.597 CET [3982483] LOG: listening on IPv6 address "::",
port 5432
2025-02-20 09:26:57.598 CET [3982483] LOG: listening on Unix socket
"/var/run/postgresql/.s.PGSQL.5432"
2025-02-20 09:26:57.615 CET [3982486] LOG: database system was interrupted
while in recovery at 2025-02-20 03:38:24 CET
2025-02-20 09:26:57.615 CET [3982486] HINT: This probably means that some
data is corrupted and you will have to use the last backup for recovery.
2025-02-20 09:26:57.886 CET [3982486] LOG: database system was not properly
shut down; automatic recovery in progress
2025-02-20 09:26:57.890 CET [3982486] LOG: redo starts at E8A/EB648078
2025-02-20 09:26:57.932 CET [3982486] LOG: redo done at E8A/EC19B6B0 system
usage: CPU: user: 0.00 s, system: 0.01 s, elapsed: 0.04 s
2025-02-20 09:26:57.966 CET [3982484] LOG: checkpoint starting:
end-of-recovery immediate wait
2025-02-20 09:26:58.061 CET [3982484] LOG: checkpoint complete: wrote 390
buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.058 s,
sync=0.001 s, total=0.096 s; sync files=103, longest=0.001 s, average=0.001
s; distance=26335 kB, estimate=26335 kB
2025-02-20 09:26:58.068 CET [3982483] LOG: database system is ready to
accept connections

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Olleg Samoylov 2025-02-20 19:23:18 A trigger in an extension
Previous Message richard 2025-02-20 07:55:31 Re: In-place upgrade with streaming replicas