From: | "Dietrich, Benjamin" <b(dot)dietrich(at)uni-tuebingen(dot)de> |
---|---|
To: | "pgsql-admin(at)lists(dot)postgresql(dot)org" <pgsql-admin(at)lists(dot)postgresql(dot)org> |
Subject: | May data be corrupted after an interrupted, but afterwards successfully replayed recovery? |
Date: | 2025-02-20 15:00:35 |
Message-ID: | 236a0aa64b1c41dd8e305e64ca6d38b5@uni-tuebingen.de |
Lists: | pgsql-admin |
Hi experts,
we have a cluster that crashed because of a full 'pg_wal' disk.
When it automatically tried to restart after failure, it went into recovery,
crashed again - this time in recovery - and stopped (see logs below [1]).
The original problem is solved and the cluster has now successfully started and
finished recovery.
But when I restarted the cluster, it logged this HINT:
2025-02-20 09:26:57.615 CET [3982486] LOG: database system was interrupted
while in recovery at 2025-02-20 03:38:24 CET
2025-02-20 09:26:57.615 CET [3982486] HINT: This probably means that some
data is corrupted and you will have to use the last backup for recovery.
My question now is whether there is still a chance that the data is corrupted,
even though the interrupted recovery was successfully redone later.
Should I recover the whole cluster from the last PITR backup to be safe? Or
can I be sure that the data is in a consistent state, since recovery succeeded
on the second try?
Regards,
Benjamin
----------------------
[1] logs:
2025-02-20 03:38:18.278 CET [3767151] PANIC: could not write to file
"pg_wal/xlogtemp.3767151": No space left on device
2025-02-20 03:38:18.315 CET [1909] LOG: server process (PID 3767151) was
terminated by signal 6: Aborted
2025-02-20 03:38:18.315 CET [1909] LOG: terminating any other active server
processes
2025-02-20 03:38:21.938 CET [1909] LOG: all server processes terminated;
reinitializing
2025-02-20 03:38:23.867 CET [3863518] LOG: database system was interrupted;
last known up at 2025-02-20 03:34:49 CET
2025-02-20 03:38:24.089 CET [3863518] LOG: database system was not properly
shut down; automatic recovery in progress
2025-02-20 03:38:24.102 CET [3863518] LOG: redo starts at E8A/EB648078
2025-02-20 03:38:24.119 CET [3863518] LOG: redo done at E8A/EC19B6B0 system
usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.02 s
2025-02-20 03:38:24.136 CET [3863518] FATAL: could not write to file
"pg_wal/xlogtemp.3863518": No space left on device
2025-02-20 03:38:24.139 CET [1909] LOG: startup process (PID 3863518)
exited with exit code 1
2025-02-20 03:38:24.139 CET [1909] LOG: terminating any other active server
processes
2025-02-20 03:38:24.140 CET [1909] LOG: shutting down due to startup
process failure
2025-02-20 03:38:24.213 CET [1909] LOG: database system is shut down
2025-02-20 09:26:57.597 CET [3982483] LOG: starting PostgreSQL 15.10
(Debian 15.10-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian
12.2.0-14) 12.2.0, 64-bit
2025-02-20 09:26:57.597 CET [3982483] LOG: listening on IPv4 address
"0.0.0.0", port 5432
2025-02-20 09:26:57.597 CET [3982483] LOG: listening on IPv6 address "::",
port 5432
2025-02-20 09:26:57.598 CET [3982483] LOG: listening on Unix socket
"/var/run/postgresql/.s.PGSQL.5432"
2025-02-20 09:26:57.615 CET [3982486] LOG: database system was interrupted
while in recovery at 2025-02-20 03:38:24 CET
2025-02-20 09:26:57.615 CET [3982486] HINT: This probably means that some
data is corrupted and you will have to use the last backup for recovery.
2025-02-20 09:26:57.886 CET [3982486] LOG: database system was not properly
shut down; automatic recovery in progress
2025-02-20 09:26:57.890 CET [3982486] LOG: redo starts at E8A/EB648078
2025-02-20 09:26:57.932 CET [3982486] LOG: redo done at E8A/EC19B6B0 system
usage: CPU: user: 0.00 s, system: 0.01 s, elapsed: 0.04 s
2025-02-20 09:26:57.966 CET [3982484] LOG: checkpoint starting:
end-of-recovery immediate wait
2025-02-20 09:26:58.061 CET [3982484] LOG: checkpoint complete: wrote 390
buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.058 s,
sync=0.001 s, total=0.096 s; sync files=103, longest=0.001 s, average=0.001
s; distance=26335 kB, estimate=26335 kB
2025-02-20 09:26:58.068 CET [3982483] LOG: database system is ready to
accept connections
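
For reference, both recovery attempts in the log above report the same redo
start and end positions. A minimal Python sketch of that comparison follows.
The helper names (parse_lsn, redo_ranges) and the shortened log excerpt are
made up for illustration and are not part of PostgreSQL. It only compares the
positions that were logged; it does not check the data files themselves, so it
cannot prove the absence of corruption on its own.

```python
import re

# Matches the "redo starts at X/Y" and "redo done at X/Y" messages from the log.
REDO_RE = re.compile(r"redo (starts|done) at ([0-9A-Fa-f]+/[0-9A-Fa-f]+)")


def parse_lsn(text):
    """Convert an LSN such as 'E8A/EB648078' into a 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def redo_ranges(log_text):
    """Return one (start, end) pair per recovery attempt found in the log."""
    ranges = []
    start = None
    for kind, lsn in REDO_RE.findall(log_text):
        if kind == "starts":
            start = parse_lsn(lsn)
        elif start is not None:
            ranges.append((start, parse_lsn(lsn)))
            start = None
    return ranges


# Shortened excerpt of the four relevant messages (timestamps and PIDs dropped).
log_text = """
LOG: redo starts at E8A/EB648078
LOG: redo done at E8A/EC19B6B0
LOG: redo starts at E8A/EB648078
LOG: redo done at E8A/EC19B6B0
"""

ranges = redo_ranges(log_text)
same_range = len(ranges) == 2 and ranges[0] == ranges[1]
# Distance between the two logged positions, in bytes.
replayed_bytes = ranges[0][1] - ranges[0][0]
print("attempts:", len(ranges))
print("same redo range:", same_range)
print("bytes between logged positions:", replayed_bytes)
```

The difference between the two attempts in the log is what happened after
"redo done": the first attempt failed with "No space left on device", while
the second went on to complete the end-of-recovery checkpoint.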