postgres: WAL ends before end of online backup

From: Moorthy RS <rsmoorthy(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: postgres: WAL ends before end of online backup
Date: 2020-11-12 12:47:53
Message-ID: CAHLBwNk=6f+1JAyzNAGnc-WLqqkX7U9f7CDsEyq7NfQJoaPhaw@mail.gmail.com
Lists: pgsql-general

We are running PostgreSQL 9.6 with a database of more than 10 TB. Backups
are taken with a homegrown tool, "pgrsync", which uses S3 as the repository.
Both the backup files and the WAL archives are stored on S3.

Problem: When restoring, WAL archive replay fails for some backups,
seemingly at random, with the following messages in the log:

2020-11-12 06:33:32 UTC [10037]: [27988-1] user=,db=LOG: redo done at 5493D/2EFFF568
2020-11-12 06:33:32 UTC [10037]: [27989-1] user=,db=LOG: last completed transaction was at log time 2020-11-06 12:31:27.796805+00
2020-11-12 06:33:34 UTC [10037]: [27990-1] user=,db=LOG: restored log file "000000020005493D0000002E" from archive
2020-11-12 06:33:34 UTC [10037]: [27991-1] user=,db=FATAL: WAL ends before end of online backup
2020-11-12 06:33:34 UTC [10037]: [27992-1] user=,db=HINT: All WAL generated while online backup was taken must be available at recovery.
2020-11-12 06:33:36 UTC [10033]: [3-1] user=,db=LOG: startup process (PID 10037) exited with exit code 1
2020-11-12 06:33:36 UTC [10033]: [4-1] user=,db=LOG: terminating any other active server processes
2020-11-12 06:33:48 UTC [10033]: [5-1] user=,db=LOG: database system is shut down

In this case, the backup start location is 00000002000544C60000006B and the
stop location is 00000002000545210000008D (based on the pg_stop_backup()
output), but recovery stops in between at 005493D and the restore
terminates. If I redo the restore, it stops at exactly the same point. A
couple of other backups show similar results, while the remaining backups
restore successfully.
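
To compare the positions quoted above, it can help to convert WAL file names
and LSNs from the log into plain numbers. The following is a minimal sketch,
assuming the default 16 MB WAL segment size (a non-default build would need a
different WAL_SEGMENT_SIZE). The helper names are made up for this
illustration and are not part of PostgreSQL or pgrsync. It only maps names to
positions; it does not say anything about whether a file is intact.

```python
import re

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments
SEGMENTS_PER_LOG = 0x100000000 // WAL_SEGMENT_SIZE  # 256 with the default size

_NAME_RE = re.compile(r"^[0-9A-F]{24}$")


def parse_wal_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, segment)."""
    if not _NAME_RE.match(name):
        raise ValueError("not a WAL segment file name: %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def segment_start_lsn(name):
    """Return the LSN of the first byte of the segment, as an integer."""
    _timeline, log, seg = parse_wal_name(name)
    if seg >= SEGMENTS_PER_LOG:
        raise ValueError("segment number out of range: %r" % name)
    return (log << 32) | (seg * WAL_SEGMENT_SIZE)


def parse_lsn(text):
    """Convert an LSN written as 'HI/LO' (hex) into an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn):
    """Format an integer LSN back into the 'HI/LO' form used in the logs."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def segment_contains(name, lsn_text):
    """True if the LSN falls inside the byte range covered by the segment."""
    start = segment_start_lsn(name)
    return start <= parse_lsn(lsn_text) < start + WAL_SEGMENT_SIZE


if __name__ == "__main__":
    last_restored = "000000020005493D0000002E"
    print(format_lsn(segment_start_lsn(last_restored)))       # 5493D/2E000000
    print(segment_contains(last_restored, "5493D/2EFFF568"))  # True
```

With the integer form, the start location, stop location, and the "redo done
at" position can be compared directly with the usual < and > operators.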

This possibly indicates that some specific WAL files were corrupted
during the backup/restore process. Is that the correct interpretation?

Questions:

1. Is there a way to identify whether a WAL file is corrupted?
2. Is there a way to move ahead without losing data? (I am wary of using
pg_resetxlog)
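
As a rough illustration of question 1, one very basic sanity check is to
confirm that every restored segment has exactly the expected size. The sketch
below assumes the default 16 MB segment size and that the files have already
been downloaded uncompressed into a local directory; the function name and
the directory argument are hypothetical. It can only catch truncated or
padded files, not damaged contents. For the latter, the records can be
decoded with pg_xlogdump, which ships with 9.6.

```python
import os

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments
HEX_DIGITS = set("0123456789ABCDEF")


def wrong_size_segments(directory):
    """Return (name, size) for WAL-named files that are not one full segment."""
    bad = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # WAL segment files have 24 upper-case hex digits as their name;
        # history and backup-label files are skipped by this test.
        is_segment = len(name) == 24 and set(name) <= HEX_DIGITS
        if is_segment and os.path.isfile(path):
            size = os.path.getsize(path)
            if size != WAL_SEGMENT_SIZE:
                bad.append((name, size))
    return bad
```

An empty result means only that the sizes look right, not that the files are
valid.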
