pg_basebackup and pg_receivewal timing, missing WAL files

From: Tim <timfosho(at)gmail(dot)com>
To: posgres support <pgsql-admin(at)postgresql(dot)org>
Subject: pg_basebackup and pg_receivewal timing, missing WAL files
Date: 2022-06-09 15:31:56
Message-ID: CAKhLO5itE_xp7KX7Vxnho8Uzczgwt8iUD9bVtKNPamBOst=hPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hello all,

I'm attempting to create a long term, stand alone backup archive script for
a pretty active and large database (3TB) which will store backups going
back a few months.
Using the combination of pg_basebackup in TAR format, with the `Xn` flag,
and running pg_receivewal in the background, I'm ending up with a backup
that is missing quite a few WAL files.

Trying to figure out where my understanding of the process is incorrect,
and what I need to do to correct this.

*Backup *process:

* Start pg_receivewal to archive location
* Then run: `pg_basebackup -h {host} -p {port} -D - -Ft -Xn
--checkpoint=fast | ...`

* This backup is taken from a *standby node*

* Once pg_basebackup finishes, send SIGINT to pg_receivewal

*Recovery* Process:

* Untar basebackup to data directory
* Copy archived WAL files to a separate directory on the same drive
* settings: restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
recovery_target = immediate

Once I start the DB, I immediately get

cp: cannot stat '/var/lib/pgsql/wal_archive/0000001200003CF80000003F': No
such file or directory

*WAL FILES: *

0000001200003CF80000003F << First WAL file the DB looks for
0000001200003CF700000075 << The next WAL file
---------------------------------------- << Missing WAL files from
pg_receivewal archive
...
0000001200003CF8000000C8 << First WAL file in pg_receivewal archive

*backup_label *file:

START WAL LOCATION: 3CF7/75974400 (file 0000001200003CF700000075)
CHECKPOINT LOCATION: 3CF8/3F1139C0
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 2022-06-08 12:51:52 EDT
LABEL: pg_basebackup base backup
START TIMELINE: 18

I can see that START WAL LOCATION & CHECKPOINT LOCATION are where the
basebackup starts to recover from, but how come these are not in my
pg_receivewal archive, if I'm starting it before starting the pg_basebackup
process? How can I ensure they are part of this archive?

Since this is a recent backup, I have a separate WAL archive from where I
can just recover those using a different restore_command, and once that
missing gap of WAL files is recovered from this archive, stopping the
server
and switching back to `restore_command = 'cp /var/lib/pgsql/wal_archive/%f
%p'` and recovering the rest of the WAL files from the pg_receivewal
archive works fine. The DB recovers and accepts connections shortly after.

The whole point of this is to have a several months old, stand alone
backup, I need to have all the WAL files available in that original WAL
archive created with pg_receivewal.

Appreciate the help, thank you in advance!

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message David G. Johnston 2022-06-09 16:46:58 Re: pg_upgrade and missing loadable libraries
Previous Message Arun Maddukuri 2022-06-09 08:37:31 pg_upgrade and missing loadable libraries