Re: pg_basebackup and pg_receivewal timing, missing WAL files

From: Tim <timfosho(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: posgres support <pgsql-admin(at)postgresql(dot)org>
Subject: Re: pg_basebackup and pg_receivewal timing, missing WAL files
Date: 2022-06-09 21:05:28
Message-ID: CAKhLO5gY4rUS65RUD_OGyHQTxoOa1-akLBWjh-CDQiVDJknecw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

We currently use pgbackrest as our primary backup tool actually, and it
works great! The issue I ran into with trying to set it up that way with
pgbackrest, is when configuring the second repo, pgbackrest will have to
archive all WAL files to it, going back to the oldest backup. And we have
this (admittedly arbitrary) legal policy requiring a single instance of a 6
month old backup for auditing purposes. So pgbackrest has to keep WALs that
far back, the main repo only needs 4 weeks of WALs for PITR, its a bit
overkill archiving WALs twice and also storing them an extra 5 months. So I
opted to write a custom backup script.

Though I was not aware of the `--archive-copy` option, I wonder if I can
just turn that option on for the main repo, and then extract the particular
backup I need to another location. We use Azure Blobs for backups.

Unfortunately `-Xs` cannot be used with pg_basebackup in TAR format, I'm
using the AzCopy utility, by piping pg_basebackup into it, to stream the
basebackup directly into Azure Blob storage, without storing it anywhere
first (Which pgbackrest does natively as well!). So there is no other way
without using the TAR format option.

Nevertheless, it seems like I need the contents of the pg_wal directory at
the start of pg_basebackup, since pg_receivewal only streams WAL generated
after it starts, the restored DB is missing all the files contained
within, I've adjusted my script to copy them as well and testing currently.

On Thu, Jun 9, 2022 at 2:36 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> Greetings,
>
> * Tim (timfosho(at)gmail(dot)com) wrote:
> > I'm attempting to create a long term, stand alone backup archive script
> for
> > a pretty active and large database (3TB) which will store backups going
> > back a few months.
> > Using the combination of pg_basebackup in TAR format, with the `Xn` flag,
> > and running pg_receivewal in the background, I'm ending up with a backup
> > that is missing quite a few WAL files.
> >
> > Trying to figure out where my understanding of the process is incorrect,
> > and what I need to do to correct this.
> >
> > *Backup *process:
> >
> > * Start pg_receivewal to archive location
> > * Then run: `pg_basebackup -h {host} -p {port} -D - -Ft -Xn
> > --checkpoint=fast | ...`
>
> What's wrong with using -Xs instead..? Basically does exactly what you
> seem to be trying to do here.
>
> > * This backup is taken from a *standby node*
>
> Where is the pg_receivewal run? Against the primary or the replica?
>
> > * Once pg_basebackup finishes, send SIGINT to pg_receivewal
> >
> > *Recovery* Process:
> >
> > * Untar basebackup to data directory
> > * Copy archived WAL files to a separate directory on the same drive
> > * settings: restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
> > recovery_target = immediate
> >
> >
> > Once I start the DB, I immediately get
> >
> > cp: cannot stat '/var/lib/pgsql/wal_archive/0000001200003CF80000003F': No
> > such file or directory
> >
> >
> >
> > *WAL FILES: *
> >
> > 0000001200003CF80000003F << First WAL file the DB looks for
> > 0000001200003CF700000075 << The next WAL file
> > ---------------------------------------- << Missing WAL files from
> > pg_receivewal archive
> > ...
> > 0000001200003CF8000000C8 << First WAL file in pg_receivewal archive
> >
> > *backup_label *file:
> >
> > START WAL LOCATION: 3CF7/75974400 (file 0000001200003CF700000075)
> > CHECKPOINT LOCATION: 3CF8/3F1139C0
> > BACKUP METHOD: streamed
> > BACKUP FROM: standby
> > START TIME: 2022-06-08 12:51:52 EDT
> > LABEL: pg_basebackup base backup
> > START TIMELINE: 18
> >
> >
> > I can see that START WAL LOCATION & CHECKPOINT LOCATION are where the
> > basebackup starts to recover from, but how come these are not in my
> > pg_receivewal archive, if I'm starting it before starting the
> pg_basebackup
> > process? How can I ensure they are part of this archive?
>
> Well, using -Xs should do that.
>
> > Since this is a recent backup, I have a separate WAL archive from where I
> > can just recover those using a different restore_command, and once that
> > missing gap of WAL files is recovered from this archive, stopping the
> > server
> > and switching back to `restore_command = 'cp
> /var/lib/pgsql/wal_archive/%f
> > %p'` and recovering the rest of the WAL files from the pg_receivewal
> > archive works fine. The DB recovers and accepts connections shortly
> after.
> >
> > The whole point of this is to have a several months old, stand alone
> > backup, I need to have all the WAL files available in that original WAL
> > archive created with pg_receivewal.
>
> The use-case generally makes sense, in pgbackrest we have --archive-copy
> for more-or-less the same kind of thing, though you can now just use a
> separate newly created repo that you back up the standalone backup to
> (when multiple repos are configured, archive-push will archive to all of
> them) and then archive that copy. The latter is what we'd generally
> recommend these days, but with archive-copy and a full backup, you can
> just grab the full backup directory from inside the repo and it'll have
> everything.
>
> Thanks,
>
> Stephen
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Stephen Frost 2022-06-09 21:24:09 Re: pg_basebackup and pg_receivewal timing, missing WAL files
Previous Message Pascal CROZET 2022-06-09 19:52:53 ERROR: type "my_user_type" does not exist on REFRESH MATERIALIZED VIEW