pgsql: Generate backup manifests for base backups, and validate them.

From: Robert Haas <rhaas(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Generate backup manifests for base backups, and validate them.
Date: 2020-04-03 19:07:08
Message-ID: E1jKRem-0004xp-8Y@gemulon.postgresql.org
Lists: pgsql-committers pgsql-hackers

Generate backup manifests for base backups, and validate them.

A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generation of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
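
As a rough illustration of the manifest contents described above, the
following Python sketch builds a manifest-like JSON document. It is not the
server's implementation: the key names (path, size, last_modified,
checksum_algorithm, checksum, wal_ranges, manifest_checksum), the helper
names, and the use of SHA-256 for the manifest's own checksum are all
assumptions made for this example. CRC-32C is implemented by hand because
the Python standard library does not provide it.

```python
import hashlib
import json


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def file_checksum(data: bytes, algorithm: str):
    """Return a hex checksum, or None when checksums are disabled."""
    if algorithm == "NONE":
        return None
    if algorithm == "CRC32C":
        return format(crc32c(data), "08x")
    # SHA224, SHA256, SHA384 and SHA512 map directly onto hashlib names.
    return hashlib.new(algorithm.lower(), data).hexdigest()


def build_manifest(files, wal_ranges, algorithm="CRC32C"):
    """files: iterable of (path, last_modified, data) tuples.

    All key names below are hypothetical, chosen only for illustration.
    """
    entries = []
    for path, last_modified, data in files:
        entry = {"path": path, "size": len(data), "last_modified": last_modified}
        checksum = file_checksum(data, algorithm)
        if checksum is not None:  # the per-file checksum is optional
            entry["checksum_algorithm"] = algorithm
            entry["checksum"] = checksum
        entries.append(entry)
    manifest = {"files": entries, "wal_ranges": wal_ranges}
    # Assumption: protect the manifest itself with SHA-256 over its JSON text.
    serialized = json.dumps(manifest, sort_keys=True).encode("utf-8")
    manifest["manifest_checksum"] = hashlib.sha256(serialized).hexdigest()
    return manifest
```

Calling build_manifest with algorithm="SHA256" swaps the per-file checksum
for a cryptographic hash, and algorithm="NONE" leaves the checksum fields
out of each file entry.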

A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
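
The file-level checks described above can be sketched in Python as follows.
This is only an illustration, not how pg_validatebackup is written: the
function name, the manifest key names (path, size, checksum_algorithm,
checksum), and the wording of the messages are hypothetical. Reporting
files that are on disk but absent from the manifest is an assumption about
what checking for "the right files" means. The sketch handles only the SHA
algorithms available in hashlib and omits the WAL check entirely.

```python
import hashlib
import os


def validate_backup(backup_dir, manifest):
    """Compare a plain-format backup directory against a manifest dict.

    Returns a list of human-readable problems; an empty list means the
    checks performed here all passed.
    """
    problems = []
    expected = {entry["path"]: entry for entry in manifest["files"]}

    # Collect the relative paths of all files actually present on disk.
    present = set()
    for root, _dirs, names in os.walk(backup_dir):
        for name in names:
            rel = os.path.relpath(os.path.join(root, name), backup_dir)
            present.add(rel.replace(os.sep, "/"))

    for path in sorted(present - set(expected)):
        problems.append(f"{path}: present on disk but not in the manifest")

    for path, entry in sorted(expected.items()):
        if path not in present:
            problems.append(f"{path}: listed in the manifest but missing")
            continue
        full_path = os.path.join(backup_dir, *path.split("/"))
        size = os.path.getsize(full_path)
        if size != entry["size"]:
            problems.append(f"{path}: size {size}, expected {entry['size']}")
            continue
        algorithm = entry.get("checksum_algorithm")
        if algorithm is None:
            continue  # no checksum recorded: existence and size are all we can check
        with open(full_path, "rb") as handle:
            digest = hashlib.new(algorithm.lower(), handle.read()).hexdigest()
        if digest != entry["checksum"]:
            problems.append(f"{path}: checksum mismatch")
    return problems
```

A file whose contents change without changing its length is caught only
when a checksum is recorded for it, which is the practical difference
between the two levels of checking.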

Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.

Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/0d8c9c1210c44b36ec2efcb223a1dfbe897a3661

Modified Files
--------------
doc/src/sgml/protocol.sgml | 37 +-
doc/src/sgml/ref/allfiles.sgml | 1 +
doc/src/sgml/ref/pg_basebackup.sgml | 64 ++
doc/src/sgml/ref/pg_validatebackup.sgml | 291 ++++++++
doc/src/sgml/reference.sgml | 1 +
src/backend/access/transam/xlog.c | 3 +-
src/backend/replication/basebackup.c | 537 +++++++++++++-
src/backend/replication/repl_gram.y | 13 +
src/backend/replication/repl_scanner.l | 2 +
src/backend/replication/walsender.c | 30 +
src/bin/Makefile | 1 +
src/bin/pg_basebackup/pg_basebackup.c | 208 +++++-
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 8 +-
src/bin/pg_validatebackup/.gitignore | 2 +
src/bin/pg_validatebackup/Makefile | 39 +
src/bin/pg_validatebackup/parse_manifest.c | 740 +++++++++++++++++++
src/bin/pg_validatebackup/parse_manifest.h | 45 ++
src/bin/pg_validatebackup/pg_validatebackup.c | 905 ++++++++++++++++++++++++
src/bin/pg_validatebackup/t/001_basic.pl | 30 +
src/bin/pg_validatebackup/t/002_algorithm.pl | 58 ++
src/bin/pg_validatebackup/t/003_corruption.pl | 251 +++++++
src/bin/pg_validatebackup/t/004_options.pl | 89 +++
src/bin/pg_validatebackup/t/005_bad_manifest.pl | 201 ++++++
src/bin/pg_validatebackup/t/006_encoding.pl | 27 +
src/bin/pg_validatebackup/t/007_wal.pl | 55 ++
src/include/replication/basebackup.h | 7 +-
src/include/replication/walsender.h | 1 +
27 files changed, 3614 insertions(+), 32 deletions(-)
