Re: pg_verifybackup: TAR format backup verification

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Sravan Kumar <sravanvcybage(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: pg_verifybackup: TAR format backup verification
Date: 2024-08-12 09:12:24
Message-ID: CAAJ_b95mcGjkfAf1qduOR97CokW8-_i-dWLm3v6x1w2-OW9M+A@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 7, 2024 at 11:28 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Aug 7, 2024 at 1:05 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> > The main issue I have is computing the total_size of valid files that
> > will be checksummed and that exist in both the manifests and the
> > backup, in the case of a tar backup. This cannot be done in the same
> > way as with a plain backup.
>
> I think you should compute and sum the sizes of the tar files
> themselves. Suppose you readdir(), make a list of files that look
> relevant, and stat() each one. total_size is the sum of the file
> sizes. Then you work your way through the list of files and read each
> one. done_size is the total size of all files you've read completely
> plus the number of bytes you've read from the current file so far.
>
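The progress accounting Robert describes can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from the patch. `ProgressState`, `progress_init()`, and `progress_read_all()` are hypothetical names, and the list of paths is assumed to have already been filtered from the readdir() results.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical progress state; names are illustrative, not from the patch. */
typedef struct ProgressState
{
    uint64_t    total_size;     /* sum of stat() sizes of all tar files */
    uint64_t    done_size;      /* bytes read so far across all tar files */
} ProgressState;

/* First, stat() every relevant file and sum the on-disk sizes. */
static int
progress_init(ProgressState *ps, const char **paths, int npaths)
{
    ps->total_size = 0;
    ps->done_size = 0;
    for (int i = 0; i < npaths; i++)
    {
        struct stat st;

        if (stat(paths[i], &st) != 0)
            return -1;
        ps->total_size += (uint64_t) st.st_size;
    }
    return 0;
}

/*
 * Then read each file in chunks.  done_size grows with every chunk, so it
 * always equals the size of fully read files plus the bytes consumed from
 * the current file.
 */
static int
progress_read_all(ProgressState *ps, const char **paths, int npaths)
{
    char        buf[8192];

    for (int i = 0; i < npaths; i++)
    {
        FILE       *fp = fopen(paths[i], "rb");
        size_t      n;

        if (fp == NULL)
            return -1;
        while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        {
            /* a real verifier would hand buf to the tar/decompression code here */
            ps->done_size += n;
        }
        if (ferror(fp))
        {
            fclose(fp);
            return -1;
        }
        fclose(fp);
    }
    return 0;
}
```

Because total_size comes from stat() on the archives as stored, done_size has to count bytes read from disk rather than decompressed bytes; otherwise a compressed archive could push the reported progress past 100%.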

I implemented this approach in the attached version and made a few
additional changes based on Sravan's off-list comments regarding
function names and descriptions.

Now, verification happens in two passes. The first pass simply
verifies the file names, determines their compression types, and
returns a list of valid tar files whose contents need to be verified
in the second pass. The second pass is called at the end of
verify_backup_directory() after all files in that directory have been
scanned. I named the pass 1 and pass 2 functions
verify_tar_file_name() and verify_tar_file_contents(), respectively.
The rest of the code flow is similar to that of the previous version.
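A minimal C sketch of the first pass, assuming archives are recognized purely by file-name suffix (`.tar`, `.tar.gz`, `.tar.lz4`, `.tar.zst`, which is assumed here to match how pg_basebackup names its tar-format output). `TarCompression`, `TarFileEntry`, `classify_tar_name()`, and `collect_tar_files()` are hypothetical and are not the patch's verify_tar_file_name(). No file contents are read in this pass; the second pass would walk the collected array and read each archive.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical compression classification derived from the file name. */
typedef enum
{
    TAR_COMPRESSION_NONE,
    TAR_COMPRESSION_GZIP,
    TAR_COMPRESSION_LZ4,
    TAR_COMPRESSION_ZSTD
} TarCompression;

/* One entry collected by pass 1 for use in pass 2 (hypothetical type). */
typedef struct TarFileEntry
{
    const char *name;
    TarCompression compression;
} TarFileEntry;

static bool
has_suffix(const char *s, const char *suffix)
{
    size_t      ls = strlen(s);
    size_t      lx = strlen(suffix);

    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Pass 1 check: inspect the name only and report its compression type. */
static bool
classify_tar_name(const char *name, TarCompression *comp)
{
    if (has_suffix(name, ".tar"))
        *comp = TAR_COMPRESSION_NONE;
    else if (has_suffix(name, ".tar.gz"))
        *comp = TAR_COMPRESSION_GZIP;
    else if (has_suffix(name, ".tar.lz4"))
        *comp = TAR_COMPRESSION_LZ4;
    else if (has_suffix(name, ".tar.zst"))
        *comp = TAR_COMPRESSION_ZSTD;
    else
        return false;
    return true;
}

/*
 * Scan every directory entry first, keeping only names that look like tar
 * archives.  out must have room for n entries; returns the number kept.
 */
static int
collect_tar_files(const char **names, int n, TarFileEntry *out)
{
    int         count = 0;

    for (int i = 0; i < n; i++)
    {
        TarCompression comp;

        if (classify_tar_name(names[i], &comp))
        {
            out[count].name = names[i];
            out[count].compression = comp;
            count++;
        }
    }
    return count;
}
```

Deferring the content checks until the whole directory has been scanned is what makes the stat()-based total_size available before any archive is read.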

In the attached patch set, I dropped the previous 0009 patch,
abandoning the changes that touched the progress-reporting code for
plain backups. The new 0009 patch adds the missing APIs to
simple_list.c for destroying a SimplePtrList. The remaining patches
keep their previous numbers.
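For readers unfamiliar with simple_list.c, destroying a SimplePtrList amounts to walking the singly linked cells and freeing each one. The sketch below mirrors the list layout locally (assumed to match fe_utils/simple_list.h) so that it compiles on its own. `ptr_list_append()` and `ptr_list_destroy()`, including the `free_pointees` flag, are hypothetical illustrations, not the API added by the 0009 patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Local mirror of the SimplePtrList layout (assumed, for self-containment). */
typedef struct SimplePtrListCell
{
    struct SimplePtrListCell *next;
    void       *ptr;
} SimplePtrListCell;

typedef struct SimplePtrList
{
    SimplePtrListCell *head;
    SimplePtrListCell *tail;
} SimplePtrList;

/* Hypothetical append helper, used here only to build a list to destroy. */
static void
ptr_list_append(SimplePtrList *list, void *ptr)
{
    SimplePtrListCell *cell = malloc(sizeof(SimplePtrListCell));

    if (cell == NULL)
        abort();                /* PostgreSQL frontend code typically uses pg_malloc() */
    cell->next = NULL;
    cell->ptr = ptr;
    if (list->tail)
        list->tail->next = cell;
    else
        list->head = cell;
    list->tail = cell;
}

/*
 * Hypothetical destroy: free every cell, and the pointed-to objects too when
 * free_pointees is true.  The list is left empty and reusable.
 */
static void
ptr_list_destroy(SimplePtrList *list, bool free_pointees)
{
    SimplePtrListCell *cell = list->head;

    while (cell != NULL)
    {
        SimplePtrListCell *next = cell->next;

        if (free_pointees)
            free(cell->ptr);
        free(cell);
        cell = next;
    }
    list->head = NULL;
    list->tail = NULL;
}
```

The flag captures the ownership question: whether the list owns the objects it points to or merely references memory owned elsewhere.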

Regards,
Amul

Attachment                                                       Content-Type         Size
v9-0004-Refactor-move-skip_checksums-global-variable-to-v.patch  application/x-patch  1.9 KB
v9-0005-Refactor-move-some-part-of-pg_verifybackup.c-to-p.patch  application/x-patch  7.8 KB
v9-0006-Refactor-split-verify_backup_file-function.patch         application/x-patch  4.8 KB
v9-0007-Refactor-split-verify_file_checksum-function.patch       application/x-patch  2.9 KB
v9-0008-Refactor-split-verify_control_file.patch                 application/x-patch  5.7 KB
v9-0009-Add-simple_ptr_list_destroy-and-simple_ptr_list_d.patch  application/x-patch  2.2 KB
v9-0010-pg_verifybackup-Add-backup-format-and-compression.patch  application/x-patch  6.2 KB
v9-0011-pg_verifybackup-Read-tar-files-and-verify-its-con.patch  application/x-patch  29.0 KB
v9-0012-pg_verifybackup-Tests-and-document.patch                 application/x-patch  12.5 KB
