Re: pg_verifybackup: TAR format backup verification

From: Sravan Kumar <sravanvcybage(at)gmail(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_verifybackup: TAR format backup verification
Date: 2024-07-22 02:59:12
Message-ID: CA+=NbjjkFADANQoyvgTQfKH8ZerxpmsJD70R17V8ZtfaoyRYwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amul,
thanks for working on this.

+ file_name_len = strlen(relpath);
> + if (file_name_len < file_extn_len ||
> + strcmp(relpath + file_name_len - file_extn_len, file_extn) != 0)
> + {
> + if (compress_algorithm == PG_COMPRESSION_NONE)
> + report_backup_error(context,
> + "\"%s\" is not a valid file, expecting tar file",
> + relpath);
> + else
> + report_backup_error(context,
> + "\"%s\" is not a valid file, expecting \"%s\" compressed tar file",
> + relpath,
> + get_compress_algorithm_name(compress_algorithm));
> + return;
> + }
>

I believe pg_verifybackup needs to exit after reporting a failure here
since it could not figure out a streamer to allocate.

Also, v1-0002 removes #include "pqexpbuffer.h" from astreamer.h and adds it
to the new .h file and in v1-0004 it
reverts the change. So this can be avoided altogether.

On Tue, Jul 9, 2024 at 3:24 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:

> Hi,
>
> Currently, pg_verifybackup only works with plain (directory) format
> backups.
> This proposal aims to support tar-format backups too. We will read the tar
> files from start to finish and verify each file inside against the
> backup_manifest information, similar to how it verifies plain files.
>
> We are introducing new options to pg_verifybackup: -F, --format=p|t and -Z,
> --compress=METHOD, which allow users to specify the backup format and
> compression type, similar to the options available in pg_basebackup. If
> these
> options are not provided, the backup format and compression type will be
> automatically detected. To determine the format, we will search for
> PG_VERSION
> file in the backup directory — if found, it indicates a plain backup;
> otherwise, it
> is a tar-format backup. For the compression type, we will check the
> extension
> of base.tar.xxx file of tar-format backup. Refer to patch 0008 for the
> details.
>
> The main challenge is to structure the code neatly. For plain-format
> backups,
> we verify bytes directly from the files. For tar-format backups, we read
> bytes
> from the tar file of the specific file we care about. We need an
> abstraction
> to handle both formats smoothly, without using many if statements or
> special
> cases.
>
> To achieve this goal, we need to reuse existing infrastructure without
> duplicating code, and for that, the major work involved here is the code
> refactoring. Here is a breakdown of the work:
>
> 1. BBSTREAMER Rename and Relocate:
> BBSTREAMER, currently used in pg_basebackup for reading and decompressing
> TAR
> files; can also be used for pg_verifybackup. In the future, it could
> support
> other tools like pg_combinebackup for merging TAR backups without
> extraction,
> and pg_waldump for verifying WAL files from the tar backup. For that
> accessibility,
> BBSTREAMER needs to be relocated to a shared directory.
>
> Moreover, renaming BBSTREAMER to ASTREAMER (short for Archive Streamer)
> would
> better indicate its general application across multiple tools. Moving it to
> src/fe_utils directory is appropriate, given its frontend infrastructure
> use.
>
> 2. pg_verifybackup Code Refactoring:
> The existing code for plain backup verification will be split into separate
> files or functions, so it can also be reused for tar backup verification.
>
> 3. Adding TAR Backup Verification:
> Finally, patches will be added to implement TAR backup verification, along
> with
> tests and documentation.
>
> Patches 0001-0003 focus on renaming and relocating BBSTREAMER, patches
> 0004-0007 on splitting the existing verification code, and patches
> 0008-0010 on
> adding TAR backup verification capabilities, tests, and documentation. The
> last
> set could be a single patch but is split to make the review easier.
>
> Please take a look at the attached patches and share your comments,
> suggestions, or any ways to enhance them. Your feedback is greatly
> appreciated.
>
> Thank you !
>
> --
> Regards,
> Amul Sul
> EDB: http://www.enterprisedb.com
>

--
Thanks & Regards,
Sravan Velagandula
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-07-22 03:15:40 Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Previous Message Hayato Kuroda (Fujitsu) 2024-07-22 02:56:49 RE: Slow catchup of 2PC (twophase) transactions on replica in LR