Re: pg_verifybackup: TAR format backup verification

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Sravan Kumar <sravanvcybage(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_verifybackup: TAR format backup verification
Date: 2024-09-25 18:47:58
Message-ID: CA+Tgmob3zhdDM=ggw-NrV-jrvLP7a-otWFn5F+5oOputiFdhmQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 12, 2024 at 7:05 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> The updated version attached. Thank you for the review !

I have spent a bunch of time on this and have made numerous revisions.
I hope to commit the result, aftering seeing what you and the
buildfarm think (and anyone else who wishes to offer an opinion).
Changes:

1. I adjusted some documentation wording for clarity.

2. I adjusted quite a few comments.

3. I changed the code to canonicalize pathnames taken from tar files,
so that a backup where tar file names begin with "./" doesn't break
backup verification.

4. I changed the code to use a dedicated buffer of type
ControlFileData instead of buffering the control file in bbs_buffer,
because there's no guarantee that bbs_buffer is sufficiently aligned,
which could result in failures on non-x86 platforms.

5. I changed the way that we validate the length of the control file;
the old code looked like it was checking that the file size was
sizeof(ControlFileData), but in fact the control file is much bigger
than that and its size is given by PG_CONTROL_FILE_SIZE. The old test
passed only because the computed file size was capped at
sizeof(ControlFileData), even though the actual file size was larger.

6. I fixed things so that we check that the target directory exists
before trying to figure out the backup format, so that cases where the
directory doesn't exist behave the same as before instead of failing
with a different error message.

7. I adjusted the test cases in view of point #3 and point #6.

8. I reverted various refactorings about which I earlier complained,
because they put very small amounts of code into functions which in my
opinion made the code harder to read. I also realized along the way
that (a) you hadn't updated the comments in those functions, or at
least not thoroughly, so they contained some text that was really only
applicable to the plain-format case and (b) some of the error message
really deserved to be different in the plain and tar format cases. In
particular, when there's a problem with an archive member, it seems
good to mention both the name of the archive and the name of the
archive member. Having separate code paths makes that easy and I've
done it in this version. Exception: I didn't update the messages for
failing to initialize the checksum context, because I don't think
those can happen and it doesn't really even make sense to include the
file name in the first place; any hypothetical failure would
presumably be based on which algorithm was picked, not which file you
were planning to use it on. This area could use some cleanup but it's
not this patch's job to make it less weird.

9. I rewrote 003_corruption.pl so that we apply the same tests for tar
and plain format backups without nearly as much code duplication as
you had.

10. I added a few test cases to 004_options.pl, so that we test the -F
option systematically, including what happens with an invalid value.

11. I moved the --format option to the correct place in alphabetical
order in the usage output.

I think that's everything that I changed, but I might have missed
something in putting this list together. Hopefully not.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v15-0001-pg_verifybackup-Verify-tar-format-backups.patch application/octet-stream 56.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-09-25 18:48:35 Re: pg_ctl/miscinit: print "MyStartTime" as a long long instead of long to avoid 2038 problem.
Previous Message Florents Tselai 2024-09-25 18:17:20 PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part