Re: pg_verifybackup: TAR format backup verification

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Sravan Kumar <sravanvcybage(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: pg_verifybackup: TAR format backup verification
Date: 2024-08-01 13:18:42
Message-ID: CAAJ_b94sxUN73kNZ_Z5sP0GDEZGAxKgW0PvS5CBJXiH212+Fww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 1, 2024 at 1:37 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Jul 31, 2024 at 9:28 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> > Fixed -- I did that because it was part of a separate group in pg_basebackup.
>
> Well, that's because pg_basebackup builds multiple executables, and
> these files needed to be linked with some but not others. It looks
> like when Andres added meson support, instead of linking each object
> file into the binaries that need it, he had it just build a static
> library and link every executable to that. That puts the linker in
> charge of sorting out which binaries need which files, instead of
> having the makefile do it. In any case, this consideration doesn't
> apply when we're putting the object files into a library, so there was
> no need to preserve the separate makefile variable. I think this looks
> good now.
>

Understood.

> > Fixed -- frontend_common_code now includes lz4 as well.
>
> Cool. 0003 overall looks good to me now, unless Andres wants to object.
>
> > Noted. I might give it a try another day, unless someone else beats
> > me, perhaps in a separate thread.
>
> Probably not too important, since nobody has complained.
>
> > Done -- added a new patch as 0004, and the subsequent patch numbers
> > have been incremented accordingly.
>
> I think I would have made this pass context->show_progress to
> progress_report() instead of the whole verifier_context, but that's an
> arguable stylistic choice, so I'll defer to you if you prefer it the
> way you have it. Other than that, this LGTM.
>

Additionally, I moved total_size and done_size to verifier_context
because done_size needs to be accessed in astreamer_verify.c.
With this change, verifier_context is now more suitable.

> However, what is now 0005 does something rather evil. The commit
> message claims that it's just rearranging code, and that's almost
> entirely true, except that it also changes manifest_file's pathname
> member to be char * instead of const char *. I do not think that is a
> good idea, and I definitely do not think you should do it in a patch
> that purports to just be doing code movement, and I even more
> definitely think that you should not do it without even mentioning
> that you did it, and why you did it.
>

True, that was a mistake on my part during the rebase. Fixed in the
attached version.

> > Fixed -- I did the NULL check in the earlier 0007 patch, but it should
> > have been done in this patch.
>
> This is now 0006. struct stat's st_size is of type off_t -- or maybe
> ssize_t on some platforms? - not type size_t. I suggest making the
> filesize argument use int64 as we do in some other places. size_t is,
> I believe, defined to be the right width to hold the size of an object
> in memory, not the size of a file on disk, so it isn't really relevant
> here.
>

Ok, used int64.

> Other than that, my only comment on this patch is that I think I would
> find it more natural to write the check in verify_backup_file() in a
> different order: I'd put context->manifest->version != 1 && m != NULL
> && m->matched && !m->bad && strcmp() because (a) that way the most
> expensive test is last and (b) it feels weird to think about whether
> we have the right pathname if we don't even have a valid manifest
> entry. But this is minor and just a stylistic preference, so it's also
> OK as you have it if you prefer.
>

I used to do it that way (a) -- keeping the expensive check for last.
I did the same thing while adding should_verify_control_data() in the
later patch. Somehow, I missed it here, maybe I didn't pay enough
attention to this patch :(

> > I agree, changing the order of errors could create confusion.
> > Previously, a file size mismatch was a clear and appropriate error
> > that was reported before the checksum failure error.
>
> In my opinion, this patch (currently 0007) creates a rather confusing
> situation that I can't easily reason about. Post-patch,
> verify_content_checksum() is a mix of two different things: it ends up
> containing all of the logic that needs to be performed on every chunk
> of bytes read from the file plus some but not all of the end-of-file
> error-checks from verify_file_checksum(). That's really weird. I'm not
> very convinced that the test for whether we've reached the end of the
> file is 100% correct, but even if it is, the stuff before that point
> is stuff that is supposed to happen many times and the stuff after
> that is only supposed to happen once, and I don't see any good reason
> to smush those two categories of things into a single function. Plus,
> changing the order in which those end-of-file checks happen doesn't
> seem like the right idea either: the current ordering is good the way
> it is. Maybe you want to think of refactoring to create TWO new
> functions, one to do the per-hunk work and a second to do the
> end-of-file "is the checksum OK?" stuff, or maybe you can just open
> code it, but I'm not willing to commit this the way it is.
>

Understood. At the start of working on the v3 review, I thought of
completely discarding the 0007 patch and copying most of
verify_file_checksum() to a new function in astreamer_verify.c.
However, I later realized we could deduplicate some parts, so I split
verify_file_checksum() and moved the reusable part to a separate
function. Please have a look at v4-0007.

> Regarding 0008, I don't really see a reason why the m != NULL
> shouldn't also move inside should_verify_control_data(). Yeah, the
> caller added in 0010 might not need the check, but it won't really
> cost anything. Also, it seems to me that the logic in 0010 is actually
> wrong. If m == NULL, we'll keep the values of verifyChecksums and
> verifyControlData from the previous iteration, whereas what we should
> do is make them both false. How about removing the if m == NULL guard
> here and making both should_verify_checksum() and
> should_verify_control_data() test m != NULL internally? Then it all
> works out nicely, I think. Or alternatively you need an else clause
> that resets both values to false when m == NULL.
>

I had the same thought about checking for NULL inside
should_verify_control_data(), but I wanted to maintain the structure
similar to should_verify_checksum(). Making this change would have
also required altering should_verify_checksum(), I wasn’t sure if I
should make that change before. Now, I did that in the attached
version -- 0008 patch.

> > Okay, I added the verify_checksums() and verify_controldata()
> > functions to the astreamer_verify.c file. I also updated related
> > variables that were clashing with these function names:
> > verify_checksums has been renamed to verifyChecksums, and verify_sysid
> > has been renamed to verifyControlData.
>
> Maybe think of doing something with the ASTREAMER_MEMBER_HEADER case also.
>

Done.

> Out of time for today, will look again soon. I think the first few of
> these are probably pretty much ready for commit already, and with a
> little more adjustment they'll probably be ready up through about
> 0006.
>

Sure, thank you.

Regards,
Amul

Attachment Content-Type Size
v4-0008-Refactor-split-verify_control_file.patch application/x-patch 5.4 KB
v4-0007-Refactor-split-verify_file_checksum-function.patch application/x-patch 2.4 KB
v4-0009-pg_verifybackup-Add-backup-format-and-compression.patch application/x-patch 8.8 KB
v4-0010-pg_verifybackup-Read-tar-files-and-verify-its-con.patch application/x-patch 23.4 KB
v4-0011-pg_verifybackup-Tests-and-document.patch application/x-patch 13.7 KB
v4-0006-Refactor-split-verify_backup_file-function.patch application/x-patch 3.9 KB
v4-0005-Refactor-move-some-part-of-pg_verifybackup.c-to-p.patch application/x-patch 8.5 KB
v4-0003-Refactor-move-astreamer-files-to-fe_utils-to-make.patch application/x-patch 7.1 KB
v4-0004-Refactor-move-few-global-variable-to-verifier_con.patch application/x-patch 5.1 KB
v4-0002-Refactor-Add-astreamer_inject.h-and-move-related-.patch application/x-patch 3.0 KB
v4-0001-Refactor-Rename-all-bbstreamer-references-to-astr.patch application/x-patch 106.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleg Tselebrovskiy 2024-08-01 13:37:12 Re: Why is citext/regress failing on hamerkop?
Previous Message Junwang Zhao 2024-08-01 12:32:11 Re: [Patch] remove duplicated smgrclose