Re: pg_verifybackup: TAR format backup verification

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Sravan Kumar <sravanvcybage(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: pg_verifybackup: TAR format backup verification
Date: 2024-07-31 20:07:03
Message-ID: CA+Tgmobmi7qiu8HDQECwuy-JrAH0zcYbzGnhnKjurqv44DHtxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 31, 2024 at 9:28 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> Fixed -- I did that because it was part of a separate group in pg_basebackup.

Well, that's because pg_basebackup builds multiple executables, and
these files needed to be linked with some but not others. It looks
like when Andres added meson support, instead of linking each object
file into the binaries that need it, he had it just build a static
library and link every executable to that. That puts the linker in
charge of sorting out which binaries need which files, instead of
having the makefile do it. In any case, this consideration doesn't
apply when we're putting the object files into a library, so there was
no need to preserve the separate makefile variable. I think this looks
good now.

> Fixed -- frontend_common_code now includes lz4 as well.

Cool. 0003 overall looks good to me now, unless Andres wants to object.

> Noted. I might give it a try another day, unless someone else beats
> me, perhaps in a separate thread.

Probably not too important, since nobody has complained.

> Done -- added a new patch as 0004, and the subsequent patch numbers
> have been incremented accordingly.

I think I would have made this pass context->show_progress to
progress_report() instead of the whole verifier_context, but that's an
arguable stylistic choice, so I'll defer to you if you prefer it the
way you have it. Other than that, this LGTM.

However, what is now 0005 does something rather evil. The commit
message claims that it's just rearranging code, and that's almost
entirely true, except that it also changes manifest_file's pathname
member to be char * instead of const char *. I do not think that is a
good idea, and I definitely do not think you should do it in a patch
that purports to just be doing code movement, and I even more
definitely think that you should not do it without even mentioning
that you did it, and why you did it.

> Fixed -- I did the NULL check in the earlier 0007 patch, but it should
> have been done in this patch.

This is now 0006. struct stat's st_size is of type off_t -- or maybe
ssize_t on some platforms? - not type size_t. I suggest making the
filesize argument use int64 as we do in some other places. size_t is,
I believe, defined to be the right width to hold the size of an object
in memory, not the size of a file on disk, so it isn't really relevant
here.

Other than that, my only comment on this patch is that I think I would
find it more natural to write the check in verify_backup_file() in a
different order: I'd put context->manifest->version != 1 && m != NULL
&& m->matched && !m->bad && strcmp() because (a) that way the most
expensive test is last and (b) it feels weird to think about whether
we have the right pathname if we don't even have a valid manifest
entry. But this is minor and just a stylistic preference, so it's also
OK as you have it if you prefer.

> I agree, changing the order of errors could create confusion.
> Previously, a file size mismatch was a clear and appropriate error
> that was reported before the checksum failure error.

In my opinion, this patch (currently 0007) creates a rather confusing
situation that I can't easily reason about. Post-patch,
verify_content_checksum() is a mix of two different things: it ends up
containing all of the logic that needs to be performed on every chunk
of bytes read from the file plus some but not all of the end-of-file
error-checks from verify_file_checksum(). That's really weird. I'm not
very convinced that the test for whether we've reached the end of the
file is 100% correct, but even if it is, the stuff before that point
is stuff that is supposed to happen many times and the stuff after
that is only supposed to happen once, and I don't see any good reason
to smush those two categories of things into a single function. Plus,
changing the order in which those end-of-file checks happen doesn't
seem like the right idea either: the current ordering is good the way
it is. Maybe you want to think of refactoring to create TWO new
functions, one to do the per-hunk work and a second to do the
end-of-file "is the checksum OK?" stuff, or maybe you can just open
code it, but I'm not willing to commit this the way it is.

Regarding 0008, I don't really see a reason why the m != NULL
shouldn't also move inside should_verify_control_data(). Yeah, the
caller added in 0010 might not need the check, but it won't really
cost anything. Also, it seems to me that the logic in 0010 is actually
wrong. If m == NULL, we'll keep the values of verifyChecksums and
verifyControlData from the previous iteration, whereas what we should
do is make them both false. How about removing the if m == NULL guard
here and making both should_verify_checksum() and
should_verify_control_data() test m != NULL internally? Then it all
works out nicely, I think. Or alternatively you need an else clause
that resets both values to false when m == NULL.

> Okay, I added the verify_checksums() and verify_controldata()
> functions to the astreamer_verify.c file. I also updated related
> variables that were clashing with these function names:
> verify_checksums has been renamed to verifyChecksums, and verify_sysid
> has been renamed to verifyControlData.

Maybe think of doing something with the ASTREAMER_MEMBER_HEADER case also.

Out of time for today, will look again soon. I think the first few of
these are probably pretty much ready for commit already, and with a
little more adjustment they'll probably be ready up through about
0006.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-07-31 20:10:41 Re: can we mark upper/lower/textlike functions leakproof?
Previous Message Andrew Dunstan 2024-07-31 19:54:29 Re: can we mark upper/lower/textlike functions leakproof?