Re: pg_verifybackup: TAR format backup verification

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Sravan Kumar <sravanvcybage(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_verifybackup: TAR format backup verification
Date: 2024-09-09 20:00:49
Message-ID: CA+TgmoZ-fwQ0aWUs_40oXkEpF7JAn4D9SnKj8dj=qGLtO0VSRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I would rather that you didn't add simple_ptr_list_destroy_deep()
given that you don't need it for this patch series.

+
"\"%s\" unexpected file in the tar format backup",

This doesn't seem grammatical to me. Perhaps change this to: file
\"%s\" is not expected in a tar format backup

+ /* We are only interested in files that are not in the ignore list. */
+ if (member->is_directory || member->is_link ||
+ should_ignore_relpath(mystreamer->context, member->pathname))
+ return;

Doesn't this need to happen after we add pg_tblspc/$OID to the path,
rather than before? I bet this doesn't work correctly for files in
user-defined tablespaces, compared to the way it work for a
directory-format backup.

+ char temp[MAXPGPATH];
+
+ /* Copy original name at temporary space */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+ snprintf(member->pathname, MAXPGPATH, "%s/%d/%s",
+ "pg_tblspc", mystreamer->tblspc_oid, temp);

I don't like this at all. This function doesn't have any business
modifying the astreamer_member, and it doesn't need to. It can just do
char *pathname; char tmppathbuf[MAXPGPATH] and then set pathname to
either member->pathname or tmppathbuf depending on
OidIsValid(tblspcoid). Also, shouldn't this be using %u, not %d?

+ mystreamer->mfile = (void *) m;

Either the cast to void * isn't necessary, or it indicates that
there's a type mismatch that should be fixed.

+ * We could have these checks while receiving contents. However, since
+ * contents are received in multiple iterations, this would result in
+ * these lengthy checks being performed multiple times. Instead, setting
+ * flags here and using them before proceeding with verification will be
+ * more efficient.

Seems unnecessary to explain this.

+ Assert(mystreamer->verify_checksum);
+
+ /* Should have came for the right file */
+ Assert(strcmp(member->pathname, m->pathname) == 0);
+
+ /*
+ * The checksum context should match the type noted in the backup
+ * manifest.
+ */
+ Assert(checksum_ctx->type == m->checksum_type);

What do you think about:

Assert(m != NULL && !m->bad);
Assert(checksum_ctx->type == m->checksum_type);
Assert(strcmp(member->pathname, m->pathname) == 0);

Or possibly change the first one to Assert(should_verify_checksum(m))?

+ memcpy(&control_file, streamer->bbs_buffer.data,
sizeof(ControlFileData));

This probably doesn't really hurt anything, but it's a bit ugly. You
first use astreamer_buffer_until() to force the entire file into a
buffer. And then here, you copy the entire file into a second buffer
which is exactly the same except that it's guaranteed to be properly
aligned. It would be possible to include a ControlFileData in
astreamer_verify and copy the bytes directly into it (you'd need a
second astreamer_verify field for the number of bytes already copied
into that structure). I'm not 100% sure that's worth the code, but it
seems like it wouldn't take more than a few lines, so perhaps it is.

+/*
+ * Verify plain backup.
+ */
+static void
+verify_plain_backup(verifier_context *context)
+{
+ Assert(context->format == 'p');
+ verify_backup_directory(context, NULL, context->backup_directory);
+}
+

This seems like a waste of space.

+verify_tar_backup(verifier_context *context)

I like this a lot better now! I'm still not quite sure about the
decision to have the ignore list apply to both the backup directory
and the tar file contents -- but given the low participation on this
thread, I don't think we have much chance of getting feedback from
anyone else right now, so let's just do it the way you have it and we
can change it later if someone shows up to complain.

+verify_all_tar_files(verifier_context *context, SimplePtrList *tarfiles)

I think this code could be moved into its only caller instead of
having a separate function. And then if you do that, maybe
verify_one_tar_file could be renamed to just verify_tar_file. Or
perhaps that function could also be removed and just move the code
into the caller. It's not much code and not very deeply nested.
Similarly create_archive_verifier() could be considered for this
treatment. Maybe inlining all of these is too much and the result will
look messy, but I think we should at least try to combine some of
them.

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-09-09 20:02:37 Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest
Previous Message Andrew Dunstan 2024-09-09 19:54:52 Re: Jargon and acronyms on this mailing list