Re: pg_verifybackup: TAR format backup verification

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Sravan Kumar <sravanvcybage(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: pg_verifybackup: TAR format backup verification
Date: 2024-08-06 17:08:57
Message-ID: CA+Tgmoboyugvh0C-QWfzQK0nNgOHrxGnQQiOROjaa--+Aqr2xg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 1, 2024 at 9:19 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> > I think I would have made this pass context->show_progress to
> > progress_report() instead of the whole verifier_context, but that's an
> > arguable stylistic choice, so I'll defer to you if you prefer it the
> > way you have it. Other than that, this LGTM.
>
> Additionally, I moved total_size and done_size to verifier_context
> because done_size needs to be accessed in astreamer_verify.c.
> With this change, verifier_context is now more suitable.

But it seems like 0006 now changes the logic for computing total_size.
Prepatch, the condition is:

- if (context->show_progress && !context->skip_checksums &&
- should_verify_checksum(m))
- context->total_size += m->size;

where should_verify_checksum(m) checks (((m)->matched) && !((m)->bad)
&& (((m)->checksum_type) != CHECKSUM_TYPE_NONE)). But post-patch the
condition is:

+ if (!context.skip_checksums)
...
+ if (!should_ignore_relpath(context, m->pathname))
+ total_size += m->size;

The old logic was reached from verify_backup_directory() which does
check should_ignore_relpath(), so the new condition hasn't added
anything. But it seems to have lost the show_progress condition, and
the m->checksum_type != CHECKSUM_TYPE_NONE condition. I think this
means that we'll sum the sizes even when not displaying progress, and
that if some of the files in the manifest had no checksums, our
progress reporting would compute wrong percentages after the patch.

> Understood. At the start of working on the v3 review, I thought of
> completely discarding the 0007 patch and copying most of
> verify_file_checksum() to a new function in astreamer_verify.c.
> However, I later realized we could deduplicate some parts, so I split
> verify_file_checksum() and moved the reusable part to a separate
> function. Please have a look at v4-0007.

Yeah, that seems OK.

The fact that these patches don't have commit messages is making life
more difficult for me than it needs to be. In particular, I'm looking
at 0009 and there's no hint about why you want to do this. In fact
that's the case for all of these refactoring patches. Instead of
saying something like "tar format verification will want to verify the
control file, but will not be able to read the file directly from
disk, so separate the logic that reads the control file from the logic
that verifies it" you just say what code you moved. Then I have to
guess why you moved it, or flip back and forth between the refactoring
patch and 0011 to try to figure it out. It would be nice if each of
these refactoring patches contained a clear indication about the
purpose of the refactoring in the commit message.

> I had the same thought about checking for NULL inside
> should_verify_control_data(), but I wanted to maintain the structure
> similar to should_verify_checksum(). Making this change would have
> also required altering should_verify_checksum(), I wasn’t sure if I
> should make that change before. Now, I did that in the attached
> version -- 0008 patch.

I believe there is no reason for this change to be part of 0008 at
all, and that this should be part of whatever later patch needs it.

> > Maybe think of doing something with the ASTREAMER_MEMBER_HEADER case also.
>
> Done.

OK, the formatting of 0011 looks much better now.

It seems to me that 0011 is arranging to palloc the checksum context
for every file and then pfree it at the end. It seems like it would be
considerably more efficient if astreamer_verify contained a
pg_checksum_context instead of a pointer to a pg_checksum_context. If
you need a flag to indicate whether we've reinitialized the checksum
for the current file, it's better to add that than to have all of
these unnecessary allocate/free cycles.

Existing astreamer code uses struct member names_like_this. For the
new one, you mostly used namesLikeThis except when you used
names_like_this or namesLkThs.

It seems to me that instead of adding a global variable
verify_backup_file_cb, it would be better to move the 'format'
variable into verifier_context. Then you can do something like if
(context->format == 'p') verify_plain_backup_file() else
verify_tar_backup_file().

It's pretty common for .tar.gz to be abbreviated to .tgz. I think we
should support that.

Let's suppose that I have a backup which, for some reason, does not
use the same compression for all files (base.tar, 16384.tgz,
16385.tar.gz, 16366.tar.lz4). With this patch, that will fail. Now,
that's not really a problem, because having a backup with mixed
compression algorithms like that is strange and you probably wouldn't
try to do it. But on the other hand, it looks to me like making the
code support that would be more elegant than what you have now.
Because, right now, you have code to detect what type of backup you've
got by looking at base.WHATEVER_EXTENSION ... but then you have to
also have code that complains if some later file doesn't have the same
extension. But you could just detect the type of every file
individually.

In fact, I wonder if we even need -Z. What value is that actually
providing? Why not just always auto-detect?

find_backup_format() ignores the possibility of stat() throwing an
error. That's not good.

Suppose that the backup directory contains main.tar, 16385.tar, and
snuffleupagus.tar. It looks to me like what will happen here is that
we'll verify main.tar with tblspc_oid = InvalidOid, 16385.tar with
tblspc_oid = 16385, and snuffleupagus.tar with tblspc_oid =
InvalidOid. That doesn't sound right. I think we should either
completely ignore snuffleupagus.tar just as it were completely
imaginary, or perhaps there's an argument for emitting a warning
saying that we weren't expecting a snuffleupagus to exist.

In general, I think all unexpected files in a tar-format backup
directory should get the same treatment, regardless of whether the
problem is with the extension or the file itself. We should either
silently ignore everything that isn't expected to be present, or we
should emit a complaint saying that the file isn't expected to be
present. Right now, you say that it's "not a valid file" if the
extension isn't what you expect (which doesn't seem like a good error
message, because the file may be perfectly valid for what it is, it's
just not a file we're expecting to see) and say nothing if the
extension is right but the part of the filename preceding the
extension is unexpected.

A related issue is that it's a little unclear what --ignore is
supposed to do for tar-format backups. Does that ignore files in the
backup directory, or files instead of the tar files inside of the
backup directory? If we decide that --ignore ignores files in the
backup directory, then we should complain about any unexpected files
that are present there unless they've been ignored. If we decide that
--ignore ignores files inside of the tar files, then I suggest we just
silently skip any files in the backup directory that don't seem to
have file names in the correct format. I think I prefer the latter
approach, but I'm not 100% sure what's best.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo Nagata 2024-08-06 17:13:04 Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW
Previous Message Nathan Bossart 2024-08-06 17:01:51 Re: [PATCH] Add crc32(text) & crc32(bytea)