Re: pg_verifybackup: TAR format backup verification

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amul Sul <sulamul(at)gmail(dot)com>, Sravan Kumar <sravanvcybage(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_verifybackup: TAR format backup verification
Date: 2024-09-30 15:06:57
Message-ID: CA+TgmoYHyUxkTW4CdwpdVsB4V-Mq+YWs7FwEsuSJq057+fSVoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 28, 2024 at 6:59 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Now, manifest_file.size is in fact a size_t, so %zu is the correct
> format spec for it. But astreamer_verify.checksum_bytes is declared
> uint64. This leads me to question whether size_t was the correct
> choice for manifest_file.size. If it is, is it OK to use size_t
> for checksum_bytes as well? If not, your best bet is likely
> to write %lld and add an explicit cast to long long, as we do in
> dozens of other places. I see that these messages are intended to be
> translatable, so INT64_FORMAT is not usable here.

It seems that manifest_file.size is size_t because parse_manifest.h is
using size_t for json_manifest_per_file_callback. What's happening is
that json_manifest_finalize_file() is parsing the file size
information out of the manifest. It uses strtoul to do that and
assigns the result to a size_t. I don't think I had any principled
reason for making that decision; size_t is, I think, the size of an
object in memory, and this is not that. This is just a string in a
JSON file that represents an integer which will hopefully turn out to
be the size of the file on disk. I guess I don't know what type I
should be using here. Most things in PostgreSQL use a type like uint32
or uint64, but technically what we're going to be comparing against in
the end is probably an off_t produced by stat(), but the return value
of strtoul() or strtoull() is unsigned long or unsigned long long,
which is not any of those things. If you have a suggestion here, I'm
all ears.

> Aside from that, I'm unimpressed with expending a five-line comment
> at line 376 to justify casting control_file_bytes to int, when you
> could similarly cast it to long long, avoiding the need to justify
> something that's not even in tune with project style.

I don't know what you mean by this. The comment explains that I used
%d here because that's what pg_rewind does, and %d corresponds to int,
not long long. If you think it should be some other way, you can
change it, and perhaps you'd like to change pg_rewind to match while
you're at it. But the fact that there's a comment here explaining the
reasoning is a feature, not a bug. It's weird to me to get criticized
for failing to follow project style when I literally copied something
that already exists.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-09-30 15:11:26 Re: msys inet_pton strangeness
Previous Message Tom Lane 2024-09-30 15:05:36 Re: Do not lock temp relations