Re: pg_combinebackup does not detect missing files

From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_combinebackup does not detect missing files
Date: 2024-05-17 05:18:18
Message-ID: 2f9aeae6-c010-43a5-b456-91adeb229160@pgmasters.net
Lists: pgsql-hackers

On 4/25/24 00:05, Robert Haas wrote:
> On Tue, Apr 23, 2024 at 7:23 PM David Steele <david(at)pgmasters(dot)net> wrote:
>>> I don't understand what you mean here. I thought we were in agreement
>>> that verifying contents would cost a lot more. The verification that
>>> we can actually do without much cost can only check for missing files
>>> in the most recent backup, which is quite weak. pg_verifybackup is
>>> available if you want more comprehensive verification and you're
>>> willing to pay the cost of it.
>>
>> I simply meant that it is *possible* to verify the output of
>> pg_combinebackup without explicitly verifying all the backups. There
>> would be overhead, yes, but it would be less than verifying each backup
>> individually. For my 2c that efficiency would make it worth doing
>> verification in pg_combinebackup, with perhaps a switch to turn it off
>> if the user is confident in their sources.
>
> Hmm, can you outline the algorithm that you have in mind? I feel we've
> misunderstood each other a time or two already on this topic, and I'd
> like to avoid more of that. Unless you just mean what the patch I
> posted does (check if anything from the final manifest is missing from
> the corresponding directory), but that doesn't seem like verifying the
> output.

Yeah, it seems you are right that it is not possible to verify the
output in all cases.
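For reference, the cheap check discussed in the quoted text (is anything listed in the final backup's manifest missing from its directory?) can be sketched roughly as below. This is a minimal illustration, not the posted patch. It assumes `backup_manifest` is JSON with a `"Files"` array whose entries carry a `"Path"` key, and it assumes those paths match on-disk names. The function name `missing_from_final_backup` is hypothetical. As the discussion notes, this only checks the most recent backup, and it says nothing about file contents.

```python
import json
import os


def missing_from_final_backup(backup_dir: str) -> list[str]:
    """Return manifest paths that do not exist under backup_dir.

    Hypothetical sketch. Assumes backup_dir/backup_manifest is JSON with a
    "Files" array of entries that have a "Path" key, and that each path is
    the file's on-disk name. Entries using "Encoded-Path" (non-UTF-8 names)
    are skipped for simplicity.
    """
    with open(os.path.join(backup_dir, "backup_manifest"), encoding="utf-8") as f:
        manifest = json.load(f)

    missing = []
    for entry in manifest.get("Files", []):
        path = entry.get("Path")
        if path is None:
            continue  # simplification: ignore Encoded-Path entries
        if not os.path.exists(os.path.join(backup_dir, path)):
            missing.append(path)
    return missing
```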

However, I think allowing the user to optionally validate the input
would be a good feature. Running pg_verifybackup as a separate step is
going to be more expensive than verifying and copying at the same time.
Even when storage tricks are used to copy ranges of data,
pg_combinebackup knows which files do not need to be verified for the
current operation, e.g. old copies of free space maps, whereas a
separate pg_verifybackup run would check them all.
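Here is a rough sketch of "verifying while copying". Each chunk is hashed as it passes from source to destination, so the file is read once instead of twice. The sketch assumes the manifest records a SHA-256 checksum. pg_basebackup can emit SHA-256 manifest checksums, but its default is CRC32C, which the Python standard library does not provide. The function name `copy_and_verify` is hypothetical and is not part of pg_combinebackup.

```python
import hashlib


def copy_and_verify(src: str, dst: str, expected_sha256: str,
                    chunk_size: int = 1 << 20) -> bool:
    """Copy src to dst, hashing each chunk on the way through.

    Hypothetical sketch: returns True if the SHA-256 of the copied bytes
    matches expected_sha256 (hex). The source is read exactly once, so the
    extra cost of verification is the hash computation, not extra I/O.
    """
    digest = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)  # verify...
            fout.write(chunk)     # ...and copy in the same pass
    return digest.hexdigest() == expected_sha256.lower()
```

A caller that knows a file is not needed for the current operation (such as a superseded free space map) would simply not call this for it. That is the saving a separate verification pass cannot get.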

Additionally, if pg_combinebackup is updated to work against tar.gz,
which I believe will be important going forward, then there would be
little penalty to verification since all the required data would be in
memory at some point anyway. Though if the file is compressed,
verification might be partly redundant, since compression formats
generally include checksums.
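A sketch of why tar.gz input makes verification cheap: each member has to be decompressed into memory to be used anyway, so hashing it at the same time adds little work. This uses Python's `tarfile` with transparent decompression (`"r:*"`). The `expected` mapping of member names to SHA-256 digests is a hypothetical stand-in for manifest checksums. The function name `verify_tar_members` is also hypothetical.

```python
import hashlib
import tarfile


def verify_tar_members(archive_path: str, expected: dict[str, str]) -> list[str]:
    """Hash regular files in a (possibly compressed) tar archive.

    Hypothetical sketch: expected maps member names to hex SHA-256 digests.
    Returns names whose digest does not match, followed by names listed in
    expected but absent from the archive.
    """
    bad = []
    seen = set()
    # "r:*" lets tarfile detect gzip/bz2/xz compression transparently.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if not member.isfile() or member.name not in expected:
                continue
            f = tar.extractfile(member)
            digest = hashlib.sha256()
            # The decompressed bytes pass through memory here anyway.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
            seen.add(member.name)
            if digest.hexdigest() != expected[member.name]:
                bad.append(member.name)
    bad.extend(sorted(name for name in expected if name not in seen))
    return bad
```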

One more thing occurs to me -- if data checksums are enabled then a
rough-and-ready output verification would be to test the checksums
during combine. Data checksums aren't very strong, but a failure should
be triggered if a bunch of pages go wrong, especially since the block
number is part of the checksum, so a page written at the wrong offset
would fail. This would be helpful for catching combine bugs.
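A toy illustration of why mixing the block number into the checksum helps catch combine bugs: a valid page placed at the wrong offset no longer verifies. This is not PostgreSQL's algorithm. The real `pg_checksum_page()` uses an FNV-1a-derived hash. This sketch substitutes `zlib.crc32` and only loosely follows the shape of the real function: it zeroes the checksum field, XORs in the block number, and folds the result into 1..65535. It assumes 8 kB pages, the 16-bit checksum at byte offset 8 (after the 8-byte `pd_lsn`), and little-endian storage. All function names are hypothetical.

```python
import struct
import zlib

BLCKSZ = 8192        # assumed default page size
CHECKSUM_OFFSET = 8  # assumed: pd_checksum follows the 8-byte pd_lsn


def toy_page_checksum(page: bytes, blkno: int) -> int:
    """Toy 16-bit checksum mixing in the block number (NOT PostgreSQL's hash)."""
    buf = bytearray(page)
    buf[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2] = b"\x00\x00"  # exclude stored value
    h = zlib.crc32(buf) ^ blkno
    return (h % 65535) + 1


def stamp_checksum(page: bytearray, blkno: int) -> None:
    """Write the toy checksum for blkno into the page header (little-endian)."""
    struct.pack_into("<H", page, CHECKSUM_OFFSET, toy_page_checksum(page, blkno))


def bad_blocks(relation: bytes) -> list[int]:
    """Return block numbers whose stored toy checksum does not match."""
    bad = []
    for blkno in range(len(relation) // BLCKSZ):
        page = relation[blkno * BLCKSZ:(blkno + 1) * BLCKSZ]
        if not any(page):
            continue  # all-zero pages were never initialized; nothing to check
        (stored,) = struct.unpack_from("<H", page, CHECKSUM_OFFSET)
        if stored != toy_page_checksum(page, blkno):
            bad.append(blkno)
    return bad
```

If a combine bug swaps blocks 1 and 2 of a relation, both pages still have internally consistent contents, but `bad_blocks` flags them because each one's stored checksum was computed for a different block number.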

Regards,
-David
