Re: pg_combinebackup does not detect missing files

From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_combinebackup does not detect missing files
Date: 2024-08-05 03:58:44
Message-ID: 1a49715d-79b9-44c4-a097-63ecc00698b3@pgmasters.net
Lists: pgsql-hackers

On 8/2/24 20:37, Robert Haas wrote:
> On Fri, Apr 19, 2024 at 11:47 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Hmm, that's an interesting perspective. I've always been very
>> skeptical of doing verification only around missing files and not
>> anything else. I figured that wouldn't be particularly meaningful, and
>> that's pretty much the only kind of validation that's even
>> theoretically possible without a bunch of extra overhead, since we
>> compute checksums on entire files rather than, say, individual blocks.
>> And you could really only do it for the final backup in the chain,
>> because you should end up accessing all of those files, but the same
>> is not true for the predecessor backups. So it's a very weak form of
>> verification.
>>
>> But I looked into it and I think you're correct that, if you restrict
>> the scope in the way that you suggest, we can do it without much
>> additional code, or much additional run-time. The cost is basically
>> that, instead of only looking for a backup_manifest entry when we
>> think we can reuse its checksum, we need to do a lookup for every
>> single file in the final input directory. Then, after processing all
>> such files, we need to iterate over the hash table one more time and
>> see what files were never touched. That seems like an acceptably low
>> cost to me. So, here's a patch.
>>
>> I do think there's some chance that this will encourage people to
>> believe that pg_combinebackup is better at finding problems than it
>> really is or ever will be, and I also question whether it's right to
>> keep changing stuff after feature freeze. But I have a feeling most
>> people here are going to think this is worth including in 17. Let's
>> see what others say.
>
> There was no hue and cry to include this in v17 and I think that ship
> has sailed at this point, but we could still choose to include this as
> an enhancement for v18 if people want it. I think David's probably in
> favor of that (but I'm not 100% sure) and I have mixed feelings about
> it (explained above) so what I'd really like is some other opinions on
> whether this idea is good, bad, or indifferent.

I'm still in favor but if nobody else is interested then I'm not going
to push on it.

Regards,
-David
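
The check described in the quoted patch summary above (look up every file of the final input directory in the backup_manifest, then walk the table once more to find entries that were never touched) could be sketched roughly as follows. This is an illustration only, not the patch itself: the function name `find_missing_files` is hypothetical, the manifest is assumed to be reduced to a plain list of relative paths, and details such as incremental file naming and checksums are ignored.

```python
import os


def find_missing_files(manifest_paths, final_backup_dir):
    """Return manifest entries that have no matching file on disk.

    manifest_paths: relative paths listed in the manifest (assumed to be
    already parsed; '/' is used as the separator).
    final_backup_dir: root directory of the final backup in the chain.
    """
    # One flag per manifest entry, standing in for the hash table.
    touched = {path: False for path in manifest_paths}

    # Pass 1: do a lookup for every file found in the final input directory.
    for root, _dirs, files in os.walk(final_backup_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), final_backup_dir)
            rel = rel.replace(os.sep, "/")
            if rel in touched:
                touched[rel] = True

    # Pass 2: iterate over the table once more and report untouched entries.
    return sorted(path for path, seen in touched.items() if not seen)
```

The extra cost is one dictionary lookup per file plus one pass over the table, which matches the "acceptably low cost" argument in the quoted text. As noted there, a check like this only makes sense for the final backup in the chain, since files of predecessor backups are not all expected to be accessed.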
