Quick Links

Re: backup manifests

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject:	Re: backup manifests
Date:	2020-03-26 16:34:52
Message-ID:	20200326163452.GW13712@tamriel.snowman.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Greetings,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Wed, Mar 25, 2020 at 4:54 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > > That's a fair argument, but I think the other relevant principle is
> > > that we try to give people useful defaults for things. I think that
> > > checksums are a sufficiently useful thing that having the default be
> > > not to do it doesn't make sense. I had the impression that you and
> > > David were in agreement on that point, actually.
> >
> > I agree with wanting to have useful defaults and that checksums should
> > be included by default, and I'm alright even with letting people pick
> > what algorithms they'd like to have too. The construct here is made odd
> > because we've got this idea that "no checksum" is an option, which is
> > actually something that I don't particularly like, but that's what's
> > making this particular syntax weird. I don't suppose you'd be open to
> > the idea of just dropping that though..? There wouldn't be any issue
> > with this syntax if we just always had checksums included when a
> > manifest is requested. :)
> >
> > Somehow, I don't think I'm going to win that argument.
>
> Well, it's not a crazy idea. So, at some point, I had the idea that
> you were always going to get a manifest, and therefore you should at
> least ought to have the option of not checksumming to avoid the
> overhead. But, as things stand now, you can suppress the manifest
> altogether, so that you can still take a backup even if you've got no
> disk space to spool the manifest on the master. So, if you really want
> no overhead from manifests, just don't have a manifest. And if you are
> OK with some overhead, why not at least have a CRC-32C checksum, which
> is, after all, pretty cheap?
>
> Now, on the other hand, I don't have any strong evidence that the
> manifest-without-checksums mode is useless. You can still use it to
> verify that you have the correct files and that those files have the
> expected sizes. And, verifying those things is very cheap, because you
> only need to stat() each file, not open and read them all. True, you
> can do those things by using pg_validatebackup -s. But, you'd still
> incur the (admittedly fairly low) overhead of computing checksums that
> you don't intend to use.
>
> This is where I feel like I'm trying to make decisions in a vacuum. If
> we had a few more people weighing in on the thread on this point, I'd
> be happy to go with whatever the consensus was. If most people think
> having both --no-manifest (suppressing the manifest completely) and
> --manifest-checksums=none (suppressing only the checksums) is useless
> and confusing, then sure, let's rip the latter one out. If most people
> like the flexibility, let's keep it: it's already implemented and
> tested. But I hate to base the decision on what one or two people
> think.

I'm frustrated at the lack of involvement from others also.

Just to be clear- I'm not completely against having a 'manifest but no
checksum' option, but if that's what we're going to have then it seems
like the syntax should be such that if you don't specify checksums then
you don't get checksums and "MANIFEST_CHECKSUM none" shouldn't be a
thing.

All that said, as I said up-thread, I appreciate that we aren't
designing SQL here and that this is pretty special syntax to begin with,
so if you ended up committing it the way you have it now, so be it, I
wouldn't be asking for it to be reverted over this. It's a bit awkward
and kind of a thorn, but it's not entirely unreasonable, and we'd
probably end up there anyway if we started out without a 'none' option
and someone did come up with a good argument and a patch to add such an
option in the future.

> > > Well, the 512MB "limit" for CRC-32C means only that for certain very
> > > specific types of errors, detection is not guaranteed above that file
> > > size. So if you have a single flipped bit, for example, and the file
> > > size is greater than 512MB, then CRC-32C has only a 99.9999999767169%
> > > chance of detecting the error, whereas if the file size is less than
> > > 512MB, it is 100% certain, because of the design of the algorithm. But
> > > nine nines is plenty, and neither SHA nor our page-level checksums
> > > provide guaranteed error detection properties anyway.
> >
> > Right, so we know that CRC-32C has an upper-bound of 512MB to be useful
> > for exactly what it's designed to be useful for, but we also know that
> > we're going to have larger files- at least 1GB ones, and quite possibly
> > larger, so why are we choosing this?
> >
> > At the least, wouldn't it make sense to consider a larger CRC, one whose
> > limit is above the size of commonly expected files, if we're going to
> > use a CRC?
>
> I mean, you're just repeating the same argument here, and it's just
> not valid. Regardless of the file size, the chances of a false
> checksum match are literally less than one in a billion. There is
> every reason to believe that users will be happy with a low-overhead
> method that has a 99.9999999+% chance of detecting corrupt files. I do
> agree that a 64-bit CRC would probably be not much more expensive and
> improve the probability of detecting errors even further, but I wanted
> to restrict this patch to using infrastructure we already have. The
> choices there are the various SHA functions (so I supported those),
> MD5 (which I deliberately omitted, for reasons I hope you'll be the
> first to agree with), CRC-32C (which is fast), a couple of other
> CRC-32 variants (which I omitted because they seemed redundant and one
> of them only ever existed in PostgreSQL because of a coding mistake),
> and the hacked-up version of FNV that we use for page-level checksums
> (which is only 16 bits and seems to have no advantages for this
> purpose).

The argument that "well, we happened to already have it, even though we
used it for much smaller data sets, which are well within the
100%-single-bit-error detection limit" certainly doesn't make me be in
more support of this. Choosing the right algorithm to use maybe
shouldn't be based on the age of that algorithm, but it also certainly
shouldn't be "just because we already have it" when we're using it for a
very different use-case.

I'm guessing folks have already seen it, but I thought this was an
interesting run-down of actual collisions based on various checksum
lengths using one data set (though it's not clear exactly how big it is,
from what I can see)-

http://www.backplane.com/matt/crc64.html

I do agree with excluding things like md5 and others that aren't good
options. I wasn't saying we should necessarily exclude crc32c either..
but rather saying that it shouldn't be the default.

Here's another way to look at it- where do we use crc32c today, and how
much data might we possibly be covering with that crc? Why was crc32c
picked for that purpose? If the individual who decided to pick crc32c
for that case was contemplating a checksum for up-to-1GB files, would
they have picked crc32c? Seems unlikely to me.

> > > I'm not sure why the fact that it's a 40-year-old algorithm is
> > > relevant. There are many 40-year-old algorithms that are very good.
> >
> > Sure there are, but there probably wasn't a lot of thought about
> > GB-sized files, and this doesn't really seem to be the direction people
> > are going in for larger objects. s3, as an example, uses sha256.
> > Google, it seems, suggests folks use "HighwayHash" (from their crc32c
> > github repo- https://github.com/google/crc32c) Most CRC uses seem to
> > be for much smaller data sets.
>
> Again, I really want to stick with infrastructure we already have.

I don't agree with that as a sensible justification for picking it for
this case, because it's clearly not the same use-case.

> Trying to find a hash function that will please everybody is a hole
> with no bottom, or more to the point, a bikeshed in need of painting.
> There are TONS of great hash functions out there on the Internet, and
> as previous discussions of pgsql-hackers will attest, as soon as you
> go down that road, somebody will say "well, what about xxhash" or
> whatever, and then you spend the rest of your life trying to figure
> out what hash function we could try to commit that is fast and secure
> and doesn't have copyright or patent problems. There have been
> multiple efforts to introduce such hash functions in the past, and I
> think basically all of those have crashed into a brick wall.
>
> I don't think that's because introducing new hash functions is a bad
> idea. I think that there are various reasons why it might be a good
> idea. For instance, highwayhash purports to be a cryptographic hash
> function that is fast enough to replace non-cryptographic hash
> functions. It's easy to see why someone might want that, here. For
> example, it would be entirely reasonable to copy the backup manifest
> onto a USB key and store it in a vault. Later, if you get the USB key
> back out of the vault and validate it against the backup, you pretty
> much know that none of the data files have been tampered with,
> provided that you used a cryptographic hash. So, SHA is a good option
> for people who have a USB key and a vault, and a faster cryptographic
> might be even better. I don't have any desire to block such proposals,
> and I would be thrilled if this work inspires other people to add such
> options. However, I also don't want this patch to get blocked by an
> interminable argument about which hash functions we ought to use. The
> ones we have in core now are good enough for a start, and more can be
> added later.

I'm not actually argueing about which hash functions we should support,
but rather what the default is and if crc32c, specifically, is actually
a reasonable choice. Just because it's fast and we already had an
implementation of it doesn't justify its use as the default. Given that
it doesn't actually provide the check that is generally expected of
CRC checksums (100% detection of single-bit errors) when the file size
gets over 512MB makes me wonder if we should have it at all, yes, but it
definitely makes me think it shouldn't be our default.

Folks look to PG as being pretty good at figuring things out and doing
the thing that makes sense to minimize risk of data loss or corruption.
I can understand and agree with the desire to have a faster alternative
to sha256 for those who don't need a cryptographically safe hash, but if
we're going to provide that option, it should be the right answer and
it's pretty clear, at least to me, that crc32c isn't a good choice for
gigabyte-size files.

> > Sure, there's a good chance we'll need newer algorithms in the future, I
> > don't doubt that. On the other hand, if crc32c, or CRC whatever, was
> > the perfect answer and no one will ever need something better, then
> > what's with folks like Google suggesting something else..?
>
> I have never said that CRC was the perfect answer, and the reason why
> Google is suggesting something different is because they wanted a fast
> hash (not SHA) that still has cryptographic properties. What I have
> said is that using CRC-32C by default means that there is very little
> downside as compared with current releases. Backups will not get
> slower, and error detection will get better. If you pick any other
> default from the menu of options currently available, then either
> backups get noticeably slower, or we get less error detection
> capability than that option gives us.

I don't agree with limiting our view to only those algorithms that we've
already got implemented in PG.

> > As for folks who are that close to the edge on their backup timing that
> > they can't have it slow down- chances are pretty darn good that they're
> > not far from ending up needing to find a better solution than
> > pg_basebackup anyway. Or they don't need to generate a manifest (or, I
> > suppose, they could have one but not have checksums..).
>
> 40-50% is a lot more than "if you were on the edge."

We can agree to disagree on this, it's not particularly relevant in the
end.

> > > Well, that'd be wrong, though. It's true that backup_manifest won't
> > > have an entry in the manifest, and neither will WAL files, but
> > > postgresql.auto.conf will. We'll just skip complaining about it if the
> > > checksum doesn't match or whatever. The server generates manifest
> > > entries for everything, and the client decides not to pay attention to
> > > some of them because it knows that pg_basebackup may have made certain
> > > changes that were not known to the server.
> >
> > Ok, but it's also wrong to say that the backup_label is excluded from
> > verification.
>
> The docs don't say that backup_label is excluded from verification.
> They do say that backup_manifest is excluded from verification
> *against the manifest*, because it is. I'm not sure if you're honestly
> confused here or if we're just devolving into arguing for the sake of
> argument, but right now the code looks like this:

That you're bringing up code here is really just not sensible- we're
talking about the documentation, not about the code here. I do
understand what the code is doing and I don't have any complaint about
the code.

> Oops. If you read that error carefully, you can see that the complaint
> is 100% valid. backup_manifest is indeed present on disk, but not in
> the manifest. However, because this situation is expected and known
> not to be a problem, the right thing to do is suppress the error. That
> is why it is in the ignore_list by default. The documentation is
> attempting to explain this. If it's unclear, we should try to make it
> better, but it is absolutely NOT saying that there is no internal
> validation of the backup_manifest. In fact, the previous paragraph
> tries to explain that:

Yes, I think the documentation is unclear, as I said before, because it
purports to list things that aren't being validated and then includes
backup_manifest in that list, which doesn't make sense. The sentence in
question does *not* say "Certain files and directories are excluded from
the manifest" (which is wording that I actually proposed up-thread, to
try to address this...), it says, from the patch:

"Certain files and directories are excluded from verification:"

Excluded from verification. Then lists backup_manifest. Even though,
earlier in that same paragraph it says that the manifest is verified
against its own checksum.

> + <application>pg_validatebackup</application> reads the manifest file of a
> + backup, verifies the manifest against its own internal checksum, and then
>
> It is, however, saying, and *entirely correctly*, that
> pg_validatebackup will not check the backup_manifest file against the
> backup_manifest. If it did, it would find that it's not there. It
> would then emit an error message like the one above even though
> there's no problem with the backup.

It's saying, removing the listing aspect, exactly that "backup_label is
excluded from verification". That's what I am taking issue with. I've
made multiple attempts to suggest other language to avoid saying that
because it's clearly wrong- the manifest is verified.

> > I fail to see the usefulness of a tool that doesn't actually verify that
> > the backup is able to be restored from.
> >
> > Even pg_basebackup (in both fetch and stream modes...) checks that we at
> > least got all the WAL that's needed for the backup from the server
> > before considering the backup to be valid and telling the user that
> > there was a successful backup. With what you're proposing here, we
> > could have someone do a pg_basebackup, get back an ERROR saying the
> > backup wasn't valid, and then run pg_validatebackup and be told that the
> > backup is valid. I don't get how that's sensible.
>
> I'm sorry that you can't see how that's sensible, but it doesn't mean
> that it isn't sensible. It is totally unrealistic to expect that any
> backup verification tool can verify that you won't get an error when
> trying to use the backup. That would require that everything that the
> validation tool try to do everything that PostgreSQL will try to do
> when the backup is used, including running recovery and updating the
> data files. Anything less than that creates a real possibility that
> the backup will verify good but fail when used. This tool has a much
> narrower purpose, which is to try to verify that we (still) have the
> files the server sent as part of the backup and that, to the best of
> our ability to detect such things, they have not been modified. As you
> know, or should know, the WAL files are not sent as part of the
> backup, and so are not verified. Other things that would also be
> useful to check are also not verified. It would be fantastic to have
> more verification tools in the future, but it is difficult to see why
> anyone would bother trying if an attempt to get the first one
> committed gets blocked because it does not yet do everything. Very few
> patches try to do everything, and those that do usually get blocked
> because, by trying to do too much, they get some of it badly wrong.

I'm not talking about making sure that no error ever happens when doing
a restore of a particular backup. You're arguing against something that
I have not advocated for and which I don't advocate for.

I'm saying that the existing tool that takes the backup has a *really*
*important* verification check that this proposed "validate backup" tool
doesn't have, and that isn't sensible. It leads to situations where the
backup tool itself, pg_basebackup, can fail or be killed before it's
actually completed, and the "validate backup" tool would say that the
backup is perfectly fine. That is not sensible.

That there might be other reasons why a backup can't be restored isn't
relevant and I'm not asking for a tool that is perfect and does some
kind of proof that the backup is able to be restored.

Thanks,

Stephen

In response to

Re: backup manifests at 2020-03-26 15:37:48 from Robert Haas

Responses

Re: backup manifests at 2020-03-26 17:40:55 from Mark Dilger
Re: backup manifests at 2020-03-26 18:02:29 from Robert Haas
Re: backup manifests at 2020-03-27 22:36:17 from Bruce Momjian

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2020-03-26 16:40:40	Re: Make mesage at end-of-recovery less scary.
Previous Message	Gilles Darold	2020-03-26 16:09:04	Re: [PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() in TBLOCK_ABORT_END