Re: backup manifests

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2020-03-27 19:20:27
Message-ID: CA+TgmoZBhohTrORC3jUFXB8K1BJh6ojxmXdLp8WKWsX165DwWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 27, 2020 at 2:29 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> s/ye'/yes/

Ugh, sorry. Fixed in the version posted earlier.

> Are you planning to include a specification of the manifest file format
> anywhere? I looked through the patches and didn't find anything.

I thought about that. I think it would be good to have. I was sort of
hoping to leave it for a follow-on patch, but maybe that's cheating
too much.

> I think it'd also be good to include more information about what the
> point of manifest files actually is.

What kind of information do you want to see included there? Basically,
the way the documentation is written right now, it essentially says,
well, we have this manifest thing so that you can later run
pg_validatebackup, and pg_validatebackup says that it's there to check
the integrity of backups using the manifest. This is all a bit
circular, though, and maybe needs elaboration.

What I've experienced is that:

- Sometimes people take a backup and then wonder later whether the
disk has flipped some bits.
- Sometimes people restore a backup and forget some of the parts, like
the user-defined tablespaces.
- Sometimes anti-virus software, or poorly-run cron job run amok,
wander around inflicting unpredictable damage.

It would be nice to have a system that would notice these kinds of
things on a running system, but here I've got the more modest goal of
checking for in the context of a backup. If the data gets corrupted in
transit, or if the disk mutilates it, or if the user mutilates it, you
need something to check the backup against to find out that bad things
have happend; the manifest is that thing. But I don't know exactly how
much of all that should go in the docs, or in what way.

> > + <para>
> > + <application>pg_validatebackup</application> reads the manifest file of a
> > + backup, verifies the manifest against its own internal checksum, and then
> > + verifies that the same files are present in the target directory as in the
> > + manifest itself. It then verifies that each file has the expected checksum,
>
> Depending on what you want to use the manifest for, we'd also need to
> check that there are no additional files. That seems to actually be
> implemented, which imo should be mentioned here.

I intended the text to say that, because it says that it checks that
the two things are "the same," which is symmetric. In the new version
I posted a bit ago, I tried to make it more explicit, because
apparently it was not sufficiently clear.

> Hm. Is it a great choice to include the checksum for the manifest inside
> the manifest itself? With a cryptographic checksum it seems like it
> could make a ton of sense to store the checksum somewhere "safe", but
> keep the manifest itself alongside the base backup itself. While not
> huge, they won't be tiny either.

Seems like the user could just copy the manifest checksum and store it
somewhere, if they wish. Then they can check it against the manifest
itself later, if they wish. Or they can take a SHA-512 of the whole
file and store that securely. The problem is that we have no idea how
to write that checksum to a more security storage. We could write
backup_manifest and backup_manifest.checksum into separate files, but
that seems like it's adding complexity without any real benefit.

To me, the security-related uses of this patch seem to be fairly
niche. I think it's nice that they exist, but I don't think that's the
main selling point. For me, the main selling point is that you can
check that your disk didn't eat your data and that nobody nuked any
files that were supposed to be there.

> Doesn't have to be in the first version, but could it be useful to move
> this to common/ or such?

Yeah. At one point, this code was written in a way that was totally
specific to pg_validatebackup, but I then realized that it would be
better to make it more general, so I refactored it into in the form
you see now, where pg_validatebackup.c depends on parse_manifest.c but
not the reverse. I suspect that if someone wants to use this for
something else they might need to change a few more things - not sure
exactly what - but I don't think it would be too hard. I thought it
would be best to leave that task until someone has a concrete use case
in mind, but I did want it to to be relatively easy to do that down
the road, and I hope that the way I've organized the code achieves
that.

> > +static void
> > +validate_backup_directory(validator_context *context, char *relpath,
> > + char *fullpath)
> > +{
>
> Hm. Should this warn if the directory's permissions are set too openly
> (world writable?)?

I don't think so, but it's pretty clear that different people have
different ideas about what the scope of this tool ought to be, even in
this first version.

> Hm. I think it'd be good to verify that the checksummed size is the same
> as the size of the file in the manifest.

That's checked in an earlier phase. Are you worried about the file
being modified after the first pass checks the size and before we come
through to do the checksumming?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kirill Bychik 2020-03-27 19:21:48 Re: WAL usage calculation patch
Previous Message Tom Lane 2020-03-27 19:19:59 Re: A bug when use get_bit() function for a long bytea string