| From: | David Steele <david(at)pgmasters(dot)net> | 
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> | 
| Cc: | Stephen Frost <sfrost(at)snowman(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> | 
| Subject: | Re: backup manifests | 
| Date: | 2020-03-27 19:50:56 | 
| Message-ID: | 4507b6d4-09a5-f30d-a575-9db539daddb1@pgmasters.net | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On 3/27/20 1:53 PM, Robert Haas wrote:
> On Thu, Mar 26, 2020 at 4:37 PM David Steele <david(at)pgmasters(dot)net> wrote:
>> I know you and Stephen have agreed on a number of doc changes, would it
>> be possible to get a new patch with those included? I finally have time
>> to do a review of this tomorrow.  I saw some mistakes in the docs in the
>> current patch but I know those patches are not current.
> 
> Hi David,
> 
> Here's a new version with some fixes:
> 
> - Fixes for doc typos noted by Stephen Frost and Andres Freund.
> - Replace a doc paragraph about the advantages and disadvantages of
> CRC-32C with one by Stephen Frost, with a slightly change by me that I
> thought made it sound more grammatical.
> - Change the pg_validatebackup documentation so that it makes no
> mention of compatible tools, per Stephen.
> - Reword the discussion of the exclude list in the pg_validatebackup
> documentation, per discussion between Stephen and myself.
> - Try to make the documentation more clear about the fact that we
> check for both extra and missing files.
> - Incorporate a fix from Amit Kapila to make 003_corruption.pl pass on Windows.
Thanks!
There appear to be conflicts with 67e0adfb3f98:
$ git apply -3 
../download/v14-0002-Generate-backup-manifests-for-base-backups-and-v.patch
../download/v14-0002-Generate-backup-manifests-for-base-backups-and-v.patch:3396: 
trailing whitespace.
sub cleanup_search_directory_fails
error: patch failed: src/backend/replication/basebackup.c:258
Falling back to three-way merge...
Applied patch to 'src/backend/replication/basebackup.c' with conflicts.
U src/backend/replication/basebackup.c
warning: 1 line adds whitespace errors.
 > +          Specifies the algorithm that should be used to checksum 
each file
 > +          for purposes of the backup manifest. Currently, the available
perhaps "for inclusion in the backup manifest"?  Anyway, I think this 
sentence is awkward.
 > +        Specifies the algorithm that should be used to checksum each 
file
 > +        for purposes of the backup manifest. Currently, the available
And again.
> + because the files themselves do not need to read.
should be "need to be read".
 > +        the manifest itself will always contain a 
<literal>SHA256</literal>
I think just "the manifest will always contain" is fine.
 > +        manifeste itself, and is therefore ignored. Note that the 
manifest
typo "manifeste", perhaps remove itself.
 > { "Path": "backup_label", "Size": 224, "Last-Modified": "2020-03-27 
18:33:18 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "b914bec9" },
Storing the checksum type with each file seems pretty redundant. 
Perhaps that could go in the header?  You could always override if a 
specific file had a different checksum type, though that seems unlikely.
In general it might be good to go with shorter keys: "mod", "chk", etc. 
Manifests can get pretty big and that's a lot of extra bytes.
I'm also partial to using epoch time in the manifest because it is 
generally easier for programs to work with.  But, human-readable doesn't 
suck, either.
 >  	if (maxrate > 0)
 > 		maxrate_clause = psprintf("MAX_RATE %u", maxrate);
 > +	if (manifest)
A linefeed here would be nice.
> + manifestfile *tabent;
This is an odd name. A holdover from the tab-delimited version?
> + printf(_("Usage:\n %s [OPTION]... BACKUPDIR\n\n"), progname);
When I ran pg_validatebackup I expected to use -D to specify the backup 
dir since pg_basebackup does.  On the other hand -D is weird because I 
*really* expect that to be the pg data dir.
But, do we want this to be different from pg_basebackup?
> + checksum_length = checksum_string_length / 2;
This check is defeated if a single character is added the to checksum.
Not too big a deal since you still get an error, but still.
> + * Verify that the manifest checksum is correct.
This is not working the way I would expect -- I could freely modify the 
manifest without getting a checksum error on the manifest.  For example:
$ /home/vagrant/test/pg/bin/pg_validatebackup test/backup3
pg_validatebackup: fatal: invalid checksum for file "backup_label": 
"408901e0814f40f8ceb7796309a59c7248458325a21941e7c55568e381f53831?"
So, if I deleted the entry above, I got a manifest checksum error.  But 
if I just modified the checksum I get a file checksum error with no 
manifest checksum error.
I would prefer a manifest checksum error in all cases where it is wrong, 
unless --exit-on-error is specified.
-- 
-David
david(at)pgmasters(dot)net
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Alvaro Herrera | 2020-03-27 19:51:49 | Re: allow online change primary_conninfo | 
| Previous Message | Stephen Frost | 2020-03-27 19:48:50 | Re: backup manifests |