Re: documenting the backup manifest file format

From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: documenting the backup manifest file format
Date: 2020-04-13 20:42:03
Message-ID: 3d49cad1-43f3-4c8a-1676-fb552892e629@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/13/20 4:14 PM, Robert Haas wrote:
> On Mon, Apr 13, 2020 at 3:34 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>
>> Also, I
>> see no mention of prettification-chars such as newlines or indentation.
>> I suppose if I pass a manifest file through prettification (or Windows
>> newline conversion), the checksum may break.
>
> It would indeed break. I'm not sure what you want me to say here,
> though. If you're trying to parse a manifest, you shouldn't care about
> how the whitespace is arranged. If you're trying to generate one, you
> can arrange it any way you like, as long as you also include it in the
> checksum.

pgBackRest ignores whitespace but this is a legacy of the way Perl
calculated checksums, not an intentional feature. This worked well when
the manifest was loaded as a whole, converted to JSON, and checksummed,
but it is a major pain for the streaming code we now have in C.

I guarantee that that our next manifest version will do a simple
checksum of bytes as Robert has done in this feature.

So, I'm +1 as implemented.

>> Why is the top-level checksum only allowed to be SHA-256, if the files
>> can use up to SHA-512?

<snip>

> I agree that it's a little bit weird that you can have a stronger
> checksum for the files instead of the manifest itself, but I also
> wonder what the use case would be for using a stronger checksum on the
> manifest. David Steele argued that strong checksums on the files could
> be useful to software that wants to rifle through all the backups
> you've ever taken and find another copy of that file by looking for
> something with a matching checksum. CRC-32C wouldn't be strong enough
> for that, because eventually you could have enough files that you
> start to have collisions. The SHA algorithms output enough bits to
> make that quite unlikely. But this argument only makes sense for the
> files, not the manifest.

Agreed. I think SHA-256 is *more* than enough to protect the manifest
against corruption. That said, since the cost of SHA-256 vs. SHA-512 in
the context on the manifest is negligible we could just use the stronger
algorithm to deflect a similar question going forward.

That choice might not age well, but we could always say, well, we picked
it because it was the strongest available at the time. Allowing a choice
of which algorithm to use for to manifest checksum seems like it will
just make verifying the file harder with no tangible benefit.

Maybe just a comment in the docs about why SHA-256 was used would be fine.

>> (Also, did we intentionally omit the dash in
>> hash names, so "SHA-256" to make it SHA256? This will also be critical
>> for checksumming the manifest itself.)
>
> I debated this with myself, settled on this spelling, and nobody
> complained until now. It could be changed, though. I didn't have any
> particular reason for choosing it except the feeling that people would
> probably prefer to type --manifest-checksum=sha256 rather than
> --manifest-checksum=sha-256.

+1 for sha256 rather than sha-256.

Regards,
--
-David
david(at)pgmasters(dot)net

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-04-13 20:57:19 Re: Poll: are people okay with function/operator table redesign?
Previous Message Tom Lane 2020-04-13 20:41:51 Re: Poll: are people okay with function/operator table redesign?