Re: better page-level checksums

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: better page-level checksums
Date: 2022-06-10 18:36:19
Message-ID: CA+TgmoZ+DcjmvCm6_Su0ECYui5ZdJz0wHFp2QMJ9ORjE3vkxsQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 10, 2022 at 12:08 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> So, it's not quite as simple as use X or use Y, we need to be
> considering the use case too. In particular, the amount of data that's
> being hashed is relevant when it comes to making a decision about what
> hash or checksum to use. When you're talking about (potentially) 1G
> segment files, you'll want to use something different (like SHA) vs.
> when you're talking about an 8K block (not that you couldn't use SHA,
> but it may very well be overkill for it).

Interesting. I expected you to be cheerleading for SHA like a madman.

> In terms of TDE, that's yet a different use-case and you'd want to use
> AE (authenticated encryption) + AAD (additional authenticated data) and
> the result of that operation is a block which has some amount of
> unencrypted data (eg: LSN, potentially used as the IV), some amount of
> encrypted data (eg: everything else), and then space to store the tag
> (which can be thought of, but is *distinct* from, a hash of the
> encrypted data + the additional unencrypted data, where the latter would
> include the unencrypted data on the block, like the LSN, plus other
> information that we want to include like the qualified path+filename of
> the file as relevant to the PGDATA root). If our goal is
> cryptographically authenticated and encrypted data pages (which I believe
> is at least one of our goals) then we're talking about encryption
> methods like AES GCM which handle production of the tag for us and with
> that tag we would *not* need to have any independent hash or checksum for
> the block (though we could, but that should really be included in the
> *encrypted* section, as hashing unencrypted data and then storing that
> hash unencrypted could potentially leak information that we'd rather
> not).

Yeah, and I recall there was discussion of how much space AES-GCM-SIV
would need per page, but I can't find that discussion now. I'm pretty
sure it was a fairly meaty number of bytes, and I assume it's also not
that cheap to compute.
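
To make the space accounting concrete, here's roughly what that kind of
per-page AEAD call might look like using OpenSSL's EVP interface. This
is just a sketch, with the function name and the division of inputs
invented for illustration; the IV is assumed to come from the LSN and
the AAD from the LSN plus the qualified file path, as Stephen describes.

#include <stdbool.h>
#include <openssl/evp.h>

/*
 * Illustrative only: encrypt the non-LSN portion of a page with
 * AES-256-GCM, authenticating (but not encrypting) the AAD, and return
 * the 16-byte tag that would have to be stored somewhere on the page.
 */
static bool
encrypt_page_gcm(const unsigned char *key,
                 const unsigned char *iv, int ivlen,   /* e.g. derived from the LSN */
                 const unsigned char *aad, int aadlen, /* LSN, qualified path, ... */
                 const unsigned char *plain, int plainlen,
                 unsigned char *cipher, unsigned char tag[16])
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int     len;
    bool    ok;

    ok = ctx != NULL
        && EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, NULL, NULL) == 1
        && EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_SET_IVLEN, ivlen, NULL) == 1
        && EVP_EncryptInit_ex(ctx, NULL, NULL, key, iv) == 1
        && EVP_EncryptUpdate(ctx, NULL, &len, aad, aadlen) == 1    /* feed AAD */
        && EVP_EncryptUpdate(ctx, cipher, &len, plain, plainlen) == 1
        && EVP_EncryptFinal_ex(ctx, cipher + len, &len) == 1
        && EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, tag) == 1;

    EVP_CIPHER_CTX_free(ctx);
    return ok;
}

The full GCM tag alone is 16 bytes (GCM-SIV's is the same size), versus
the 2 bytes pd_checksum occupies today, so the per-page overhead adds up
quickly even before deciding where a nonce would live.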

> Note that NIST has put out information regarding how big a tag is
> appropriate for how much data is being encrypted with a given
> authenticated encryption method such as AES GCM. I recall Robert
> finding similar information for hashing/checksumming of unencrypted
> data from a similar source and that'd make sense to consider when
> talking about *just* adding a hash/checksum for unencrypted data blocks.
>
> This is the relevant discussion from NIST on this subject:
>
> https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpublication800-38d.pdf
>
> Note particularly Appendix C: Requirements and Guidelines for Using
> Short Tags (though, really, the whole thing is good to read..).

I don't see that as very relevant. That's talking about using 32-bit
or 64-bit tags for things like VoIP packets where a single compromised
packet wouldn't reveal a whole lot. I think we have to take the view
that a single compromised disk block is a big deal.

> In the thread about checksum/hashes for the backup manifest, I was
> pretty sure you found some information regarding the amount of data
> being hashed vs. the size you want the hash/checksum to be and that
> seems like it'd be particularly relevant for this discussion (as it was
> for backups, at least as I recall..). Hopefully we can go find that.

I went back and looked and found that I had written this:

https://www.postgresql.org/message-id/CA+TgmoYOKC_8o-AR1jTQs0mOrFx=_Rcy5udor1m-LjyJNiSWPQ@mail.gmail.com

I think that gets us a whole lot of nowhere, honestly. I think this
email from Andres is more on point:

http://postgr.es/m/20200327195624.xthhd4xuwabvd3ou@alap3.anarazel.de

I don't really understand all the details of the smhasher pages to
which he links, but I understand that they're measuring the quality of
the bit-mixing, which does matter. So does speed, because people like
their database to be fast even if it's using checksums (or TDE if we
had that). And I think the output size is also a relevant
consideration, because more output bits means both more chance of
detecting errors (assuming good bit-mixing, at least) and also more
wasted space that isn't being used to store your actual data.
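
Just to make that trade-off concrete, the sort of thing we'd be choosing
between looks roughly like the sketch below: a plain 64-bit FNV-1a over
the page image, which I'm using purely as a stand-in here, not as what
we ship or should ship. The questions for any candidate are exactly the
ones above: how well a loop like this mixes bits, how fast it runs over
8kB, and whether 8 bytes of output per page is worth it.

#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative 64-bit FNV-1a over a page image; not PostgreSQL's actual
 * checksum algorithm.  In practice the stored checksum field would have
 * to be zeroed or skipped before hashing.
 */
static uint64_t
page_checksum64(const unsigned char *page, size_t len)
{
    uint64_t    hash = UINT64_C(0xcbf29ce484222325);    /* FNV offset basis */

    for (size_t i = 0; i < len; i++)
    {
        hash ^= page[i];
        hash *= UINT64_C(0x100000001b3);                /* FNV prime */
    }
    return hash;
}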

I haven't really seen anything that makes me believe that there's a
particularly strong relationship between block size and ideal checksum
size. There's some relationship, for sure. For instance, you want the
checksum to be small relative to the size of the input, so as not to
waste a whole bunch of storage space. I wouldn't propose to hash 32
byte blocks with SHA-256, because my checksums would be as big as the
original data. But I don't really think such considerations are
relevant here. An 8kB block is big enough that any checksum algorithm
in common use today is going to produce output that is well under 1%
of the page size, so you're not going to be wasting tons of storage.
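
(To put a number on it: even SHA-512's 64-byte digest is only 64 bytes
out of 8192, i.e. under 0.8% of the page, and SHA-256's 32 bytes is
about 0.4%.)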

You might be wasting your time, though. One big knock on the 16-bit
checksums we're using today is that the chance of an accidental
collision is noticeably more than 0. Generally, I think
we're right to think that's acceptable, because your chances of
noticing even a single corrupted block are very high. However, if
you're operating tens or hundreds of thousands of PostgreSQL clusters
containing terabytes or petabytes of data, it's quite likely that
there will be instances of corruption which you fail to detect because
the checksum collided. Maybe you care about that. If you do, you
probably need at least a 64-bit checksum before the risk of missing
even a single instance of corruption due to a checksum collision
becomes negligible. Maybe even slightly larger if the amount of data
you're managing is truly vast.
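
As a back-of-the-envelope illustration, assuming corruption leaves you
with an effectively random checksum value: with a 16-bit checksum each
corrupted block goes undetected with probability about 2^-16, or 1 in
65536, so a fleet that accumulates a million corrupted blocks over its
lifetime should expect roughly 15 of them to slip through. With a
64-bit checksum the per-block miss probability drops to about 2^-64,
which is negligible at any plausible scale.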

So I think there's probably a good argument that if you're just
concerned about detecting random corruption due to bugs, operator
error, hardware failure, and the like, something like a 512-bit
checksum is overkill. I think the time
when you need more bits is when you have some goal beyond being really
likely to detect a random error - e.g. if you want 100% guaranteed
detection of every single-bit error, or if you want error correction,
or if you want to foil an adversary who is trying to construct
checksums for maliciously modified blocks. But that is also true for
the backup manifest case, and we allowed SHA-n as an option there. I
feel like there are bound to be some people who want something like
SHA-n just because it sounds good, regardless of whether they really
need it.

We can tell them "no," though.

--
Robert Haas
EDB: http://www.enterprisedb.com
