Re: better page-level checksums

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andrey Borodin <x4m(at)double(dot)cloud>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: better page-level checksums
Date: 2022-06-10 16:13:34
Message-ID: 20220610161333.GT9030@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Andrey Borodin (x4m(at)double(dot)cloud) wrote:
> On Fri, Jun 10, 2022 at 5:00 AM Matthias van de Meent <
> boekewurm+postgres(at)gmail(dot)com> wrote:
> > Can't we add some extra fork that stores this extra per-page
> > information, and contains this extra metadata
>
> +1 for this approach. I have observed some painful corruption cases where
> block storage simply returned a stale version of a range of blocks. This
> can go undetected only because the checksum is stored on the page itself.
> A special fork for checksums would allow us to better detect failures in
> SSD firmware, MMU SEUs, the OS page cache, backup software and storage.
> It may seem that this kind of thing never happens, but the probability of
> such a failure is drastically higher than the probability of a hardware
> failure going undetected due to a CRC16 collision.
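
[Illustration, not part of the original message.] The stale-read case in the
quoted text can be sketched roughly as follows. This is a toy model only:
`Storage`, `make_page`, `in_page_ok` and `fork_ok` are hypothetical names, and
`zlib.crc32` is a stand-in for whatever checksum would really be used. The
point is that a stale page still carries a checksum that matches its own
contents, so only a copy of the checksum kept elsewhere can expose it.

```python
import zlib

PAGE_SIZE = 8192   # PostgreSQL's default block size
CSUM_BYTES = 4     # assumption: 4-byte stand-in checksum at the page start


def checksum(body: bytes) -> int:
    """Stand-in checksum; not the algorithm PostgreSQL uses."""
    return zlib.crc32(body)


def make_page(payload: bytes) -> bytes:
    """Build a page whose checksum is embedded in its first CSUM_BYTES."""
    body = payload.ljust(PAGE_SIZE - CSUM_BYTES, b"\0")
    return checksum(body).to_bytes(CSUM_BYTES, "little") + body


def in_page_ok(page: bytes) -> bool:
    """Verify a page against the checksum stored on the page itself."""
    stored = int.from_bytes(page[:CSUM_BYTES], "little")
    return stored == checksum(page[CSUM_BYTES:])


class Storage:
    """Hypothetical block store with a separate per-block checksum fork."""

    def __init__(self):
        self.blocks = {}      # main relation: blkno -> page bytes
        self.csum_fork = {}   # separate fork: blkno -> checksum

    def write(self, blkno: int, page: bytes, lose_write: bool = False) -> None:
        # The fork is always updated; the main block write may be lost,
        # which models storage later returning a stale version.
        self.csum_fork[blkno] = checksum(page[CSUM_BYTES:])
        if not lose_write:
            self.blocks[blkno] = page

    def fork_ok(self, blkno: int) -> bool:
        """Verify a block against the checksum kept in the separate fork."""
        page = self.blocks[blkno]
        return self.csum_fork[blkno] == checksum(page[CSUM_BYTES:])


st = Storage()
st.write(0, make_page(b"v1"))
st.write(0, make_page(b"v2"), lose_write=True)  # storage keeps the old page

stale_passes_in_page = in_page_ok(st.blocks[0])  # stale page is self-consistent
stale_passes_fork = st.fork_ok(0)                # fork remembers the new checksum
```

In this model the in-page check accepts the stale block while the fork check
rejects it. The model ignores the case where the fork's own write is the one
that gets lost.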

This is another possible approach, sure, but it has its own downsides:
clearly more IO ends up being involved and then you also have to deal
with the fact that the fork's page would certainly end up covering a lot
of the pages in the main relation, not to mention the question of what
to do when we want to get checksums *on forks*, which we surely will
want to have...
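
[Illustration, not part of the original message.] To put a rough number on
"covering a lot of the pages", here is a back-of-the-envelope sketch. It
assumes 8192-byte pages, that a fork page carries the standard 24-byte page
header, and a flat array of fixed-size checksums; the checksum sizes and the
helper names `pages_covered` and `fork_location` are hypothetical.

```python
PAGE_SIZE = 8192     # PostgreSQL's default block size
PAGE_HEADER = 24     # assumption: fork pages keep the standard page header


def pages_covered(csum_bytes: int) -> int:
    """Main-relation pages whose checksums fit on one fork page."""
    return (PAGE_SIZE - PAGE_HEADER) // csum_bytes


def fork_location(blkno: int, csum_bytes: int) -> tuple[int, int]:
    """Map a main-relation block number to (fork block, byte offset)."""
    per_page = pages_covered(csum_bytes)
    fork_blkno = blkno // per_page
    offset = PAGE_HEADER + (blkno % per_page) * csum_bytes
    return fork_blkno, offset


if __name__ == "__main__":
    for size in (2, 4, 8, 32):
        n = pages_covered(size)
        mib = n * PAGE_SIZE / (1024 * 1024)
        print(f"{size:2d}-byte checksum: {n} pages (~{mib:.1f} MiB) per fork page")
```

Under these assumptions an 8-byte checksum puts about a thousand main-relation
pages behind every fork page, so a write to any one of them also dirties that
shared fork page.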

> Also, I'm skeptical about correcting detected errors with the information
> from the checksum. This approach requires a very, very large checksum. It's
> much easier to obtain a fresh copy of the block from an HA standby.

Yeah, error-correcting checksums are yet another use-case and one that
would require a lot more space.

Thanks,

Stephen
