Quick Links

Re: Online verification of checksums

From:	Julien Rouhaud <rjuju123(at)gmail(dot)com>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Online verification of checksums
Date:	2019-03-08 17:59:41
Message-ID:	CAOBaU_ZaNXN1TMToZG4q4gQZeS7KVUx2_Hsa4MpAwhbu1N-WXA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Mar 8, 2019 at 6:50 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On 3/8/19 4:19 PM, Julien Rouhaud wrote:
> > On Thu, Mar 7, 2019 at 7:00 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >>
> >> On 2019-03-07 12:53:30 +0100, Tomas Vondra wrote:
> >>>
> >>> But then again, we could just
> >>> hack a special version of ReadBuffer_common() which would just
> >>
> >>> (a) check if a page is in shared buffers, and if it is then consider the
> >>> checksum correct (because in memory it may be stale, and it was read
> >>> successfully so it was OK at that moment)
> >>>
> >>> (b) if it's not in shared buffers already, try reading it and verify the
> >>> checksum, and then just evict it right away (not to spoil sb)
> >>
> >> This'd also make sense and make the whole process more efficient. OTOH,
> >> it might actually be worthwhile to check the on-disk page even if
> >> there's in-memory state. Unless IO is in progress the on-disk page
> >> always should be valid.
> >
> > Definitely. I already saw servers with all-frozen-read-only blocks
> > popular enough to never get evicted in months, and then a minor
> > upgrade / restart having catastrophic consequences.
> >
>
> Do I understand correctly the "catastrophic consequences" here are due
> to data corruption / broken checksums on those on-disk pages?

Ah, yes sorry I should have been clearer. Indeed, there was silent
data corruptions (no ckecksum though) that was revealed by the
restart. So a routine minor update resulted in a massive outage.
Such a scenario can't be avoided if we always bypass checksum check
for alreay in shared_buffers pages.

In response to

Re: Online verification of checksums at 2019-03-08 17:49:56 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tomas Vondra	2019-03-08 18:08:29	Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Previous Message	Tomas Vondra	2019-03-08 17:49:56	Re: Online verification of checksums