Re: Online verification of checksums

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2020-11-23 15:35:54
Message-ID: 20201123153553.GI16415@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Anastasia Lubennikova (a(dot)lubennikova(at)postgrespro(dot)ru) wrote:
> On 21.11.2020 04:30, Michael Paquier wrote:
> >The only method I can think as being really
> >reliable is based on two facts:
> >- Do a check only on pd_checksums, as that validates the full contents
> >of the page.
> >- When doing a retry, make sure that there is no concurrent I/O
> >activity in the shared buffers. This requires an API we don't have
> >yet.
>
> It seems reasonable to me to rely on checksums only.
>
> As for retry, I think that API for concurrent I/O will be complicated.
> Instead, we can introduce a function to read the page directly from shared
> buffers after PAGE_RETRY_THRESHOLD attempts. It looks like a bullet-proof
> solution to me. Do you see any possible problems with it?

We might end up reading pages back in that have been evicted, for one
thing, which doesn't seem great, and this also seems likely to be
awkward for cases which aren't using the replication protocol, unless
every process maintains a connection to PG the entire time, which also
doesn't seem great.

Also- what is the point of reading the page from shared buffers
anyway..? All we need to do is prove that the page will be rewritten
during WAL replay. If we can prove that, we don't actually care what
the contents of the page are. We certainly can't calculate the
checksum on a page we plucked out of shared buffers since we only
calculate the checksum when we go to write the page out.

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2020-11-23 15:39:36 Re: cutting down the TODO list thread
Previous Message Luc Vlaming 2020-11-23 15:34:02 Re: Parallel plans and "union all" subquery