From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Online verification of checksums |
Date: | 2020-11-24 01:10:43 |
Message-ID: | 20201124011043.GA3046@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Nov 23, 2020 at 10:35:54AM -0500, Stephen Frost wrote:
> * Anastasia Lubennikova (a(dot)lubennikova(at)postgrespro(dot)ru) wrote:
>> It seems reasonable to me to rely on checksums only.
>>
>> As for retry, I think that API for concurrent I/O will be complicated.
>> Instead, we can introduce a function to read the page directly from shared
>> buffers after PAGE_RETRY_THRESHOLD attempts. It looks like a bullet-proof
>> solution to me. Do you see any possible problems with it?
It seems to me that you are missing the point here. It is not
necessary to read a page from shared buffers. What is necessary is to
make sure that there is zero concurrent I/O activity in shared buffers
while a page is getting checked on disk, giving the insurance that
there is zero risk of having a torn page for a check for anything
working with shared buffers. You could do that only on a retry if we
found a page where there was a checksum mismatch, meaning that the
page we either torn or currupted, but need an extra verification
anyway.
> We might end up reading pages back in that have been evicted, for one
> thing, which doesn't seem great, and this also seems likely to be
> awkward for cases which aren't using the replication protocol, unless
> every process maintains a connection to PG the entire time, which also
> doesn't seem great.
I don't quite see a problem in checking pages that have been just
evicted if we are able to detect faster that a page is corrupted,
because the initial check may fail because a page was torn, meaning
that it was in the middle of an eviction, but the page could also be
corrupted, meaning also that it was *not* torn, and would fail a retry
where we should make sure that there is no s_b concurrent activity.
So in the worst case of seeing you make the detection of a corrupted
page faster.
Please note that Andres also mentioned about the potential need to
worry about table AMs that call directly smgrwrite(), bypassing shared
buffers. The only cases in-core where it is used are related to init
forks when an unlogged relation gets created, where it would not
matter if you are doing a page check while holding a database
transaction as the newly-created relation would not be visible yet,
but it would matter in the case of base backups doing direct page
lookups. Fun.
> Also- what is the point of reading the page from shared buffers
> anyway..? All we need to do is prove that the page will be rewritten
> during WAL replay. If we can prove that, we don't actually care what
> the contents of the page are. We certainly can't calculate the
> checksum on a page we plucked out of shared buffers since we only
> calculate the checksum when we go to write the page out.
A LSN-based check makes the thing tricky. How do you make sure that
pd_lsn is not itself broken? It could be perfectly possible that a
random on-disk corruption makes pd_lsn seen as having a correct value,
still the rest of the page is borked.
--
Michael
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2020-11-24 01:28:06 | Re: Online verification of checksums |
Previous Message | Dagfinn Ilmari Mannsåker | 2020-11-24 00:52:59 | Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path |