From: | Michael Banck <michael(dot)banck(at)credativ(dot)de> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Online verification of checksums |
Date: | 2020-10-21 10:00:23 |
Message-ID: | 79ffc294aa51d33bf3d7569a6e72977fb2051925.camel@credativ.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
Am Dienstag, den 20.10.2020, 18:11 +0900 schrieb Michael Paquier:
> On Mon, Apr 06, 2020 at 04:45:44PM -0400, Tom Lane wrote:
> > Actually, after thinking about that a bit more: why is there an LSN-based
> > special condition at all? It seems like it'd be far more useful to
> > checksum everything, and on failure try to re-read and re-verify the page
> > once or twice, so as to handle the corner case where we examine a page
> > that's in process of being overwritten.
>
> I was reviewing this area today, and that actually matches my
> impression. Why do we need a LSN-based check at all? As said
> upthread, that's of course weak with random data as we would miss most
> of the real checksum failures, with odds getting better depending on
> the current LSN of the cluster moving on. However, it seems to me
> that we would have an extra advantage in removing this check
> all together: it would be possible to check for pages even if these
> are more recent than the start LSN of the backup, and that could be a
> lot of pages that could be checked on a large cluster. So by keeping
> this check we also delay the detection of real problems.
The check was ported (or the concept of it adapted) from pgBackRest if I
remember correctly.
> As things stand, I'd like to think that it would be much more useful
> to remove this check and to have one or two extra retries (the current
> code only has one). I don't like much the possibility of false
> positives for such critical checks, but as we need to live with what
> has been released, that looks like a good move for stable branches.
Sounds good to me. I think some were advocating for locking the page
before re-reading. When I looked at it, the level of abstraction that
pg_basebackup has (just a list of files chopped up into blocks, no
notion of relations I think) made that non-trivial, but maybe still
possible for v14 and beyond.
Michael
--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de
credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2020-10-21 10:08:34 | Re: Transactions involving multiple postgres foreign servers, take 2 |
Previous Message | Bharath Rupireddy | 2020-10-21 09:48:56 | Re: Parallel copy |