From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Online checksums verification in the backend
Date: 2020-09-07 08:50:38
Message-ID: 20200907085038.GA19261@paquier.xyz
Lists: pgsql-hackers

On Mon, Sep 07, 2020 at 09:38:30AM +0200, Julien Rouhaud wrote:
> Did you mean creating a new checksumfuncs.c file? I don't find any
> such file in the current tree.

Your patch adds checksumfuncs.c, so the subroutines grabbing a given
block could just be moved there.

> I'm not sure I understand. Unless I missed something, this approach
> *cannot* raise a false positive. What it does is force a 2nd check
> with a stronger lock *to make sure it's actually a corruption*, so we
> don't raise false positives. The only report that can happen in this
> 1st loop is if smgrread raises an error, which AFAICT can only happen
> (at least with mdread) if the whole block couldn't be read, which is a
> sign of a very bad problem. This should clearly be reported, as it
> cannot be caused by the locking heuristics used here.

We don't know how much this optimization matters, though. Could we
get an idea of that? For example, take the case of one relation with
a fixed size in a read-only workload and in a read-write workload (as
long as autovacuum and updates keep the number of relation blocks
rather constant in the read-write case), and run a checksum
verification in parallel with multiple clients working on the
relation concurrently. Assuming that the relation is fully in the OS
cache, we could get an idea of the impact by testing multiple
(shared_buffers / relation size) ratios to make the eviction more
aggressive. The buffer partition locks, capped at
NUM_BUFFER_PARTITIONS, should be the bottleneck, but it still seems
to me that it would be good to see if there is a difference.
What do you think?
--
Michael
