Re: Online checksums verification in the backend

From:	Julien Rouhaud <rjuju123(at)gmail(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject:	Re: Online checksums verification in the backend
Date:	2020-03-18 10:10:55
Message-ID:	20200318101055.GA36918@nol
Lists:	pgsql-hackers

On Wed, Mar 18, 2020 at 07:06:19AM +0100, Julien Rouhaud wrote:
> On Wed, Mar 18, 2020 at 01:20:47PM +0900, Michael Paquier wrote:
> > On Mon, Mar 16, 2020 at 09:21:22AM +0100, Julien Rouhaud wrote:
> > > On Mon, Mar 16, 2020 at 12:29:28PM +0900, Michael Paquier wrote:
> > >> With a large amount of
> > >> shared buffer eviction you actually increase the risk of torn page
> > >> reads. Instead of a logic relying on partition mapping locks, which
> > >> could be unwise on performance grounds, did you consider different
> > >> approaches? For example, a kind of pre-emptive lock on the page in
> > >> storage to prevent any shared buffer operation from happening while the
> > >> block is read from storage, which would act like a barrier.
> > >
> > > > Even with a workload having a large shared_buffers eviction pattern, I don't
> > > > think that there's a high probability of hitting a torn page. Unless I'm
> > > > mistaken, it can only happen if all of the following steps happen concurrently
> > > > with the block read, just after the LWLock is released:
> > > >
> > > > - postgres reads the same block into shared_buffers (including all the locking)
> > > - dirties it
> > > - writes part of the page
> > >
> > > It's certainly possible, but it seems so unlikely that the optimistic lock-less
> > > approach seems like a very good tradeoff.
> >
> > Having false reports in this area could be very confusing for the
> > user. That's for example possible now with checksum verification and
> > base backups.
>
>
> I agree; however, this shouldn't be the case here, as the block will be
> rechecked a second time while holding the proper lock, in case of a possible
> false positive, before being reported as corrupted. So the only downside is
> checking twice a corrupted block that's not found in shared buffers (or one
> that is concurrently loaded/modified/half flushed). As the number of corrupted
> or concurrently loaded/modified/half flushed blocks should usually be close to
> zero, it seems worthwhile to have a lockless check first for performance reasons.
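
To make the two-step check described in the quoted paragraph concrete, here is a minimal Python sketch of the idea: verify first without any lock, and only re-read under a lock when that first check fails. It is not the code from the patch. The `Block` class, the `threading.Lock` standing in for the buffer locking, the CRC-32 stand-in checksum, and the `clean_read` / `torn_read` helpers are all assumptions made up for the illustration.

```python
import threading
import zlib

PAGE_SIZE = 8192


def page_checksum(payload: bytes) -> int:
    # Stand-in checksum (CRC-32), used only for this illustration.
    return zlib.crc32(payload)


class Block:
    """Hypothetical on-disk block: payload plus the checksum stored with it."""

    def __init__(self, payload: bytes):
        self.lock = threading.Lock()  # assumed stand-in for the proper lock
        self.payload = payload
        self.checksum = page_checksum(payload)


def clean_read(block):
    # Lock-less read that happens to see a consistent page.
    return block.payload, block.checksum


def torn_read(block):
    # Lock-less read racing with a writer: the second half of the page
    # comes from a different version than the stored checksum.
    half = PAGE_SIZE // 2
    return block.payload[:half] + b"\xff" * half, block.checksum


def check_block(block, unlocked_read):
    """Return (ok, rechecked).

    First verify optimistically without the lock. Only if that fails,
    re-read while holding the lock, and report corruption only if the
    second check fails too.
    """
    payload, checksum = unlocked_read(block)
    if page_checksum(payload) == checksum:
        return True, False
    with block.lock:
        ok = page_checksum(block.payload) == block.checksum
    return ok, True


healthy = Block(bytes(PAGE_SIZE))
corrupted = Block(bytes(PAGE_SIZE))
corrupted.payload = b"\x01" + corrupted.payload[1:]  # damage, checksum untouched
```

In this sketch a healthy block costs one lock-free check, a torn read of a healthy block is cleared by the locked recheck instead of being reported, and only a block that fails both checks is reported as corrupted.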

I just noticed some dumb mistakes while adding the new GUCs. v5 attached to
fix that, no other changes.

Attachment	Content-Type	Size
v5-0001-Add-a-pg_check_relation-function.patch	text/plain	38.0 KB

In response to

Re: Online checksums verification in the backend at 2020-03-18 06:06:19 from Julien Rouhaud

Responses

Re: Online checksums verification in the backend at 2020-03-28 03:28:27 from Masahiko Sawada
