Re: basebackup checksum verification

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: basebackup checksum verification
Date: 2019-03-27 00:51:31
Message-ID: 20190327005131.GF6197@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
> On Tue, Mar 26, 2019 at 08:18:31PM -0400, Stephen Frost wrote:
> >* Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
> >>On Tue, Mar 26, 2019 at 04:49:21PM -0700, Andres Freund wrote:
> >>>On 2019-03-26 19:22:03 -0400, Stephen Frost wrote:
> >>>>* Andres Freund (andres(at)anarazel(dot)de) wrote:
> >>>>> As detailed in
> >>>>> https://postgr.es/m/20190319200050.ncuxejradurjakdc%40alap3.anarazel.de
> >>>>> the way the backend's basebackup checksum verification works makes its
> >>>>> error detection capabilities very dubious.
> >>>>
> >>>>I disagree that it's 'very dubious', even with your analysis.
> >>>
> >>>I really don't know what to say. The current algorithm is flat out
> >>>bogus.
> >>
> >>Bogus might be a bit too harsh, but yeah - failure to reliably detect obviously
> >>invalid checksums when the LSN just happens to be high due to randomness is not
> >>a good thing. We'll still detect pages corrupted in other places, but this is
> >>rather unfortunate.
> >
> >I'm all for improving it, as I said originally.
> >
> >>>>I thought Robert's response was generally good, pointing out that
> >>>>we're talking about this being an issue if the corruption happens in a
> >>>>certain set of bytes. That said, I'm happy to see improvements in
> >>>>this area but I'm flat out upset about the notion that we must be
> >>>>perfect here- our checksums themselves aren't perfect for catching
> >>>>corruption either.
> >>>
> >>>The point is that we're not detecting errors that we can detect when
> >>>read outside of basebackup. I really entirely completely fail to see
> >>>how that can be defended.
> >>>
> >>>I think we're making promises with this basebackup feature that we're
> >>>not even remotely keeping. I don't understand how you can defend that,
> >>>given the current state, you can have a basebackup that you took with
> >>>checksums enabled, and then when you actually use that basebackup you
> >>>get checksum failures. It's one thing not to detect all storage
> >>>issues, but it's really not ok if we only detect them after the
> >>>basebackup has been put to use.
> >>
> >>Yeah, if basebackup completes without reporting any invalid checksums,
> >>but running pg_verify_checksums on the same backup does detect some,
> >>that probably should raise some eyebrows.
> >
> >That isn't actually what would happen at this point, just so we're
> >clear. What Andres is talking about is a solution which would only
> >actually work for pg_basebackup, and not for pg_verify_checksums
> >(without some serious changes which make it connect to the running
> >server and run various functions to perform the locking that he's
> >proposing pg_basebackup do...).
>
> I was talking about pg_verify_checksums in offline mode, i.e. when you
> take a backup and then run pg_verify_checksums on it. I'm pretty sure
> that does not need to talk to the cluster. Sorry if that was not clear.

To make that work, you'd have to take a backup, then restore it, then
bring PG up and have it replay all of the outstanding WAL, then shut
down PG cleanly, and *then* you could run pg_verify_checksums on it.

> >>We already have such a blind spot, but it's expected to be pretty small
> >>(essentially pages modified since the start of the backup).
> >
> >eh..? This is.. more-or-less entirely what's being discussed here:
> >exactly how we detect and determine which pages were modified since the
> >start of the backup, and which might have been partially written out
> >when we tried to read them and therefore fail a checksum check, but it
> >doesn't matter because we don't actually end up using those pages.
>
> And? All I'm saying is that we knew there's a gap that we don't check,
> but that the understanding was that it's rather small and limited to
> recently modified pages. If we can further reduce that window, that's
> great and we should do it, of course.

The point that Andres is making is that, in the face of corruption of
certain particular bytes, the set of pages we actually check can be much
smaller than the overall size of the DB (or even the set of recently
modified pages): we end up skipping a lot of pages because of the
corruption itself rather than because they were recently modified, since
the determination of what counts as "recently modified" is based on
exactly those bytes. With the changes being made to pg_checksums, we'd
at least report those pages back as having been 'skipped', but better
still would be to accurately determine that a page really was recently
modified and that what we saw was therefore just a torn page.
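
To make the failure mode concrete, the verification in sendFile() in
src/backend/replication/basebackup.c boils down to roughly this (a
simplified sketch of the v11 logic, not the exact function):

    #include "postgres.h"
    #include "storage/bufpage.h"
    #include "storage/checksum.h"

    /*
     * 'startptr' is the LSN at which the base backup started; 'page'
     * points at one 8kB block; 'blkno' is the block number within the
     * relation (segment offset already applied).
     */
    static bool
    page_verifies(char *page, XLogRecPtr startptr, BlockNumber blkno)
    {
        PageHeader  phdr = (PageHeader) page;

        /*
         * Pages with an LSN at or past the backup start are skipped, on
         * the theory that they may legitimately be torn while we read
         * them and will be fixed by WAL replay anyway.  But the LSN is
         * taken from the (possibly corrupt) page itself: corruption that
         * flips those bytes to a high value makes us skip the page, and
         * the bad checksum is never noticed.
         */
        if (PageIsNew(page) || PageGetLSN(page) >= startptr)
            return true;        /* skipped, not actually verified */

        return phdr->pd_checksum == pg_checksum_page(page, blkno);
    }

That's why corruption of those particular eight bytes silently shrinks
the set of pages we check at all, rather than producing a failure.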

Thanks,

Stephen
