From: | Michael Banck <michael(dot)banck(at)credativ(dot)de> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Subject: | Re: Online verification of checksums |
Date: | 2019-03-19 21:39:16 |
Message-ID: | 1553031556.9697.59.camel@credativ.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
Am Dienstag, den 19.03.2019, 13:00 -0700 schrieb Andres Freund:
> On 2019-03-20 03:27:55 +0800, Stephen Frost wrote:
> > On Tue, Mar 19, 2019 at 23:59 Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > On 2019-03-19 16:52:08 +0100, Michael Banck wrote:
> > > > Am Dienstag, den 19.03.2019, 11:22 -0400 schrieb Robert Haas:
> > > > > It's torn pages that I am concerned about - the server is writing and
> > > > > we are reading, and we get a mix of old and new content. We have been
> > > > > quite diligent about protecting ourselves from such risks elsewhere,
> > > > > and checksum verification should not be held to any lesser standard.
> > > >
> > > > If we see a checksum failure on an otherwise correctly read block in
> > > > online mode, we retry the block on the theory that we might have read a
> > > > torn page. If the checksum verification still fails, we compare its LSN
> > > > to the LSN of the current checkpoint and don't mind if its newer. This
> > > > way, a torn page should not cause a false positive either way I
> > > > think?.
> > >
> > > False positives, no. But there's plenty potential for false
> > > negatives. In plenty clusters a large fraction of the pages is going to
> > > be touched in most checkpoints.
> >
> >
> > How is it a false negative? The page was in the middle of being
> > written,
>
> You don't actually know that. It could just be random gunk in the LSN,
> and this type of logic just ignores such failures as long as the random
> gunk is above the system's LSN.
Right, I think this needs to be taken into account. For pg_basebackup,
that'd be an additional check for GetRedoRecPtr() or something
in the below check:
[...]
> Well, I don't know what to tell you. But:
>
> /*
> * Only check pages which have not been modified since the
> * start of the base backup. Otherwise, they might have been
> * written only halfway and the checksum would not be valid.
> * However, replaying WAL would reinstate the correct page in
> * this case. We also skip completely new pages, since they
> * don't have a checksum yet.
> */
> if (!PageIsNew(page) && PageGetLSN(page) < startptr)
> {
>
> doesn't consider plenty scenarios, as pointed out above. It'd be one
> thing if the concerns I point out above were actually commented upon and
> weighed not substantial enough (not that I know how). But...
>
> > Do you have any example cases where the code in pg_basebackup has resulted
> > in either a false positive or a false negative? Any case which can be
> > shown to result in either?
>
> CREATE TABLE corruptme AS SELECT g.i::text AS data FROM generate_series(1, 1000000) g(i);
> SELECT pg_relation_size('corruptme');
> postgres[22890][1]=# SELECT current_setting('data_directory') || '/' || pg_relation_filepath('corruptme');
> ┌─────────────────────────────────────┐
> │ ?column? │
> ├─────────────────────────────────────┤
> │ /srv/dev/pgdev-dev/base/13390/16384 │
> └─────────────────────────────────────┘
> (1 row)
> dd if=/dev/urandom of=/srv/dev/pgdev-dev/base/13390/16384 bs=8192 count=1 conv=notrunc
>
> Try a basebackup and see how many times it'll detect the corrupt
> data. In the vast majority of cases you're going to see checksum
> failures when reading the data for normal operation, but not when using
> basebackup (or this new tool).
Right, see above.
> At the very very least this would need to do
>
> a) checks that the page is all zeroes if PageIsNew() (like
> PageIsVerified() does for the backend). That avoids missing cases
> where corruption just zeroed out the header, but not the whole page.
We can't run pg_checksum_page() on those afterwards though as it would
fire an assertion:
|pg_checksums: [...]/../src/include/storage/checksum_impl.h:194:
|pg_checksum_page: Assertion `!(((PageHeader) (&cpage->phdr))->pd_upper
|== 0)' failed.
But we should count it as a checksum error and generate an appropriate
error message in that case.
> b) Check that pd_lsn is between startlsn and the insertion pointer. That
> avoids accepting just about all random data.
However, for pg_checksums being a stand-alone application it can't just
access the insertion pointer, can it? We could maybe set a threshold
from the last checkpoint after which we consider the pd_lsn bogus. But
what's a good threshold here?
And/or we could port the other sanity checks from PageIsVerified:
| if ((p->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
| p->pd_lower <= p->pd_upper &&
| p->pd_upper <= p->pd_special &&
| p->pd_special <= BLCKSZ &&
| p->pd_special == MAXALIGN(p->pd_special))
| header_sane = true
That should catch large-scale random corruption like you showed above.
Michael
--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de
credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2019-03-19 21:44:52 | Re: Online verification of checksums |
Previous Message | Robert Haas | 2019-03-19 21:34:37 | Re: Online verification of checksums |