Re: pgsql: Validate page level checksums in base backups

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Validate page level checksums in base backups
Date: 2018-04-03 18:48:08
Message-ID: CABUevEz0uBqg_uyfA-yFiL6Wo=kjn5kE9tEQrJ-vwWug9Vrwfw@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Magnus Hagander <magnus(at)hagander(dot)net> writes:
> > Yeah, there's clearly a second problem here.
>
> I think this test script is broken in many ways.
>
> It's scribbling on the source cluster's disk files and assuming that that
> translates one-for-one to what gets sent to the slave server --- but what
> if some of the blocks that it modifies on-disk are resident in the
> source's shared buffers? I think you'd have to shut down the source and
> then apply the corruption if you want stable results.
>

It doesn't actually use a slave server as part of the tests.

And base backups don't read from the source's shared buffers, but they *do*
read from the kernel buffers.

> I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
> since the way in which the corruption is induced is just guessing
> as to where page boundaries are.
>

Yeah, that might be a problem. The corruption offsets should be calculated
from the block size.
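A minimal sketch of what "calculated from the block size" could look like, written in Python rather than as the test's own code. The helper name `corrupt_block` and its parameters are hypothetical; it assumes the caller already knows the block size (for example from `SHOW block_size` on the source server), that the target block lives in the relation file passed in, and that the server has been shut down before the file is touched, as suggested in this thread.

```python
import os


def corrupt_block(path: str, blkno: int, blcksz: int,
                  offset_in_page: int = 0, length: int = 8,
                  fill: bytes = b"\xff") -> int:
    """Overwrite `length` bytes inside block `blkno` of a relation file.

    Hypothetical test helper. The absolute file offset is derived from
    `blcksz` instead of being hard-coded, so the same test keeps hitting
    a real page with a nondefault BLCKSZ. Assumes the server is stopped.
    Returns the absolute file offset that was written.
    """
    if not fill:
        raise ValueError("fill pattern must not be empty")
    # Keep the damage inside a single page so exactly one block is corrupt.
    if not 0 <= offset_in_page <= blcksz - length:
        raise ValueError("corruption range must stay inside one page")
    start = blkno * blcksz + offset_in_page
    if start + length > os.path.getsize(path):
        raise ValueError("block %d is beyond end of file" % blkno)
    with open(path, "r+b") as f:
        f.seek(start)
        # Repeat the fill pattern and trim it to exactly `length` bytes.
        f.write((fill * length)[:length])
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach the file
    return start
```

Raising on out-of-range requests, rather than silently writing past the page or file end, keeps a misconfigured block size from producing a test that "passes" by corrupting the wrong place.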

> Also, scribbling on tables as sensitive as pg_class is just asking for
> trouble IMO. I don't see anything in this test, for example, that
> prevents autovacuum from running and causing a PANIC before the test
> can complete. Even with AV off, there's a good chance that clobber-
> cache-always animals will fall over because they do so many more
> physical accesses to the system catalogs. I'd suggest inducing the
> corruption in some user table(s) that we can more tightly constrain
> the source server's accesses to.
>

Yeah, that seems like a good idea. And probably also shut the server down
while writing the corruption, just in case.

I'll put looking into that on my todo list for when I'm back, unless someone
beats me to it. Michael, do you want to take a stab at it?

//Magnus
