Re: Enable data checksums by default

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-22 17:07:15
Message-ID: 20190322170715.tafjitaldyhwfl2u@alap3.anarazel.de
Lists: pgsql-hackers

On 2019-03-22 18:01:32 +0100, Tomas Vondra wrote:
> On 3/22/19 5:41 PM, Andres Freund wrote:
> > Hi,
> >
> > On 2019-03-22 17:32:10 +0100, Tomas Vondra wrote:
> >> On 3/22/19 5:10 PM, Andres Freund wrote:
> >>> IDK, being able to verify in some form that backups aren't corrupted on
> >>> an IO level is mighty nice. That often allows one to detect the issue
> >>> while one still has older backups around.
> >>>
> >>
> >> Yeah, I agree that's a valuable capability. I think the question is how
> >> effective it actually is, considering how much storage has changed over
> >> the past few years (which necessarily affects the type of failures
> >> people have to deal with).
> >
> > I'm not sure I understand? How do the changes around storage
> > meaningfully affect the need to have some trust in backups, or the
> > benefit of earlier detection?
> >
>
> Having trust in backups is still desirable - nothing changes that,
> obviously. The question I was posing was rather "Are checksums still
> effective on current storage systems?"
>
> I'm wondering if the storage systems people use nowadays may be failing
> in ways that are not reliably detectable by checksums. I don't have any
> data to either support or reject that hypothesis, though.

I don't think it's useful to paint unsubstantiated doom-and-gloom
pictures.
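
For reference, data checksums still catch the classic failure modes:
pg_checksum_page() mixes the block number into the checksum, so both bit
flips and misdirected writes are detected. The known gap is pages that come
back as all zeroes. A minimal sketch of the verification step
(page_checksum_ok is a made-up helper name; it roughly mirrors what
PageIsVerified does, it's not a proposed API):

    #include "storage/bufpage.h"
    #include "storage/checksum.h"

    static bool
    page_checksum_ok(Page page, BlockNumber blkno)
    {
        PageHeader  hdr = (PageHeader) page;

        /*
         * An all-zero page carries no checksum, so it has to pass.
         * (Simplified: PageIsVerified actually scans the whole page for
         * zeroes, not just the header.)
         */
        if (PageIsNew(page))
            return true;

        /* blkno is mixed in, so misdirected writes also fail here */
        return hdr->pd_checksum == pg_checksum_page((char *) page, blkno);
    }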

> >> It's not clear to me what checksums can do about zeroed pages (and/or
> >> truncated files), though.
> >
> > Well, there's nothing fundamental about needing added pages to be
> > zeroes. We could expand them to be initialized with actual valid
> > checksums instead of
> > /* new buffers are zero-filled */
> > MemSet((char *) bufBlock, 0, BLCKSZ);
> > /* don't set checksum for all-zero page */
> > smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
> >
> > the problem is that it's hard to do so safely without adding a lot of
> > additional WAL logging. A lot of filesystems will journal metadata
> > changes (like the size of the file), but not contents. So after a crash
> > the tail end might appear zeroed out, even if we never wrote
> > zeroes. That's obviously solvable by WAL logging, but that's not cheap.
> >
>
> Hmmm. I'd say a filesystem that does not guarantee having all the data
> after an fsync is outright broken, but maybe that's what checksums are
> meant to protect against.

There's no fsync here. Consider smgrextend(with-valid-checksum) followed
by a crash: the OS will probably have journalled the file size change,
but not the contents. After the crash it's thus likely that the data
page will appear zeroed. That prevents us from erroring out when
encountering a zeroed page, even though doing so would be very good for
error detection, because storage systems will show corrupted data as
zeroes in a number of cases.
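
To make the idea above concrete, a minimal sketch (assuming the existing
bufmgr context around smgrextend; this is an illustration of the
alternative, not committed code):

    /* initialize a valid empty page instead of leaving it all-zero */
    PageInit((Page) bufBlock, BLCKSZ, 0);
    /* stamp a checksum for the target block (no-op without checksums) */
    PageSetChecksumInplace((Page) bufBlock, blockNum);
    smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);

The catch, as described: without extra WAL logging a crash can still
leave the tail of the file zeroed, so a verifier could not treat a
zeroed page as an error anyway.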

Greetings,

Andres Freund
