From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Ashwin Agrawal <aagrawal(at)pivotal(dot)io> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg_basebackup misses to report checksum error |
Date: | 2020-05-07 17:24:59 |
Message-ID: | 20200507172458.GI13712@tamriel.snowman.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Greetings,
* Ashwin Agrawal (aagrawal(at)pivotal(dot)io) wrote:
> On Wed, May 6, 2020 at 3:02 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > On Wed, May 6, 2020 at 5:48 PM Ashwin Agrawal <aagrawal(at)pivotal(dot)io> wrote:
> > > If pg_basebackup is not able to read BLCKSZ content from file, then it
> > > just emits a warning "could not verify checksum in file "____" block
> > > X: read buffer size X and page size 8192 differ" currently but misses
> > > to error with "checksum error occurred". Only if it can read 8192 and
> > > checksum mismatch happens will it error in the end.
> >
> > I don't think it's a good idea to conflate "hey, we can't checksum
> > this because the size is strange" with "hey, the checksum didn't
> > match". Suppose the a file has 1000 full blocks and a partial block.
> > All 1000 blocks have good checksums. With your change, ISTM that we'd
> > first emit a warning saying that the checksum couldn't be verified,
> > and then we'd emit a second warning saying that there was 1 checksum
> > verification failure, which would also be reported to the stats
> > system. I don't think that's what we want.
>
> I feel the intent of reporting "total checksum verification failure" is to
> report corruption. Which way is the secondary piece of the puzzle. Not
> being able to read checksum itself to verify is also corruption and is
> checksum verification failure I think. WARNINGs will provide fine grained
> clarity on what type of checksum verification failure it is, so I am not
> sure we really need fine grained clarity in "total numbers" to
> differentiate these two types.
Are we absolutely sure that there's no way for a partial block to end up
being seen by pg_basebackup, which is just doing routine filesystem
read() calls, during normal operation though..? Across all platforms?
We certainly don't want to end up reporting a false positive by saying
that there's been corruption when it was just that the file was getting
extended and a read() happened to catch an incomplete write(), or
something along those lines.
Thanks,
Stephen
From | Date | Subject | |
---|---|---|---|
Next Message | Chapman Flack | 2020-05-07 18:08:58 | Re: PG 13 release notes, first draft |
Previous Message | Ashwin Agrawal | 2020-05-07 17:15:45 | Re: pg_basebackup misses to report checksum error |