From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: David Steele <david(at)pgmasters(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [PATCH] Verify Checksums during Basebackups
Date: 2018-03-23 09:36:27
Message-ID: 1521797787.15036.23.camel@credativ.de
Lists: pgsql-hackers
Hi David,
thanks for the review!
On Thursday, 2018-03-22 at 12:22 -0400, David Steele wrote:
> On 3/17/18 5:34 PM, Michael Banck wrote:
> > On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:
> > I think most people (including those I had off-list discussions about
> > this with) were of the opinion that such an option should be there, so I
> > added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
> > replication command and also an option -k / --no-verify-checksums to
> > pg_basebackup to trigger this.
> >
> > Updated patch attached.
>
> + memcpy(page, (buf + BLCKSZ * i), BLCKSZ);
>
> Why make a copy here? How about:
>
> char *page = buf + BLCKSZ * i;
Right, ok.
> I know pg_checksum_page manipulates the checksum field but I have found
> it to be safe.
>
> + if (phdr->pd_checksum != checksum)
>
> I've attached a patch that adds basic retry functionality. It's not
> terribly efficient since it rereads the entire buffer for any block
> error. A better way is to keep a bitmap for each block in the buffer,
> then on retry compare bitmaps. If a block is still bad, report it.
> If the block was corrected, move on. If a block was good before but is
> bad on retry it can be ignored.
I have to admit I find it a bit convoluted and non-obvious on first
reading, but I'll try to check it out some more.
I think it would be very useful if we could come up with a test case that
demonstrates this problem, but I guess it will be quite hard to hit
reproducibly, right?
> + ereport(WARNING,
> + (errmsg("checksum verification failed in file "
>
> I'm worried about how verbose this warning could be since there are
> 131,072 blocks per segment. It's unlikely to have that many block
> errors, but users do sometimes put files in PGDATA which look like they
> should be validated. Since these warnings all go to the server log it
> could get pretty bad.
We only verify checksums of files in tablespaces, and I don't think
dropping random files in those is supported in any way, as opposed to
maybe the top-level PGDATA directory. So I would say that this is not a
real concern.
> We should either stop warning after the first failure, or aggregate the
> failures for a file into a single message.
I agree that major corruption could make the output blow up, but I would
prefer to keep this feature simple for now, which implies either printing
a lot of WARNINGs or stopping after the first one (or the first few).
Michael
--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de
credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer