From: | David Steele <david(at)pgmasters(dot)net> |
---|---|
To: | Michael Banck <michael(dot)banck(at)credativ(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Cc: | Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: [PATCH] Verify Checksums during Basebackups |
Date: | 2018-03-23 14:54:08 |
Message-ID: | 1a230b05-6cfc-2557-20cc-09e7a176230c@pgmasters.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Michael,
On 3/23/18 5:36 AM, Michael Banck wrote:
> Am Donnerstag, den 22.03.2018, 12:22 -0400 schrieb David Steele:
>>
>> + if (phdr->pd_checksum != checksum)
>>
>> I've attached a patch that adds basic retry functionality. It's not
>> terrible efficient since it rereads the entire buffer for any block
>> error. A better way is to keep a bitmap for each block in the buffer,
>> then on retry compare bitmaps. If the block is still bad, report it.
>> If the block was corrected moved on. If a block was good before but is
>> bad on retry it can be ignored.
>
> I have to admit I find it a bit convoluted and non-obvious on first
> reading, but I'll try to check it out some more.
Yeah, I think I was influenced too much by how pgBackRest does things,
which doesn't work as well here. Attached is a simpler version.
> I think it would be very useful if we could come up with a testcase
> showing this problem, but I guess this will be quite hard to hit
> reproducibly, right?
This was brought up by Robert in [1] when discussing validating
checksums during backup. I don't know of any way to reproduce this
issue but it seems perfectly possible, if highly unlikely.
>> + ereport(WARNING,
>> + (errmsg("checksum verification failed in file "
>>
>> I'm worried about how verbose this warning could be since there are
>> 131,072 blocks per segment. It's unlikely to have that many block
>> errors, but users do sometimes put files in PGDATA which look like they
>> should be validated. Since these warnings all go to the server log it
>> could get pretty bad.
>
> We only verify checksums of files in tablespaces, and I don't think
> dropping random files in those is supported in any way, as opposed to
> maybe the top-level PGDATA directory. So I would say that this is not a
> real concern.
Perhaps, but a very corrupt file is still going to spew lots of warnings
into the server log.
>> We should either stop warning after the first failure, or aggregate the
>> failures for a file into a single message.
>
> I agree that major corruption could make the whole output blow up but I
> would prefer to keep this feature simple for now, which implies possibly
> printing out a lot of WARNING or maybe just stopping after the first
> one (or first few, dunno).
In my experience actual block errors are relatively rare, so there
aren't likely to be more than a few in a file. More common are
overwritten or transposed files, rogue files, etc. These produce a lot
of output.
Maybe stop after five?
Regards,
--
-David
david(at)pgmasters(dot)net
Attachment | Content-Type | Size |
---|---|---|
reread-v2.patch | text/plain | 2.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2018-03-23 14:59:50 | Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit) |
Previous Message | Haribabu Kommi | 2018-03-23 14:49:28 | Re: PQHost() undefined behavior if connecting string contains both host and hostaddr types |