From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: basebackup checksum verification
Date: 2019-03-27 00:10:42
Message-ID: 20190327000957.GA4406@development
Lists: pgsql-hackers
On Tue, Mar 26, 2019 at 04:49:21PM -0700, Andres Freund wrote:
>Hi,
>
>On 2019-03-26 19:22:03 -0400, Stephen Frost wrote:
>> Greetings,
>>
>> * Andres Freund (andres(at)anarazel(dot)de) wrote:
>> > As detailed in
>> > https://postgr.es/m/20190319200050.ncuxejradurjakdc%40alap3.anarazel.de
>> > the way the backend's basebackup checksum verification works makes its
>> > error detection capabilities very dubious.
>>
>> I disagree that it's 'very dubious', even with your analysis.
>
>I really don't know what to say. The current algorithm is flat out
>bogus.
>
Bogus might be a bit too harsh, but yeah - failing to reliably detect
obviously invalid checksums just because the page LSN happens to look high
due to random corruption is not a good thing. We'll still detect pages
corrupted in other places, but this is rather unfortunate.
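To make the failure mode concrete, here is a minimal sketch of the
LSN-gated check (modeled on what sendFile() in basebackup.c does, but
illustrative rather than verbatim; the enum and function name are made
up):

    #include "postgres.h"
    #include "access/xlogdefs.h"
    #include "storage/bufpage.h"
    #include "storage/checksum.h"

    typedef enum
    {
        PAGE_OK,
        PAGE_CHECKSUM_FAILED,
        PAGE_SKIPPED            /* LSN gate: never verified */
    } VerifyResult;

    static VerifyResult
    verify_page_sketch(char *page, BlockNumber blkno, XLogRecPtr startptr)
    {
        PageHeader  phdr = (PageHeader) page;

        /*
         * Pages whose LSN is past the backup start are assumed to be
         * concurrently modified and left to WAL replay to fix up, so
         * their checksums are never verified.
         */
        if (PageGetLSN(page) >= startptr)
            return PAGE_SKIPPED;

        /*
         * The trouble: pd_lsn is part of the page, so corruption that
         * lands in those 8 bytes can make the page look "recent" and
         * take the early return above, skipping verification entirely.
         */
        if (phdr->pd_checksum != pg_checksum_page(page, blkno))
            return PAGE_CHECKSUM_FAILED;

        return PAGE_OK;
    }

The gate itself is necessary - basebackup reads blocks with no interlock
against concurrent writes, so torn reads of hot pages are expected - the
problem is that a random pd_lsn can't be distinguished from a
legitimately recent one.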
>
>> I thought Robert's response was generally good, pointing out that
>> we're talking about this being an issue if the corruption happens in a
>> certain set of bytes. That said, I'm happy to see improvements in
>> this area but I'm flat out upset about the notion that we must be
>> perfect here- our checksums themselves aren't perfect for catching
>> corruption either.
>
>The point is that we're not detecting errors that we can detect when
>read outside of basebackup. I really entirely completely fail to see
>how that can be defended.
>
>I think we're making promises with the basebackup feature that we're not
>even remotely keeping. I don't understand how you can defend that, given
>the current state, you can have a basebackup that you took with
>checksums enabled, and then when you actually use that basebackup you get
>checksum failures. Like it's one thing not to detect all storage
>issues, but if we do detect them after using the basebackup, that's
>really not ok.
>
Yeah, if basebackup completes without reporting any invalid checksums, but
running pg_verify_checksums on the same backup does detect them, that
probably should raise some eyebrows.
We already have such a blind spot, but it's expected to be pretty small
(essentially just the pages modified since the start of the backup).
>
>> > 2) If 1) errored out, ensure that that's because the backend is
>> > currently writing out the page. That basically requires doing what
>> > BufferAlloc() does. So I think we'd need to end up with a function
>> > like:
>> >
>> > LockoutBackendWrites():
>> > buf = ReadBufferWithoutRelcache(relfilenode);
>>
>> This is going to cause it to be pulled into shared buffers, if it isn't
>> already there, isn't it?
>
>I can't see that being a problem. We're only going to enter this path if
>we encountered a buffer where the checksum was wrong. And either that's
>a data corruption event, in which case we don't care about a small
>performance penalty, or it's a race around writing out the page because
>basebackup read it while half written - in which case it's pretty
>likely that the page is still in shared buffers.
>
Yep, I think this is fine. Although in the other thread where this
failure mode was discussed, I think we only talked about taking the I/O
lock on the buffer, no? But as you say, this should be a rare code path.
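FWIW here's roughly how I'd imagine that recheck path, following your
pseudocode below (a sketch assuming the current per-buffer I/O locks;
recheck_block is a made-up name, and error handling, forks and segments
are omitted):

    #include "postgres.h"
    #include "storage/buf_internals.h"
    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"
    #include "storage/smgr.h"

    static void
    recheck_block(RelFileNode rnode, BlockNumber blkno)
    {
        char         pagebuf[BLCKSZ];
        SMgrRelation smgr = smgropen(rnode, InvalidBackendId);
        Buffer       buf;
        BufferDesc  *bufHdr;

        /* pull the block into shared buffers, as in your sketch */
        buf = ReadBufferWithoutRelcache(rnode, MAIN_FORKNUM, blkno,
                                        RBM_NORMAL, NULL);
        bufHdr = GetBufferDescriptor(buf - 1);

        /* holding the I/O lock means no backend can be mid-write */
        LWLockAcquire(BufferDescriptorGetIOLock(bufHdr), LW_EXCLUSIVE);

        /* reread from the OS - don't trust possibly stale buffer contents */
        smgrread(smgr, MAIN_FORKNUM, blkno, pagebuf);

        if (!PageIsVerified((Page) pagebuf, blkno))
            elog(ERROR, "checksum verification failed in block %u", blkno);

        LWLockRelease(BufferDescriptorGetIOLock(bufHdr));
        ReleaseBuffer(buf);
    }

The I/O lock should be enough here, since a concurrent write-out holds
it for the duration of the write.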
>
>> That seems less than ideal and isn't it going
>> to end up just doing exactly the same PageIsVerified() call, and what
>> happens when that fails?
>
>That seems quite easily handled with a different RBM_ mode.
>
>
>
>> > LWLockAcquire(BufferDescriptorGetContentLock(buf), LW_EXCLUSIVE);
>> > /*
>> > * Reread page from OS, and recheck. This needs to happen while
>> > * the IO lock prevents rereading from the OS. Note that we do
>> > * not want to rely on the buffer contents here, as that could
>> > * be very old cache contents.
>> > */
>> > perform_checksum_check(relfilenode, ERROR);
>> >
>> > LWLockRelease(BufferDescriptorGetContentLock(buf));
>> > ReleaseBuffer(buf);
>
>This should be the IO lock, not content lock, sorry. Copy & pasto.
>
>
>> I don't particularly like having to lock pages in this way while
>> performing this check, espectially with having to read the page into
>> shared buffers potentially.
>
>Given it's only the IO lock (see above correction), and only if we can't
>verify the checksum during the race, I fail to see how that can be a
>problem?
>
IMHO not a problem.
cheers
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services