From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru> |
Subject: | Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages |
Date: | 2014-01-13 21:02:45 |
Message-ID: | 20140113210245.GD14861@awork2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
On 2014-01-13 22:40:32 +0200, Heikki Linnakangas wrote:
> On 01/13/2014 10:26 PM, Andres Freund wrote:
> >On 2014-01-13 15:19:06 -0500, Tom Lane wrote:
> >>I concur that the right fix requires a
> >>new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK.
> >>I think the spec for this should be that if the page doesn't exist or
> >>contains zeroes, we return InvalidBuffer without logging the page number
> >>as invalid. The doesn't-exist case is justified by the expectation that
> >>there will be a later RBM_NORMAL call for a larger page number, which will
> >>result in a suitable complaint if the page range isn't there.
>
> I think it's more natural to return the empty page to the caller, rather
> than InvalidBuffer.
Hm. I don't see the advantage - and it makes it impossible to recognize
that case at the caller level.
> >That's a sensible way to go, yes. But I wonder if we wouldn't end up
> >with less complicated code if we added a variant of ReadBuffer that only
> >returns a buffer from cache if already present in s_b.
> >I started to prototype something like RBM_NORMAL_ZERO_OK shortly after
> >Heikki's message and it seems to quickly turn a bit ugly because the
> >surrounding code isn't really ready to deal with not returning a
> >buffer. And for the purpose of that redo routine, that'd actually be
> >better.
>
> If it's trivial to add such a mode to buffer cache, then sure, let's do that
> by all means. But I seriously doubt it's really simple enough to be
> back-patchable.
Isn't it (untested, written in the editor) just something like:
/* only works for pages in shared buffers */
Assert(!SmgrIsTemp(smgr));
/* Make sure we will have room to remember the buffer pin */
ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
/* create a tag so we can lookup the buffer */
INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
newHash = BufTableHashCode(&newTag);
newPartitionLock = BufMappingPartitionLock(newHash);
/* now, check if the block is in the buffer pool already */
LWLockAcquire(newPartitionLock, LW_SHARED);
buf_id = BufTableLookup(&newTag, newHash);
if (buf_id >= 0)
{
/*
* Found it. Now, pin the buffer so no one can steal it from the
* buffer pool, and check to see if the correct data has been loaded
* into the buffer.
*/
buf = &BufferDescriptors[buf_id];
valid = PinBuffer(buf, NULL);
/* Can release the mapping lock as soon as we've pinned it */
LWLockRelease(newPartitionLock);
if (!valid)
{
/*
* We can only get here if (a) someone else is still reading in
* the page, or (b) a previous read attempt failed. We have to
* wait for any active read attempt to finish, and then set up our
* own read attempt if the page is still not BM_VALID.
* StartBufferIO does it all.
*/
if (StartBufferIO(buf, true))
{
/*
* If we get here, previous attempts to read the buffer must
* have failed ...
*/
return InvalidBuffer;
}
}
return buf;
}
/*
* Didn't find it in the buffer pool, done.
*/
LWLockRelease(newPartitionLock);
return InvalidBuffer;
Now, that's not trivial, but also not too complicated.
> With RBM_NORMAL_ZERO_OK, AFAICS we're talking about a tiny patch to
> XLogReadBufferExtended. bufmgr.c doesn't need to do anything about the new
> mode, as it's XLogReadBuffer that does the the check for zero pages. Per
> attached patch (for demonstration purposes only, you also need to add the
> new mode to the header file and adjust comments).
I thought about that approach at first as well, but I am not so sure
it's sufficient. Isn't it quite possible that we'd end up reading a page
that was *partially* written during a crash and due to that has a
corrupted checksum? There won't be any protection due to WAL replay/full
page writes against that case here.
Now, you could argue that that shouldn't be the case because we're only
entering that codepath once STANDBY_SNAPSHOT_READY and you might be
right...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2014-01-13 21:25:48 | Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages |
Previous Message | Tom Lane | 2014-01-13 20:56:23 | Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2014-01-13 21:03:24 | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |
Previous Message | Andres Freund | 2014-01-13 20:59:34 | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |