| From: | Andres Freund <andres(at)2ndquadrant(dot)com> | 
|---|---|
| To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> | 
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru> | 
| Subject: | Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages | 
| Date: | 2014-01-13 21:02:45 | 
| Message-ID: | 20140113210245.GD14861@awork2.anarazel.de | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-bugs pgsql-hackers | 
On 2014-01-13 22:40:32 +0200, Heikki Linnakangas wrote:
> On 01/13/2014 10:26 PM, Andres Freund wrote:
> >On 2014-01-13 15:19:06 -0500, Tom Lane wrote:
> >>I concur that the right fix requires a
> >>new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK.
> >>I think the spec for this should be that if the page doesn't exist or
> >>contains zeroes, we return InvalidBuffer without logging the page number
> >>as invalid.  The doesn't-exist case is justified by the expectation that
> >>there will be a later RBM_NORMAL call for a larger page number, which will
> >>result in a suitable complaint if the page range isn't there.
> 
> I think it's more natural to return the empty page to the caller, rather
> than InvalidBuffer.
Hm. I don't see the advantage - and it makes it impossible to recognize
that case at the caller level.
> >That's a sensible way to go, yes. But I wonder if we wouldn't end up
> >with less complicated code if we added a variant of ReadBuffer that only
> >returns a buffer from cache if already present in s_b.
> >I started to prototype something like RBM_NORMAL_ZERO_OK shortly after
> >Heikki's message and it seems to quickly turn a bit ugly because the
> >surrounding code isn't really ready to deal with not returning a
> >buffer. And for the purpose of that redo routine, that'd actually be
> >better.
> 
> If it's trivial to add such a mode to buffer cache, then sure, let's do that
> by all means. But I seriously doubt it's really simple enough to be
> back-patchable.
Isn't it (untested, written in the editor) just something like:
    /* only works for pages in shared buffers */
	Assert(!SmgrIsTemp(smgr));
    /* Make sure we will have room to remember the buffer pin */
    ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
    /* create a tag so we can lookup the buffer */
    INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
    newHash = BufTableHashCode(&newTag);
    newPartitionLock = BufMappingPartitionLock(newHash);
    /* now, check if the block is in the buffer pool already */
    LWLockAcquire(newPartitionLock, LW_SHARED);
    buf_id = BufTableLookup(&newTag, newHash);
    if (buf_id >= 0)
    {
        /*
         * Found it.  Now, pin the buffer so no one can steal it from the
         * buffer pool, and check to see if the correct data has been loaded
         * into the buffer.
         */
        buf = &BufferDescriptors[buf_id];
valid = PinBuffer(buf, NULL);
        /* Can release the mapping lock as soon as we've pinned it */
        LWLockRelease(newPartitionLock);
        if (!valid)
        {
            /*
             * We can only get here if (a) someone else is still reading in
             * the page, or (b) a previous read attempt failed.  We have to
             * wait for any active read attempt to finish, and then set up our
             * own read attempt if the page is still not BM_VALID.
             * StartBufferIO does it all.
             */
            if (StartBufferIO(buf, true))
            {
                /*
                 * If we get here, previous attempts to read the buffer must
                 * have failed ...
                 */
                return InvalidBuffer;
            }
        }
        return buf;
    }
    /*
     * Didn't find it in the buffer pool, done.
     */
    LWLockRelease(newPartitionLock);
return InvalidBuffer;
Now, that's not trivial, but also not too complicated.
> With RBM_NORMAL_ZERO_OK, AFAICS we're talking about a tiny patch to
> XLogReadBufferExtended. bufmgr.c doesn't need to do anything about the new
> mode, as it's XLogReadBuffer that does the the check for zero pages. Per
> attached patch (for demonstration purposes only, you also need to add the
> new mode to the header file and adjust comments).
I thought about that approach at first as well, but I am not so sure
it's sufficient. Isn't it quite possible that we'd end up reading a page
that was *partially* written during a crash and due to that has a
corrupted checksum? There won't be any protection due to WAL replay/full
page writes against that case here.
Now, you could argue that that shouldn't be the case because we're only
entering that codepath once STANDBY_SNAPSHOT_READY and you might be
right...
Greetings,
Andres Freund
-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2014-01-13 21:25:48 | Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages | 
| Previous Message | Tom Lane | 2014-01-13 20:56:23 | Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Robert Haas | 2014-01-13 21:03:24 | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance | 
| Previous Message | Andres Freund | 2014-01-13 20:59:34 | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |