Re: Recovery inconsistencies, standby much larger than primary

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Recovery inconsistencies, standby much larger than primary
Date: 2014-02-06 23:42:05
Message-ID: 2959.1391730125@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> writes:
> On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> That's not necessarily true. If e.g. the buffer mapping would change
>> racily, the result write from the bgwriter could very well end up
>> increasing the file size, leaving a hole inbetween its write and the
>> original size.

> a) the segment isn't sparse and b) there were whole segments full of
> nuls between the end of the tables and the final blocks.

> So the file was definitely extended by Postgres, not the OS and the
> bgwriter passes EXTENSION_FAIL which means it wouldn't create those
> intervening segments.

But ... when InRecovery, md.c will create such segments too. We had
dismissed that on the grounds that the files would be sparse because
of the way md.c creates them. However, it is real damn hard to see
how the loop in XLogReadBufferExtended could've accessed a bogus block,
other than hardware misfeasance which I don't believe any more than
you do. The blkno that's passed to that function came directly out
of a WAL record that's in the private memory of the startup process
and recently passed a CRC check. You'd have to believe some sort
of asynchronous memory clobber inside the startup process.

On the other hand, if _mdfd_getseg did the deed, there's a whole lot
more space for something funny to have happened, because now we're
talking about a buffer being written in preparation for eviction
from shared buffers, long after WAL replay filled it.

So I'm wondering if there's something wrong with our deduction from
file non-sparseness. In this connection, google quickly found me
a report of XFS "losing" the sparse state of a file across multiple
writes:
http://oss.sgi.com/archives/xfs/2011-06/msg00225.html
I wonder whether that bug or a similar one exists in your production
kernel.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2014-02-06 23:47:31 Re: jsonb and nested hstore
Previous Message Greg Stark 2014-02-06 23:06:36 Re: Recovery inconsistencies, standby much larger than primary