From: | Greg Stark <stark(at)mit(dot)edu> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com> |
Subject: | Re: Recovery inconsistencies, standby much larger than primary |
Date: | 2014-01-31 20:28:31 |
Message-ID: | CAM-w4HObtoH7vekEP6W5C-CCie26CDNyAXK8G3vPcVTWxZdGtw@mail.gmail.com |
Lists: | pgsql-hackers |
One thing I keep coming back to is a bad RAM chip setting a bit in the
block number. But I just can't seem to get it to add up. The difference is
not a power of two, it has happened on two different machines, and we don't
see other weirdness on either machine. It seems like a strange coincidence that
it would happen to the same variable twice and not to other variables.
Unless there's some unrelated code writing through a wild pointer, possibly
to a stack allocated object that just happens to often be that variable?
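
For anyone who wants to repeat the bit-flip arithmetic, here is a minimal sketch. It assumes the two block numbers quoted below (3634978 from the WAL record, 7141472 as the block actually written); the function name describe_corruption is made up for illustration and is not anything in PostgreSQL.

```python
def describe_corruption(expected: int, observed: int) -> dict:
    """Compare two block numbers to see whether a single flipped bit could explain them."""
    flipped = expected ^ observed          # bits that differ between the two values
    delta = abs(observed - expected)       # plain arithmetic difference
    return {
        "xor": flipped,
        "bits_flipped": bin(flipped).count("1"),
        # A value is a power of two iff it is non-zero and has exactly one bit set.
        "single_bit_flip": flipped != 0 and flipped & (flipped - 1) == 0,
        "delta": delta,
        "delta_is_power_of_two": delta != 0 and delta & (delta - 1) == 0,
    }


# Block number in the WAL record vs. the block that was apparently written.
result = describe_corruption(3634978, 7141472)
for key, value in result.items():
    print(f"{key}: {value}")
```

A single stuck or flipped bit would show up as exactly one differing bit in the XOR; several differing bits make that explanation much harder to sustain.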
--
greg
On 31 Jan 2014 20:21, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Greg Stark <stark(at)mit(dot)edu> writes:
> > So just to summarize, this xlog record:
> > [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194,
> > info:8, prev:EA1/635290] insert_leaf: s/d/r:1663/16385/1261982 tid
> > 3634978/282
> > [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194,
> > info:8, prev:EA1/635290] bkpblock[1]: s/d/r:1663/16385/1261982
> > blk:3634978 hole_off/len:1240/2072
>
> > Appears to have been written to [ block 7141472 ]
>
> I've been staring at the code for a bit trying to guess how that could
> have happened. Since the WAL record has a backup block, btree_xlog_insert
> would have passed control to RestoreBackupBlock, which would call
> XLogReadBufferExtended with mode RBM_ZERO, so there would be no complaint
> about writing past the end of the relation. Now, you can imagine some
> very low-level error causing a write to go to the wrong page due to a seek
> problem or some such, but it's hard to credit that that would've resulted
> in creation of all the intervening segment files. Some level of our code
> had to have thought it was being told to extend the relation.
>
> However, on closer inspection I was a bit surprised to realize that there
> are two possible candidates for doing that! XLogReadBufferExtended will
> extend the relation, a block at a time, if told to write a page past
> the current nominal EOF. And in md.c, _mdfd_getseg will *also* extend
> the relation if we're InRecovery, even though it normally would not do
> so when called from mdwrite().
>
> Given the behavior in XLogReadBufferExtended, I rather think that the
> InRecovery special case in _mdfd_getseg is dead code and should be
> removed. But for the purpose at hand, it's more interesting to try to
> confirm which of these code levels did the extension. I notice that
> _mdfd_getseg only bothers to write the last physical page of each segment,
> whereas XLogReadBufferExtended knows nothing of segments and will
> ploddingly write every page. So on a filesystem that supports "holes"
> in files, I'd expect that the added segments would be fully allocated
> if XLogReadBufferExtended did the deed, but they'd be quite small if
> _mdfd_getseg did so. The du results you started with suggest that the
> former is the case, but could you verify that the filesystem this is
> on supports holes and that du will report only the actually allocated
> space when there's a hole?
>
> Assuming that the extension was done in XLogReadBufferExtended, we are
> forced to the conclusion that XLogReadBufferExtended was passed a bad
> block number (viz 7141472); and it's pretty hard to see how that could
> happen. RestoreBackupBlock is just passing the value it got out of the
> WAL record. I thought about the idea that it was wrong about exactly
> where the BkpBlock struct was in the record, but that would presumably
> lead to garbage relnode and fork numbers not just a bad block number.
>
> So I'm still baffled ...
>
> regards, tom lane
>
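
The check Tom asks for (fully allocated segments versus segments containing holes) can be sketched roughly as follows. This assumes a default build (8 kB blocks, 1 GB segments, so 131072 blocks per segment) and a POSIX system where st_blocks is reported in 512-byte units; the helper names segment_of and file_sizes are illustrative only.

```python
import os
import sys

BLCKSZ = 8192          # assumed default block size in bytes
RELSEG_SIZE = 131072   # assumed default blocks per segment file (1 GB / 8 kB)


def segment_of(block: int) -> int:
    """Return the segment file number that would hold the given block."""
    return block // RELSEG_SIZE


def file_sizes(path: str) -> tuple:
    """Return (apparent_bytes, allocated_bytes) for a file.

    On POSIX systems st_blocks counts 512-byte units actually allocated,
    so a file with holes shows allocated_bytes well below apparent_bytes.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    # Usage: pass the segment files of the relation as arguments.
    for path in sys.argv[1:]:
        apparent, allocated = file_sizes(path)
        print(f"{path}: apparent={apparent} allocated={allocated}")
```

Under those assumptions, block 3634978 falls in segment 27 and block 7141472 in segment 54, so the intervening segments are the ones to inspect. If they are close to fully allocated, that points at the page-by-page extension in XLogReadBufferExtended; if they are nearly empty apart from the last page, that points at _mdfd_getseg.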