From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Sergey Konoplev <gray(dot)ru(at)gmail(dot)com> |
Cc: | pgsql-bugs(at)postgresql(dot)org, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Толстенко Илья <tolstenko_iv(at)gw(dot)tander(dot)ru> |
Subject: | Re: Completely broken replica after PANIC: WAL contains references to invalid pages |
Date: | 2013-03-30 17:21:44 |
Message-ID: | 20130330172144.GI28736@alap2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 2013-03-29 14:53:26 -0700, Sergey Konoplev wrote:
> On Fri, Mar 29, 2013 at 2:38 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I have to admit, I find it a bit confusing that so many people report a
> > bug and then immediately destroy all evidence of the bug. Just seems to
> > a happen a bit too frequently.
>
> You see, businesses usually need it up ASAP again. Sorry, I must have
> note down the output of pg_controldata straight after it got broken, I
> just have not came up to it.
But the business will also need the standby working correctly in case
of a critical incident of a primary. So it should have quite an interest
in fixing bugs in that area. Yes, I realize thats not always easy to do
:(.
> > Thats not a pg_controldata output from the broken replica though, or is
> > it? I guess its from a new standby?
>
> That was the output from the standby that was rsync-ed on top of the
> broken one. I thought you might find something useful in it.
> Can I test your guess some other way? And what was the guess?
Don't think you can easily test it. And after reading more code I am
pretty sure my original guess was bogus. As was my second. And third ;)
But I think I see what could be going on:
During HS we maintain pg_subtrans so we can deal with more than
PGPROC_MAX_CACHED_SUBXIDS in one TX. For that we need to regularly
extend subtrans so the pages are initialized when we setup the
topxid<->subxid mapping in ProcArrayApplyXidAssignment(). The call to
ExtendSUBTRANS happens in RecordKnownAssignedTransactionIds() which is
called from several places, including ProcArrayApplyXidAssignment().
The logic it uses is:
if (TransactionIdFollows(xid, latestObservedXid))
{
TransactionId next_expected_xid;
/*
* Extend clog and subtrans like we do in GetNewTransactionId() during
* normal operation using individual extend steps. Typical case
* requires almost no activity.
*/
next_expected_xid = latestObservedXid;
TransactionIdAdvance(next_expected_xid);
while (TransactionIdPrecedesOrEquals(next_expected_xid, xid))
{
ExtendCLOG(next_expected_xid);
ExtendSUBTRANS(next_expected_xid);
TransactionIdAdvance(next_expected_xid);
}
So if the xid is later than latestObservedXid we extend subtrans one by
one. So far so good. But we initialize it in
ProcArrayApplyRecoveryInfo() when consistency is initially reached:
latestObservedXid = running->nextXid;
TransactionIdRetreat(latestObservedXid);
Before that subtrans has initially been started up with:
if (wasShutdown)
oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
else
oldestActiveXID = checkPoint.oldestActiveXid;
...
StartupSUBTRANS(oldestActiveXID);
That means its only initialized up to checkPoint.oldestActiveXid. As it
can take some time till we reach consistency it seems rather plausible
that there now will be a gap in initilized pages. From
checkPoint.oldestActiveXid to running->nextXid if there are pages
inbetween.
Does that explanation sound about right to anybody else? I'll provide a
patch for the issue in a while, for now I'll try to reproduce it.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | ajmcello | 2013-03-31 03:41:14 | Re: BUG #8013: Memory leak |
Previous Message | stiening | 2013-03-30 14:01:27 | BUG #8013: Memory leak |