From: | Greg Stark <stark(at)mit(dot)edu> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Recovery inconsistencies, standby much larger than primary |
Date: | 2014-02-06 21:22:01 |
Message-ID: | CAM-w4HPvJCBRVV3dXg8aj0WzkU08dHuX-XYbfDYQhNrn5bnTQg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Feb 3, 2014 at 12:02 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> What version were you running before 9.1.11 exactly? I took a look
> through all the diffs from 9.1.9 up to 9.1.11, and couldn't find any
> changes that seemed even vaguely related to this. There are some
> changes in known-transaction tracking, but it's hard to see a connection
> there. Most of the other diffs are in code that wouldn't execute during
> WAL replay at all.
Both the primary and the standby were 9.1.11 from the get-go. The
database the primary was forked off of was 9.1.10 but as far as I can
tell the primary in the current pair has no problems.
What's worse is we created a new standby from the same base backup and
replayed the same records and it didn't reproduce the problem. This
means either it's a hardware problem -- but we've seen it on multiple
standbys on this database and at least one other database which is in
a different data centre -- or it's a race condition --but that's hard
to credit in the recovery code which is basically single-threaded.
And these records are from before the standby reaches a consistency so
it's hard to see how a connection from a hot standby client could
cause any kind of race condition. The only other thread that could
conceivably cause a heisenbug is the bgwriter. It's hard to imagine
how a race condition in there could be so easy to hit that it would
happen four times on one restore but otherwise go mostly unnoticed.
--
greg
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2014-02-06 21:35:07 | Re: mvcc catalo gsnapshots and TopTransactionContext |
Previous Message | Stefan Kaltenbrunner | 2014-02-06 19:49:35 | Re: [DOCS] Re: Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.) |