From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Greg Stark <stark(at)mit(dot)edu> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <jim(at)nasby(dot)net>, Jeff Davis <pgsql(at)j-davis(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: corrupt pages detected by enabling checksums |
Date: | 2013-05-10 06:44:19 |
Message-ID: | CA+U5nMKOw0WB7r9XQecFToRmnERFQ+FbnaXYRPOo=gfPeyX31Q@mail.gmail.com |
Lists: | pgsql-hackers |
On 9 May 2013 23:13, Greg Stark <stark(at)mit(dot)edu> wrote:
> On Thu, May 9, 2013 at 10:45 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On 9 May 2013 22:39, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>>>> If the current WAL record is corrupt and the next WAL record is in
>>>> every way valid, we can potentially continue.
>>>
>>> That seems like a seriously bad idea.
>>
>> I agree. But if you knew that were true, is stopping a better idea?
>
> Having one corrupt record followed by a valid record is not an
> abnormal situation. It could easily be the correct end of WAL.
I disagree: that *is* an abnormal situation and would not be the
"correct end-of-WAL".
Each WAL record contains a "prev" pointer to the previous WAL record. So
for the next record to be valid, its prev pointer would need to be
exactly correct, i.e. point at the start of the corrupt record.
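A minimal sketch of the prev-pointer argument. It assumes a hypothetical, simplified record (`WalRecord` with `lsn`, `prev` and `crc_ok` fields, not PostgreSQL's actual record format) and assumes the start of the following record is already known, which a real reader cannot take for granted once a record's own length field may be damaged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WalRecord:
    """Hypothetical simplified WAL record: its start position (LSN),
    a back-link to the previous record's start, and whether its CRC passed."""
    lsn: int
    prev: int
    crc_ok: bool


def classify_corruption(corrupt_start: int, following: Optional[WalRecord]) -> str:
    """Classify a record that failed its CRC check, given the record after it.

    If nothing valid follows, this looks like an ordinary end of WAL
    (e.g. a torn write at the tail). If a CRC-valid record follows whose
    prev pointer names exactly the corrupt record's start, the corrupt
    record must have been written before it: mid-stream corruption.
    """
    if following is None or not following.crc_ok:
        return "end-of-wal"
    if following.prev == corrupt_start:
        return "mid-stream-corruption"
    # A valid-looking record that does not link back is treated as stale data.
    return "end-of-wal"
```

Requiring an exact back-link is one way to keep a stale but internally valid record (for example, one left over in a recycled WAL segment) from being mistaken for a continuation of the chain.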
> However it is possible to reduce the window. Every time the
> transaction log is synced, a different file can be updated with a
> known minimum transaction log recovery point. Even if it's not synced
> consistently on every transaction commit or WAL sync, it would serve as
> a low water mark. Recovering to that point is not sufficient but is
> necessary for a consistent recovery. That file could be synced lazily,
> say, every 10s or something like that, and would guarantee that any WAL
> corruption would be caught except for the last 10s of WAL traffic, for
> example.
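The low-water-mark rule can be sketched as a check at the end of replay. The function and exception names are hypothetical, and LSNs are modeled as plain integers.

```python
class RecoveryIncomplete(Exception):
    """Raised when replay stops short of the known low-water mark."""


def finish_recovery(replay_end_lsn: int, low_water_lsn: int) -> str:
    """Decide whether replay may end at replay_end_lsn.

    Reaching the low-water mark is necessary but not sufficient: WAL
    written after the last lazy update of the mark can still be lost or
    corrupt without this check noticing.
    """
    if replay_end_lsn < low_water_lsn:
        raise RecoveryIncomplete(
            f"WAL ends at {replay_end_lsn:#x} but {low_water_lsn:#x} was known flushed"
        )
    return "consistent"
```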
I think it would be easy enough to have the WAL writer update the
minRecoveryPoint once per cycle, after it has flushed WAL.
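A toy model of that WAL writer cycle, using hypothetical names throughout. It only illustrates the ordering (flush WAL first, then advance the persisted mark) and does not reproduce the actual semantics of the minRecoveryPoint field in pg_control.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WalWriterModel:
    """Toy WAL writer: once per cycle, flush WAL, then persist the flushed
    position as a low-water mark for recovery."""
    insert_lsn: int = 0     # end of WAL generated so far (in memory)
    flushed_lsn: int = 0    # end of WAL known durable on disk
    low_water_lsn: int = 0  # persisted minimum recovery point
    fsync_log: List[str] = field(default_factory=list)

    def insert(self, nbytes: int) -> None:
        self.insert_lsn += nbytes

    def cycle(self) -> None:
        # 1. Flush WAL first, so the mark never points past durable WAL.
        if self.flushed_lsn < self.insert_lsn:
            self.flushed_lsn = self.insert_lsn
            self.fsync_log.append("wal")
        # 2. Then advance the mark, writing it only when it actually moves.
        if self.low_water_lsn < self.flushed_lsn:
            self.low_water_lsn = self.flushed_lsn
            self.fsync_log.append("mark")
```

Skipping the mark update on idle cycles keeps the extra fsync cost proportional to WAL activity rather than to the writer's wakeup rate.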
Given the importance of pg_control and its small size, it seems like
it would be a good idea to take a backup copy of it at every checkpoint
to make sure that data is safe, and to have pg_resetxlog also keep a
copy every time it is run.
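One crash-safe way to keep such a backup copy is write to a temporary file, fsync it, then atomically rename it into place. This sketch assumes POSIX filesystem semantics (including fsync on a directory opened read-only), and the backup file name used below is a hypothetical choice, not something PostgreSQL creates.

```python
import os
import tempfile


def backup_control_file(control_path: str, backup_path: str) -> None:
    """Copy control_path to backup_path so a crash leaves either the old
    backup or the complete new one, never a partial file."""
    with open(control_path, "rb") as src:
        data = src.read()
    dest_dir = os.path.dirname(os.path.abspath(backup_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".pg_control.")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # contents durable before the rename
        os.replace(tmp_path, backup_path)  # atomic on POSIX filesystems
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Fsync the directory so the rename itself survives a crash.
    dir_fd = os.open(dest_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Under this scheme a checkpoint would call `backup_control_file` after pg_control itself has been durably written, and pg_resetxlog could do the same before it overwrites the file.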
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)