From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Bricklen Anderson <BAnderson(at)PresiNET(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Invalid headers and xlog flush failures |
Date: | 2005-02-02 18:17:33 |
Message-ID: | 620.1107368253@sss.pgh.pa.us |
Lists: | pgsql-general |
Bricklen Anderson <BAnderson(at)PresiNET(dot)com> writes:
> Tom Lane wrote:
>> I would have suggested that maybe this represented on-disk data
>> corruption, but the appearance of two different but not-too-far-apart
>> WAL offsets in two different pages suggests that indeed the end of WAL
>> was up around segment 972 or 973 at one time.
> Nope, never touched pg_resetxlog.
> My pg_xlog list ranges from 000000010000007300000041 to 0000000100000073000000FE, with no breaks.
> There are also these: 000000010000007400000000 to 00000001000000740000000B
That seems like rather a lot of files; do you have checkpoint_segments
set to a large value, like 100? The pg_controldata dump shows that the
latest checkpoint record is in the 73/41 file, so presumably the active
end of WAL isn't exceedingly far past that. You've got 200 segments
prepared for future activity, which is a bit over the top IMHO.
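The "200 segments" figure can be checked from the file names quoted above. The following is a minimal sketch, assuming the usual 24-hex-digit WAL file name layout (8 digits of timeline, 8 of log id, 8 of segment number); the helper names `segment_number` and `count_range` are made up for illustration and are not part of any PostgreSQL tool.

```python
def segment_number(wal_file_name: str) -> int:
    """Return the segment number held in the last 8 hex digits of a WAL file name."""
    assert len(wal_file_name) == 24, "expected a 24-character WAL file name"
    return int(wal_file_name[16:], 16)


def count_range(first: str, last: str) -> int:
    """Count files from first to last inclusive.

    Assumption: both names share the same timeline and log id
    (the first 16 hex digits), so only the segment number varies.
    """
    assert first[:16] == last[:16], "range must stay within one log id"
    return segment_number(last) - segment_number(first) + 1


# Ranges reported in the quoted pg_xlog listing.
in_73 = count_range("000000010000007300000041", "0000000100000073000000FE")
in_74 = count_range("000000010000007400000000", "00000001000000740000000B")
total = in_73 + in_74

print(in_73, in_74, total)  # 190 12 202
```

With the latest checkpoint sitting in the 73/41 file, nearly all of these roughly 200 files lie ahead of the active end of WAL.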
But anyway, the evidence seems pretty clear that in fact end of WAL is
in the 73 range, and so those page LSNs with 972 and 973 have to be
bogus. I'm back to thinking about dropped bits in RAM or on disk.
IIRC these numbers are all hex, so the extra "9" could come from just
two bits getting turned on that should not be. Might be time to run
memtest86 and/or badblocks.
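The "two bits" remark can be checked with a little arithmetic. This is only a sketch: the values 0x72/0x73 and 0x972/0x973 are illustrative stand-ins for the log-id part of the LSNs under discussion, not the full page LSNs, and `flipped_bits` is a hypothetical helper name.

```python
def flipped_bits(expected: int, observed: int) -> int:
    """Count the bit positions in which the two values differ."""
    return bin(expected ^ observed).count("1")


# Illustrative pairs: a plausible value in the 73 range versus the bogus one.
for expected, observed in [(0x72, 0x972), (0x73, 0x973)]:
    difference = expected ^ observed
    print(hex(expected), hex(observed), hex(difference),
          flipped_bits(expected, observed))
```

In both pairs the difference is 0x900, and hex 9 is binary 1001, so exactly two bits differ.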
regards, tom lane