From: | Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Vadim Mikheev <vmikheev(at)sectorbase(dot)com>, pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Theory about XLogFlush startup failures |
Date: | 2002-01-15 02:23:44 |
Message-ID: | 3C4392B0.637CF161@tpf.co.jp |
Lists: | pgsql-hackers |
Tom Lane wrote:
>
> I just spent some time trying to understand the mechanism behind the
> "XLogFlush: request is not satisfied" startup errors we've seen reported
> occasionally with 7.1. The only apparent way for this to happen is for
> XLogFlush to be given a garbage WAL record pointer (ie, one pointing
> beyond the current end of WAL), which presumably must be coming from
> a corrupted LSN field in a data page. Well, that's not too hard to
> believe during normal operation: say the disk drive drops some bits in
> the LSN field, and we read the page in, and don't have any immediate
> need to change it (which would cause the LSN to be overwritten); but we
> do find some transaction status hint bits to set, so the page gets
> marked dirty. Then when the page is written out, bufmgr will try to
> flush xlog using the corrupted LSN pointer.
I agree with you at least on the point that we had better
continue FlushBufferPool() even if a STOP error occurs.
BTW, doesn't LSN corruption imply that other parts
(e.g. pg_log) may be corrupted as well?
regards,
Hiroshi Inoue
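
The failure mode described in the quoted text, and the suggestion to keep flushing buffers after an error, can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyLsn`, `ToyPage`, `ToyWal`, and the `toy_*` functions are hypothetical names invented for illustration and are not PostgreSQL's actual types or code. It assumes a page was dirtied (for example by setting hint bits) without its LSN being rewritten, so a corrupted LSN can point past the end of WAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT PostgreSQL's real types. */
typedef uint64_t ToyLsn;          /* byte position in the toy WAL */

typedef struct {
    ToyLsn lsn;                   /* WAL position of the last logged change */
    bool   dirty;                 /* page still has to be written out */
} ToyPage;

typedef struct {
    ToyLsn end;                   /* current end of WAL */
    ToyLsn flushed;               /* WAL is durable up to this position */
} ToyWal;

/* Flush WAL up to `upto`. A request beyond the end of WAL cannot be met. */
bool toy_wal_flush(ToyWal *wal, ToyLsn upto)
{
    if (upto > wal->end)
        return false;             /* the "request is not satisfied" case */
    if (upto > wal->flushed)
        wal->flushed = upto;
    return true;
}

/* Write one dirty page: WAL must first be flushed up to the page's LSN. */
bool toy_write_page(ToyWal *wal, ToyPage *page)
{
    if (!page->dirty)
        return true;
    if (!toy_wal_flush(wal, page->lsn))
        return false;             /* garbage LSN: page stays dirty */
    page->dirty = false;
    return true;
}

/* Flush all pages, continuing past failures; returns the failure count. */
size_t toy_flush_all(ToyWal *wal, ToyPage *pages, size_t n)
{
    assert(wal != NULL && pages != NULL);
    size_t failed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!toy_write_page(wal, &pages[i]))
            failed++;             /* record the error, keep going */
    }
    return failed;
}
```

In this model, a page whose LSN lies beyond `wal->end` fails to flush and stays dirty, while `toy_flush_all` still writes out the remaining pages and reports how many failed. Whether the real buffer manager should behave this way is exactly the question under discussion in this thread.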