Re: Theory about XLogFlush startup failures

From: Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Vadim Mikheev <vmikheev(at)sectorbase(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Theory about XLogFlush startup failures
Date: 2002-01-15 02:23:44
Message-ID: 3C4392B0.637CF161@tpf.co.jp
Lists: pgsql-hackers

Tom Lane wrote:
>
> I just spent some time trying to understand the mechanism behind the
> "XLogFlush: request is not satisfied" startup errors we've seen reported
> occasionally with 7.1. The only apparent way for this to happen is for
> XLogFlush to be given a garbage WAL record pointer (ie, one pointing
> beyond the current end of WAL), which presumably must be coming from
> a corrupted LSN field in a data page. Well, that's not too hard to
> believe during normal operation: say the disk drive drops some bits in
> the LSN field, and we read the page in, and don't have any immediate
> need to change it (which would cause the LSN to be overwritten); but we
> do find some transaction status hint bits to set, so the page gets
> marked dirty. Then when the page is written out, bufmgr will try to
> flush xlog using the corrupted LSN pointer.

I agree with you at least on the point that we had better
continue FlushBufferPool() even if a STOP error occurs.
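
To make that concrete, here is a minimal sketch in C. It is not the
actual bufmgr or xlog code: WalPtr, Page, wal_flush_upto() and
flush_all_pages() are hypothetical names invented for illustration, and
the two-field WAL position is only an assumed model. It shows a flush
request failing when a page's LSN points beyond the end of WAL, and the
flush loop counting the failure and moving on to the remaining pages
instead of stopping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position: log file id plus byte offset. */
typedef struct {
    uint32_t xlogid;
    uint32_t xrecoff;
} WalPtr;

/* Hypothetical stand-in for a buffered data page. */
typedef struct {
    WalPtr lsn;    /* LSN stored in the page header */
    bool   dirty;  /* set even when only hint bits were changed */
} Page;

/* True if position a lies strictly beyond position b. */
static bool wal_ptr_gt(WalPtr a, WalPtr b)
{
    return a.xlogid > b.xlogid ||
           (a.xlogid == b.xlogid && a.xrecoff > b.xrecoff);
}

/* Models the flush request: it cannot be satisfied past the end of WAL. */
static bool wal_flush_upto(WalPtr request, WalPtr end_of_wal)
{
    return !wal_ptr_gt(request, end_of_wal);
}

/*
 * Process every dirty page, continuing past pages whose LSN is bogus.
 * Returns the number of pages whose flush request could not be satisfied.
 */
static size_t flush_all_pages(Page *pages, size_t n, WalPtr end_of_wal)
{
    size_t failures = 0;

    for (size_t i = 0; i < n; i++) {
        if (!pages[i].dirty)
            continue;
        if (!wal_flush_upto(pages[i].lsn, end_of_wal)) {
            failures++;          /* report the bad page, but keep going */
            continue;
        }
        pages[i].dirty = false;  /* the page itself would be written here */
    }
    return failures;
}
```

With the end of WAL at {0, 1000}, a dirty page carrying LSN {7, 12345}
is counted as a failure and left dirty, while the other dirty pages are
still processed.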

BTW, doesn't the LSN corruption imply that other parts
(e.g. pg_log) may be corrupted as well?

regards,
Hiroshi Inoue
