Re: warning message in standby

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: warning message in standby
Date:	2010-06-11 12:32:27
Message-ID:	4C122CDB.70601@enterprisedb.com
Lists:	pgsql-hackers

On 11/06/10 07:18, Fujii Masao wrote:
> On Fri, Jun 11, 2010 at 1:01 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> We're talking about a corrupt record (incorrect CRC, incorrect backlink
>> etc.), not errors within redo functions. During crash recovery, a corrupt
>> record means you've reached end of WAL. In standby mode, when streaming WAL
>> from master, that shouldn't happen, and it's not clear what to do if it
>> does. PANIC is not a good idea, at least if the server uses hot standby,
>> because that only makes the situation worse from an availability point of view.
>> So we log the error as a WARNING, and keep retrying. It's unlikely that the
>> problem will just go away, but we keep retrying anyway in the hope that it
>> does. However, it seems that we're too aggressive with the retries.
>
> Right. The attached patch calms down the retries: if we find an invalid
> record while streaming WAL from master, we sleep for 5 seconds (should that
> be reduced?) before retrying to replay the record at the same
> location where the invalid one was found. Comments?

Hmm, right now it doesn't even reconnect when it sees a corrupt record
streamed from the master. It's really pointless to retry in that case:
reapplying the exact same piece of WAL surely won't work. I think it
should disconnect, and then retry reading from archive and pg_xlog, and
then retry streaming again. That's pretty hopeless too, but it's at
least theoretically possible that something went wrong in the
transmission and the file in the archive is fine.
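
To make the proposed sequence concrete, here is a rough sketch in Python
rather than PostgreSQL's C. It is an illustration only, not code from the
patch or from the server: every name in it (WalSource,
retry_after_corrupt_record, the reader callbacks) is hypothetical, the
5-second delay is simply the value floated in the quoted message, and the
max_rounds cap is an assumption added so the example terminates, whereas a
real standby would presumably keep retrying.

```python
import time
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class WalSource(Enum):
    # Hypothetical labels for the three places a standby can read WAL from.
    ARCHIVE = "archive"
    PG_XLOG = "pg_xlog"
    STREAM = "stream"


# Order suggested in the message: archive, then pg_xlog, then streaming again.
RETRY_ORDER = [WalSource.ARCHIVE, WalSource.PG_XLOG, WalSource.STREAM]


def retry_after_corrupt_record(
    readers: Dict[WalSource, Callable[[int], Optional[bytes]]],
    is_valid: Callable[[bytes], bool],
    location: int,
    disconnect: Callable[[], None],
    retry_delay: float = 5.0,
    max_rounds: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Tuple[WalSource, bytes]]:
    """Called after a streamed record at `location` failed validation.

    Returns (source, record) for the first valid copy found, or None if
    every source failed in every round.
    """
    for _ in range(max_rounds):
        # Drop the streaming connection instead of re-reading the same bytes.
        disconnect()
        for source in RETRY_ORDER:
            record = readers[source](location)
            if record is not None and is_valid(record):
                return source, record
        # Nothing usable anywhere: pause so the log is not flooded.
        sleep(retry_delay)
    return None
```

Each reader callback returns the record bytes at the given location, or None
if that source has nothing there; is_valid stands in for the CRC and
backlink checks. Passing sleep as a parameter only serves to make the pause
easy to stub out.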

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
calm_down_retries_v2.patch	text/x-diff	893 bytes

In response to

Re: warning message in standby at 2010-06-11 04:18:54 from Fujii Masao

Responses

Re: warning message in standby at 2010-06-11 13:34:48 from Fujii Masao
