Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL
Date:	2010-03-19 11:37:08
Message-ID:	4BA361E4.7020309@enterprisedb.com
Lists:	pgsql-committers pgsql-docs pgsql-hackers

Simon Riggs wrote:
> On Thu, 2010-03-18 at 23:27 +0900, Fujii Masao wrote:
>
>> I agree that this is a bigger problem. Since the standby always starts
>> walreceiver before replaying any WAL files in pg_xlog, walreceiver tries
>> to receive the WAL files following the REDO starting point even if they
>> have already been in pg_xlog. IOW, the same WAL files might be shipped
>> from the primary to the standby many times. This behavior is unsmart,
>> and should be addressed.
>
> We might also have written half a file many times. The files in pg_xlog
> are suspect whereas the files in the archive are not. If we have both we
> should prefer the archive.

Yep.

Here's a patch I've been playing with. The idea is that in standby mode,
the server keeps trying to make progress in the recovery by:

a) restoring files from archive
b) replaying files from pg_xlog
c) streaming from master

When recovery reaches an invalid WAL record, typically caused by a
half-written WAL file, it closes the file and moves to the next source.
If an error is found in a file restored from archive or in a portion
just streamed from master, however, a PANIC is thrown, because it's not
expected to have errors in the archive or in the master.
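
For readers without the patch at hand, the source rotation described above can be sketched roughly as follows. This is only an illustrative Python model, not code from the attached patch: the names `Source`, `RecoveryPanic`, and `replay_once` are hypothetical, WAL records are reduced to `(lsn, valid)` pairs, and a single pass over the three sources stands in for the server's continuous retrying.

```python
from enum import Enum


class Source(Enum):
    ARCHIVE = "archive"
    PG_XLOG = "pg_xlog"
    STREAM = "stream"


# Order in which recovery tries the sources, per the a) b) c) list above.
ORDER = [Source.ARCHIVE, Source.PG_XLOG, Source.STREAM]

# Sources in which an invalid record is unexpected (assumption drawn from
# the description: the archive and the master are trusted).
TRUSTED = {Source.ARCHIVE, Source.STREAM}


class RecoveryPanic(Exception):
    """Hypothetical stand-in for a PANIC during recovery."""


def replay_once(available, replayed_upto=0):
    """Make one pass over all sources and return the last LSN replayed.

    `available` maps a Source to a list of (lsn, valid) pairs in LSN order.
    """
    for source in ORDER:
        for lsn, valid in available.get(source, []):
            if lsn <= replayed_upto:
                continue  # already replayed from an earlier source
            if not valid:
                if source in TRUSTED:
                    raise RecoveryPanic(
                        "invalid record at LSN %d from %s" % (lsn, source.value)
                    )
                break  # half-written file in pg_xlog: move to next source
            replayed_upto = lsn
    return replayed_upto
```

With records 1 and 2 in the archive, records 2 and 3 plus a half-written record 4 in pg_xlog, and records 4 and 5 available by streaming, one pass replays up to 5. An invalid record in the archive raises `RecoveryPanic` instead.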

When a file is streamed from master, it's left in pg_xlog, so it's found
there after a standby restart, and recovery can progress to the same
point as before restart. It also means that you can copy partial WAL
files to pg_xlog at any time and have them replayed in a few seconds.
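
The restart behaviour can be illustrated the same way. The sketch below is a toy model under stated assumptions, not the patch itself: a temporary directory stands in for pg_xlog, each WAL record is one text line, and the helper names `receive_streamed`, `replay_from_pg_xlog`, and `demo` are made up for the example.

```python
import tempfile
from pathlib import Path


def receive_streamed(pg_xlog, segment, records):
    """Append streamed records to a segment file and leave it in pg_xlog."""
    with open(Path(pg_xlog) / segment, "a") as f:
        for rec in records:
            f.write(rec + "\n")


def replay_from_pg_xlog(pg_xlog):
    """Replay every record found in pg_xlog, in segment-name order."""
    replayed = []
    for seg in sorted(Path(pg_xlog).iterdir()):
        replayed.extend(seg.read_text().splitlines())
    return replayed


def demo():
    """Stream two records, then 'restart' and replay from local files only."""
    with tempfile.TemporaryDirectory() as d:
        receive_streamed(d, "000000010000000000000001", ["rec1", "rec2"])
        # After a restart the master is not consulted: the streamed
        # records are still found in pg_xlog.
        return replay_from_pg_xlog(d)
```

Because streamed data stays on disk, `demo()` recovers both records without contacting the master. A partial segment copied into the directory by hand would be picked up by the same `replay_from_pg_xlog` call.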

The code structure is a bit spaghetti-like, I'm afraid. Any suggestions
on how to improve that are welcome.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
retry-wal-from-pg_xlog-1.patch	text/x-diff	19.0 KB

In response to

Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL at 2010-03-19 08:52:04 from Simon Riggs

Responses

Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL at 2010-03-19 12:43:34 from Tom Lane
Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL at 2010-03-19 13:28:48 from Alvaro Herrera
Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL at 2010-03-23 07:17:53 from Fujii Masao
