Re: pgsql: Fast promote mode skips checkpoint at end of recovery.

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	pgsql-committers(at)postgresql(dot)org
Subject:	Re: pgsql: Fast promote mode skips checkpoint at end of recovery.
Date:	2013-01-29 16:49:51
Message-ID:	CA+U5nML25TB8-kH6kAPbHjNRap-c702zNPz8Nycdvvv3pHuESw@mail.gmail.com
Lists:	pgsql-committers pgsql-hackers

On 29 January 2013 16:27, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Jan 29, 2013 at 9:07 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> Fast promote mode skips checkpoint at end of recovery.
>> pg_ctl promote -m fast will skip the checkpoint at end of recovery so that we
>> can achieve very fast failover when the apply delay is low. Write new WAL record
>> XLOG_END_OF_RECOVERY to allow us to switch timeline correctly for downstream log
>> readers. If we skip synchronous end of recovery checkpoint we request a normal
>> spread checkpoint so that the window of re-recovery is low.
>
> When I tested this feature, I encountered the following FATAL message.
>
> FATAL: highest timeline 1 of the primary is behind recovery timeline 2
>
> Is this intentional behavior or a bug?

Tough one that.

> What I did in my test is:
>
> 1. Set up one master (A), one standby (B), one cascade standby (C)
> 2. After running pgbench -i -s 10, I promoted the standby (B) with fast mode
> 3. Then, I shut down the server (B) with immediate mode after it had been
> brought up as the master but before the end-of-recovery checkpoint had
> completed.
> 4. Restart the server (B).
> 5. After the standby (C) established the replication connection with (B),
> I got the above FATAL messages repeatedly.

Where do you get the errors? On which server? The above doesn't contain a
promote command, so how does this make it fail?

Please show me the test case in more detail.

> Promoting (B) increments the timeline ID to 2 and generates the timeline
> history file. But after restarting (B), its timeline ID is reset to 1
> unexpectedly.
> This seems to be the cause of the problem.
>
> To address this problem, should we switch to the new timeline ID whenever
> we read an XLOG_END_OF_RECOVERY record, even if it's a crash recovery?

We do. Do you see a problem with that code? There is no conditional recovery.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
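
For readers following the thread, the behavior under discussion can be illustrated with a toy model. This is only a sketch under stated assumptions, not PostgreSQL source code: `WalRecord`, `replay`, `check_standby_can_stream` and the `honor_end_of_recovery` switch are hypothetical names invented here. It assumes that replaying an XLOG_END_OF_RECOVERY record moves the server to the timeline carried in that record, and that a standby refuses to stream when its upstream's highest timeline is lower than its own recovery timeline, as the wording of the FATAL message suggests.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class WalRecord:
    """Hypothetical, heavily simplified WAL record."""
    kind: str                      # e.g. "HEAP_INSERT" or "XLOG_END_OF_RECOVERY"
    new_tli: Optional[int] = None  # only meaningful for XLOG_END_OF_RECOVERY


def replay(start_tli: int, records: Iterable[WalRecord],
           honor_end_of_recovery: bool = True) -> int:
    """Replay records and return the timeline ID the server ends up on.

    honor_end_of_recovery=False models the behavior reported in the thread,
    where (B) comes back on timeline 1 after the restart.
    """
    tli = start_tli
    for rec in records:
        if (rec.kind == "XLOG_END_OF_RECOVERY"
                and honor_end_of_recovery
                and rec.new_tli is not None):
            tli = rec.new_tli  # switch timeline when the record is replayed
    return tli


def check_standby_can_stream(primary_tli: int, recovery_tli: int) -> None:
    """Assumed check on the standby side, modeled on the FATAL message."""
    if primary_tli < recovery_tli:
        raise RuntimeError(
            f"highest timeline {primary_tli} of the primary is behind "
            f"recovery timeline {recovery_tli}"
        )
```

In this model, (C) has followed (B) onto timeline 2. If crash recovery on (B) replays `WalRecord("XLOG_END_OF_RECOVERY", new_tli=2)` and switches timelines, `replay(1, ...)` yields 2 and the check passes. If the record is not honored, (B) stays on timeline 1 and `check_standby_can_stream(1, 2)` raises with the same text as the reported message.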
