Quick Links

Re: PANIC while doing failover (streaming replication)

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Mikko Partio <mpartio(at)gmail(dot)com>
Cc:	pgsql-admin <pgsql-admin(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: PANIC while doing failover (streaming replication)
Date:	2011-07-01 05:54:48
Message-ID:	BANLkTinjKdL1mDD7hLzCbqtSA_wCp6QP8A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-admin pgsql-hackers

On Fri, Jul 1, 2011 at 2:18 PM, Mikko Partio <mpartio(at)gmail(dot)com> wrote:
> Hello list
> I have two a machine cluster with PostgreSQL 9.0.4 and streaming
> replication. In a normal situation I did a failover -- touched the trigger
> file in standby to promote it to production mode. I have done this
> previously without any complications but now the to-be-production-database
> had a PANIC and shut itself down. From the logs:
> postgres[10751]: [2-1] 2011-06-30 17:25:24 EEST [10751]: [1-1] user=,db=
> LOG: streaming replication successfully connected to primary
> postgres[10736]: [10-1] 2011-07-01 07:50:29 EEST [10736]: [10-1] user=,db=
> LOG: trigger file found: /postgresql/data/finish_replication.trigger
> postgres[10751]: [3-1] 2011-07-01 07:50:29 EEST [10751]: [2-1] user=,db=
> FATAL: terminating walreceiver process due to administrator command
> postgres[10736]: [11-1] 2011-07-01 07:50:30 EEST [10736]: [11-1] user=,db=
> LOG: redo done at AE/8B1855C8
> postgres[10736]: [12-1] 2011-07-01 07:50:30 EEST [10736]: [12-1] user=,db=
> LOG: last completed transaction was at log time 2011-07-01
> 07:50:14.67424+03
> postgres[10736]: [13-1] 2011-07-01 07:50:30 EEST [10736]: [13-1] user=,db=
> LOG: restored log file "00000006000000AE0000008B" from archive
> postgres[10736]: [14-1] 2011-07-01 07:50:30 EEST [10736]: [14-1] user=,db=
> LOG: selected new timeline ID: 7
> postgres[10736]: [15-1] 2011-07-01 07:50:30 EEST [10736]: [15-1] user=,db=
> LOG: restored log file "00000006.history" from archive
> postgres[10736]: [16-1] 2011-07-01 07:50:30 EEST [10736]: [16-1] user=,db=
> LOG: archive recovery complete
> postgres[10736]: [17-1] 2011-07-01 07:50:30 EEST [10736]: [17-1] user=,db=
> WARNING: page 68975 of relation base/3711580/4398155 was uninitialized
> postgres[10736]: [18-1] 2011-07-01 07:50:30 EEST [10736]: [18-1] user=,db=
> WARNING: page 782 of relation base/3711580/4395394 was uninitialized
> postgres[10736]: [19-1] 2011-07-01 07:50:30 EEST [10736]: [19-1] user=,db=
> WARNING: page 68948 of relation base/3711580/4398155 was uninitialized
> postgres[10736]: [20-1] 2011-07-01 07:50:30 EEST [10736]: [20-1] user=,db=
> WARNING: page 68986 of relation base/3711580/4398155 was uninitialized
> [ there are maybe 50 lines of these WARNING messages for all together six
> different relations, all indexes ]
> postgres[10736]: [73-1] 2011-07-01 07:50:31 EEST [10736]: [73-1] user=,db=
> PANIC: WAL contains references to invalid pages
> postgres[10727]: [2-1] 2011-07-01 07:50:31 EEST [10727]: [2-1] user=,db=
> LOG: startup process (PID 10736) was terminated by signal 6: Aborted
> postgres[10727]: [3-1] 2011-07-01 07:50:31 EEST [10727]: [3-1] user=,db=
> LOG: terminating any other active server processes
> Then I started postgres as it was shut down:
> postgres[12191]: [1-1] 2011-07-01 07:50:59 EEST [12191]: [1-1] user=,db=
> LOG: database system was interrupted while in recovery at log time
> 2011-07-01 07:44:02 EEST
> postgres[12191]: [1-2] 2011-07-01 07:50:59 EEST [12191]: [2-1] user=,db=
> HINT: If this has occurred more than once some data might be corrupted and
> you might need to choose an earlier recovery target.
> postgres[12191]: [2-1] 2011-07-01 07:50:59 EEST [12191]: [3-1] user=,db=
> LOG: database system was not properly shut down; automatic recovery in
> progress
> postgres[12191]: [3-1] 2011-07-01 07:50:59 EEST [12191]: [4-1] user=,db=
> LOG: redo starts at AE/8B13E398
> postgres[12191]: [4-1] 2011-07-01 07:50:59 EEST [12191]: [5-1] user=,db=
> LOG: consistent recovery state reached at AE/8C000000
> postgres[12191]: [5-1] 2011-07-01 07:50:59 EEST [12191]: [6-1] user=,db=
> LOG: unexpected pageaddr AE/6E000000 in log file 174, segment 140, offset 0
> postgres[12191]: [6-1] 2011-07-01 07:50:59 EEST [12191]: [7-1] user=,db=
> LOG: redo done at AE/8B1855C8
> postgres[12191]: [7-1] 2011-07-01 07:50:59 EEST [12191]: [8-1] user=,db=
> LOG: last completed transaction was at log time 2011-07-01
> 07:50:14.67424+03
> postgres[12194]: [1-1] 2011-07-01 07:50:59 EEST [12194]: [1-1] user=,db=
> LOG: autovacuum launcher started
> postgres[12189]: [1-1] 2011-07-01 07:50:59 EEST [12189]: [1-1] user=,db=
> LOG: database system is ready to accept connections
> Are these log messages something to worry about?

http://archives.postgresql.org/pgsql-hackers/2008-12/msg01335.php
According to the above thread, this issue seems to exist for a few years,
and be unsolved yet. Could you provide a self-contained test case?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

PANIC while doing failover (streaming replication) at 2011-07-01 05:18:39 from Mikko Partio

Responses

Re: PANIC while doing failover (streaming replication) at 2011-07-01 07:16:07 from Simon Riggs
Re: PANIC while doing failover (streaming replication) at 2011-07-01 11:00:35 from Mikko Partio
Re: [HACKERS] PANIC while doing failover (streaming replication) at 2012-06-20 14:53:00 from Matheus Ricardo Espanhol

Browse pgsql-admin by date

	From	Date	Subject
Next Message	Simon Riggs	2011-07-01 07:16:07	Re: PANIC while doing failover (streaming replication)
Previous Message	Mikko Partio	2011-07-01 05:18:39	PANIC while doing failover (streaming replication)

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jeff Davis	2011-07-01 06:01:10	Re: Range Types, constructors, and the type system
Previous Message	Magnus Hagander	2011-07-01 05:53:53	Re: add support for logging current role (what to review?)