Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Толстенко Илья <tolstenko_iv(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>, Соболев Виталий Анатольевич <sobolev_va(at)gw(dot)tander(dot)ru>
Subject: Re: Completely broken replica after PANIC: WAL contains references to invalid pages
Date: 2013-11-01 07:35:18
Message-ID: CAL_0b1tyfAYxg0u7U6vhKn6e1PbBkib3hVh_o7Wqg0Cz4xTn1Q@mail.gmail.com
Lists: pgsql-bugs

On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> The attached patch fixes this although I don't like the way it knowledge of the
> point up to which StartupSUBTRANS zeroes pages is handled.

So, after half a year the same failure has happened again on the same
replica, which is now running 9.2.4 plus Andres' patch, the one
that was supposed to fix it.

Here is the link to the full conversation.

http://www.postgresql.org/message-id/flat/CAL_0b1t=WuM6roO8dki=w8DhH8P8whhohbPjReymmQUrOcNT2A(at)mail(dot)gmail(dot)com

Here are the logs.

2013-10-31 22:51:44 MSK 30711 @ from [vxid:1/0 txid:0] [] WARNING: page 27415 of relation base/16436/3220672275 is uninitialized
2013-10-31 22:51:44 MSK 30711 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo visible: rel 1663/16436/3220672275; blk 27415
2013-10-31 22:51:44 MSK 30711 @ from [vxid:1/0 txid:0] [] PANIC: WAL contains references to invalid pages
2013-10-31 22:51:44 MSK 30711 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo visible: rel 1663/16436/3220672275; blk 27415
2013-10-31 22:51:44 MSK 30708 @ from [vxid: txid:0] [] LOG: startup process (PID 30711) was terminated by signal 6: Aborted
2013-10-31 22:51:44 MSK 30708 @ from [vxid: txid:0] [] LOG: terminating any other active server processes

I saved the base/16436/3220672275* files and pg_xlog directory, just in case.
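
For anyone who wants to look at the saved relation files, here is a minimal sketch of how the reported block could be located and checked. It is not part of the original report and makes several assumptions: the function names are made up for illustration, the file is the main fork of the relation, the block size and segment size are the ones shown by pg_controldata for this cluster, and "all bytes are zero" is used only as a rough stand-in for what the server reports as an uninitialized page.

```python
# Sketch only: locate a block inside a relation's segment files and check
# whether the page on disk consists entirely of zero bytes.
# Assumed values, taken from the pg_controldata output of this cluster:
BLOCK_SIZE = 8192            # "Database block size"
BLOCKS_PER_SEGMENT = 131072  # "Blocks per segment of large relation"


def locate_block(relpath, blkno,
                 block_size=BLOCK_SIZE,
                 blocks_per_segment=BLOCKS_PER_SEGMENT):
    """Return (segment file path, byte offset) for a block number.

    Segment 0 is the bare relation path; later segments get a
    ".1", ".2", ... suffix.
    """
    segno, blk_in_seg = divmod(blkno, blocks_per_segment)
    path = relpath if segno == 0 else f"{relpath}.{segno}"
    return path, blk_in_seg * block_size


def page_is_all_zeros(path, offset, block_size=BLOCK_SIZE):
    """Return True if the page is all zero bytes, False if not,
    and None if the file ends before a full page could be read."""
    with open(path, "rb") as f:
        f.seek(offset)
        page = f.read(block_size)
    if len(page) < block_size:
        return None
    return page == bytes(block_size)


if __name__ == "__main__":
    # Relation path and block number from the WARNING line in the logs.
    path, offset = locate_block("base/16436/3220672275", 27415)
    print(path, offset)
```

With the values from the logs, block 27415 falls in the first segment file (no numeric suffix), at byte offset 27415 * 8192 = 224583680. Running page_is_all_zeros on a copy of that file would show whether the page is fully zeroed or whether the file is shorter than expected.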

On an attempt to restart, it printed the same messages in the logs and did not start.

2013-11-01 08:15:25 MSK 767 @ from [vxid:1/0 txid:0] [] LOG: consistent recovery state reached at 2F02/2774CA28
2013-11-01 08:15:25 MSK 764 @ from [vxid: txid:0] [] LOG: database system is ready to accept read only connections
2013-11-01 08:15:25 MSK 767 @ from [vxid:1/0 txid:0] [] WARNING: page 27415 of relation base/16436/3220672275 is uninitialized
2013-11-01 08:15:25 MSK 767 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo visible: rel 1663/16436/3220672275; blk 27415
2013-11-01 08:15:25 MSK 767 @ from [vxid:1/0 txid:0] [] PANIC: WAL contains references to invalid pages
2013-11-01 08:15:25 MSK 767 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo visible: rel 1663/16436/3220672275; blk 27415
2013-11-01 08:15:25 MSK 764 @ from [vxid: txid:0] [] LOG: startup process (PID 767) was terminated by signal 6: Aborted
2013-11-01 08:15:25 MSK 764 @ from [vxid: txid:0] [] LOG: terminating any other active server processes

Here is the pg_controldata output.

pg_control version number: 922
Catalog version number: 201204301
Database system identifier: 5858109675396804534
Database cluster state: in archive recovery
pg_control last modified: Fri 01 Nov 2013 07:52:08
Latest checkpoint location: 2F00/C9BCE828
Prior checkpoint location: 2F00/C9BCE828
Latest checkpoint's REDO location: 2F00/32F59B70
Latest checkpoint's TimeLineID: 2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 3/805663702
Latest checkpoint's NextOID: 3227099776
Latest checkpoint's NextMultiXactId: 4809163
Latest checkpoint's NextMultiOffset: 21342992
Latest checkpoint's oldestXID: 605734616
Latest checkpoint's oldestXID's DB: 16436
Latest checkpoint's oldestActiveXID: 805262681
Time of latest checkpoint: Thu 31 Oct 2013 21:00:02
Minimum recovery ending location: 2F02/2774CA28
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current max_connections setting: 550
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value

Any thoughts?

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray(dot)ru(at)gmail(dot)com
