Re: Streaming replication bug in 9.3.2, "WAL contains references to invalid pages"

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Christophe Pettus" <xof(at)thebuild(dot)com>, "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication bug in 9.3.2, "WAL contains references to invalid pages"
Date: 2014-01-02 23:54:18
Message-ID: 45A7BFE0BCC0473393620BDB28F1DD7B@maumau
Lists: pgsql-hackers

From: "Christophe Pettus" <xof(at)thebuild(dot)com>
We've had two clients experience a crash on the secondary of a streaming
replication pair, running PostgreSQL 9.3.2. In both cases, the messages
were close to this example:

2013-12-30 18:08:00.464 PST,,,23869,,52ab4839.5d3d,16,,2013-12-13 09:47:37
PST,1/0,0,WARNING,01000,"page 45785 of relation base/236971/365951 is
uninitialized",,,,,"xlog redo vacuum: rel 1663/236971/365951; blk 45794,
lastBlockVacuumed 45784",,,,""
2013-12-30 18:08:00.465 PST,,,23869,,52ab4839.5d3d,17,,2013-12-13 09:47:37
PST,1/0,0,PANIC,XX000,"WAL contains references to invalid pages",,,,,"xlog
redo vacuum: rel 1663/236971/365951; blk 45794, lastBlockVacuumed
45784",,,,""
2013-12-30 18:08:00.950 PST,,,23866,,52ab4838.5d3a,8,,2013-12-13 09:47:36
PST,,0,LOG,00000,"startup process (PID 23869) was terminated by signal 6:
Aborted",,,,,,,,,""
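
For anyone triaging similar reports, the redo context string in the excerpt above can be pulled apart mechanically. The following is only a minimal sketch, not part of the original report: the function name `parse_redo_vacuum` and the dictionary keys are hypothetical, and it assumes the `rel` triple is tablespace/database/relfilenode, which matches the `base/236971/365951` path in the WARNING line.

```python
import re

# Pattern for the "xlog redo vacuum" context string seen in the log excerpt.
# Assumption: the "rel" triple is tablespace/database/relfilenode.
REDO_VACUUM = re.compile(
    r"xlog\s+redo\s+vacuum:\s+rel\s+(\d+)/(\d+)/(\d+);\s+"
    r"blk\s+(\d+),\s+lastBlockVacuumed\s+(\d+)"
)


def parse_redo_vacuum(context):
    """Return the numeric fields of a redo vacuum context string, or None."""
    # Log lines are wrapped in the email, so collapse whitespace runs first.
    match = REDO_VACUUM.search(" ".join(context.split()))
    if match is None:
        return None
    keys = ("tablespace", "database", "relfilenode",
            "block", "last_block_vacuumed")
    return dict(zip(keys, map(int, match.groups())))


# Context string copied from the log excerpt, including the line wrap.
sample = ("xlog redo vacuum: rel 1663/236971/365951; blk 45794,\n"
          "lastBlockVacuumed 45784")
info = parse_redo_vacuum(sample)
print(info)
```

The extracted relfilenode is what one would compare against the on-disk file name under the database directory when checking which relation a report refers to.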

In both cases, the indicated relation was a primary key index. In one case,
rebuilding the primary key index caused the problem to go away permanently
(to date). In the second case, the problem returned even after a full dump
/ restore of the master database (that is, after a dump / restore of the
master, and reimaging the secondary, the problem returned at the same
primary key index, although of course with a different OID value).

It looks like this has been experienced on 9.2.6, as well:

I also experienced this problem once with 9.2.4 at the end of last year.
The messages were the same except for the relation and page numbers. In
addition, I encountered a similar (possibly the same) problem with 9.1.6
about a year ago. At that time, I found that several people had reported
similar problems on the pgsql-* mailing lists over the past several years,
but those reports were never resolved. There seems to be a big, dangerous
bug hiding somewhere.

Regards
MauMau
