Re: BUG #13822: Slave terminated - WAL contains references to invalid page

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Marek(dot)Petr(at)tieto(dot)com
Cc: PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13822: Slave terminated - WAL contains references to invalid page
Date: 2015-12-26 13:15:17
Message-ID: CAB7nPqQyhuJjQerCBxiS1bOg46OvE-EV9Om2bTyKrfaUhFHHVg@mail.gmail.com
Lists: pgsql-bugs

On Tue, Dec 22, 2015 at 9:05 PM, <Marek(dot)Petr(at)tieto(dot)com> wrote:
> 2015-12-22 00:25:11 CET @ WARNING: page 71566 of relation base/16422/23253 is uninitialized
> 2015-12-22 00:25:11 CET @ CONTEXT: xlog redo visible: rel 1663/16422/23253; blk 71566
> 2015-12-22 00:25:11 CET @ PANIC: WAL contains references to invalid pages
> 2015-12-22 00:25:11 CET @ CONTEXT: xlog redo visible: rel 1663/16422/23253; blk 71566
> 2015-12-22 00:25:12 CET @ LOG: startup process (PID 24434) was terminated by signal 6: Aborted
> 2015-12-22 00:25:12 CET @ LOG: terminating any other active server processes

Looking more closely at this, the failure is in the code path of the
redo routine for XLOG_HEAP2_VISIBLE. I have been looking at the code
around visibilitymap_set to see whether there could be a race
condition with another backend extending the relation and leaving the
page uninitialized, but I have not found anything yet. 9.4 has been
out for some time, and this is the first report of this kind for this
redo routine. Still, you have been able to reproduce the problem
twice, so this has the smell of a bug... Others, opinions?

Did you rebuild a new slave and leave the master running, so that
perhaps some data corruption is coming from it? What is the state of
the same pages on the master? Are they zeroed?
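
One way to check the state of that page on the master is to read it
straight from the relation file. The following is only a rough sketch:
it assumes the default 8 kB block size and 1 GB segment size, that it
is run from the data directory, and that the page has been flushed to
disk (a copy in shared buffers is not visible this way). The function
names are made up for illustration.

```python
import os

BLCKSZ = 8192              # assumption: default page size
RELSEG_SIZE = 1024 ** 3    # assumption: default 1 GB segment files
BLOCKS_PER_SEGMENT = RELSEG_SIZE // BLCKSZ


def locate_block(blkno):
    """Return (segment number, byte offset inside that segment)."""
    segment, block_in_segment = divmod(blkno, BLOCKS_PER_SEGMENT)
    return segment, block_in_segment * BLCKSZ


def segment_path(relpath, segment):
    """Segment 0 is the bare file; later segments get a .N suffix."""
    return relpath if segment == 0 else "%s.%d" % (relpath, segment)


def page_state(relpath, blkno):
    """Return 'missing', 'zeroed' or 'initialized' for one block."""
    segment, offset = locate_block(blkno)
    path = segment_path(relpath, segment)
    if not os.path.exists(path):
        return "missing"
    with open(path, "rb") as f:
        f.seek(offset)
        page = f.read(BLCKSZ)
    if len(page) < BLCKSZ:
        # The file ends before this block: the relation is shorter.
        return "missing"
    return "zeroed" if page == bytes(BLCKSZ) else "initialized"
```

For the block in the log this would be called as
page_state("base/16422/23253", 71566), on both the master and the
slave, to compare the results.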

Also, are you using any parameters with non-default values? I am
thinking of fsync, full_page_writes...
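
On a running server the pg_settings view (its source column) is the
authoritative place to look. If only the configuration file is at
hand, a rough sketch like the one below lists what is set explicitly
there. It is an illustration with a made-up function name, and it
ignores include directives and anything set through ALTER SYSTEM.

```python
import re

# A setting line is: name, then "=" or whitespace, then a quoted or
# bare value. Lines starting with "#" are comments and do not match.
_SETTING = re.compile(
    r"^\s*([A-Za-z_][\w.]*)\s*(?:=\s*|\s+)('(?:[^']|'')*'|[^\s#]+)"
)


def explicit_settings(conf_text):
    """Return {name: value} for uncommented lines of a postgresql.conf."""
    settings = {}
    for line in conf_text.splitlines():
        m = _SETTING.match(line)
        if m:
            # Later occurrences override earlier ones, as in the server.
            settings[m.group(1).lower()] = m.group(2).strip("'")
    return settings
```

Calling explicit_settings(open("postgresql.conf").read()) and looking
for entries such as fsync or full_page_writes would answer the
question above.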

> select relname from pg_class where relfilenode in ('17230','23253');
> relname
> ----------------
> pg_toast_17225
> pg_toast_23246
> (2 rows)
>
> First toast's relation has 34GB, second 2452 MB.
> Is it possible to get more info from some deeper logging for the case it will occur again?

I am not sure I understand what you are looking for here. You could
make the logs more verbose, but this would bloat your log partition...
--
Michael
