Re: page 1 of relation global/11787 was uninitialized

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: page 1 of relation global/11787 was uninitialized
Date: 2013-04-09 16:42:40
Message-ID: 20130409164240.GA11081@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-04-09 18:21:20 +0200, Stephen R. van den Berg wrote:
> Just today one of my systems experienced a kernel panic, and halted abruptly.
> Running Linux 3.1.9, PostgreSQL 9.0.4 (Debian 9.0.4-1+b1, to be precise).

That's an absolutely outdated version of 9.0. You shouldn't be running
this in production.

On 2013-04-09 09:27:52 -0700, Joshua D. Drake wrote:
>
> On 04/09/2013 09:21 AM, Stephen R. van den Berg wrote:
>
> >-------------------------
> >
> >Looking at global/11787, doesn't reveal any obvious corruption.

> >The server was running with:
> > synchronous_commit = off
> > full_page_writes = off
>
> full_page_writes = off is the problem.

Yes, and it can cause corruption that is very hard to recover from. It's
not just that you may lose some of the last transactions. Contrast that
with synchronous_commit=off, where you can lose the last transactions
but which should never cause corruption.

> From the docs:
>
> Turning this parameter off speeds normal operation, but might lead to either
> unrecoverable data corruption, or silent data corruption, after a system
> failure. The risks are similar to turning off fsync, though smaller, and it
> should be turned off only based on the same circumstances recommended for
> that parameter.
>
> http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#GUC-FULL-PAGE-WRITES

That was my first thought as well, but whilst it sure can cause
corruption, I can't immediately see how it should be responsible for
this error. That seems to indicate another problem.

Stephen, could you check exactly how big global/11787 is? Too bad we
don't know which relation that relfilenode corresponds to, and we can't
easily find out.
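
For the size check, here is a minimal sketch (not an existing PostgreSQL
tool). It assumes the default 8192-byte block size and that the file is
read while the server is stopped, or from a copy. The function names are
made up for illustration. A file shorter than two pages cannot contain
page 1 at all, and a page 1 consisting only of zero bytes is one way a
page can look uninitialized.

```python
import os

BLCKSZ = 8192  # assumed default block size; a custom build may differ


def page_count(size_bytes, block_size=BLCKSZ):
    """Split a file size into (whole pages, leftover bytes)."""
    return divmod(size_bytes, block_size)


def page_is_all_zeros(path, page_no, block_size=BLCKSZ):
    """Return True if the page exists in full and contains only zero bytes."""
    with open(path, "rb") as f:
        f.seek(page_no * block_size)
        page = f.read(block_size)
    return len(page) == block_size and not any(page)


def describe(path, block_size=BLCKSZ):
    """One-line summary of a relation segment file's size in pages."""
    pages, leftover = page_count(os.path.getsize(path), block_size)
    return f"{path}: {pages} whole pages, {leftover} leftover bytes"
```

Run from inside the data directory, describe("global/11787") would
report the page count, and page_is_all_zeros("global/11787", 1) would
say whether page 1 is present and entirely zeroed.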

AFAIK we don't have any debugging utility to dump the contents of
pg_filenode.map, do we?
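
Absent such a utility, the file could be decoded by hand. The sketch
below is an assumption-laden illustration, not a supported tool: it
assumes the layout I believe relmapper.c uses in 9.0 (an int32 magic of
0x592717, an int32 mapping count, up to 62 pairs of catalog OID and
filenode, then a CRC and padding), stored in the server's native byte
order. Those constants should be verified against the source for the
version in use. The CRC is not checked, and parse_relmap is a
hypothetical name.

```python
import struct

RELMAPPER_FILEMAGIC = 0x592717  # assumed magic value; verify in relmapper.c
MAX_MAPPINGS = 62               # assumed mapping limit; verify in relmapper.c
HEADER_SIZE = 8                 # int32 magic + int32 num_mappings
ENTRY_SIZE = 8                  # Oid mapoid + Oid mapfilenode


def parse_relmap(data):
    """Return a list of (catalog_oid, filenode) pairs from raw map bytes.

    Uses native byte order, so it must run on the same architecture as
    the server that wrote the file. The trailing CRC is NOT verified.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("file too short for a relation map header")
    magic, num_mappings = struct.unpack_from("=ii", data, 0)
    if magic != RELMAPPER_FILEMAGIC:
        raise ValueError(f"unexpected magic 0x{magic:x}")
    if not 0 <= num_mappings <= MAX_MAPPINGS:
        raise ValueError(f"implausible mapping count {num_mappings}")
    if len(data) < HEADER_SIZE + num_mappings * ENTRY_SIZE:
        raise ValueError("file truncated before the last mapping")
    return [
        struct.unpack_from("=II", data, HEADER_SIZE + i * ENTRY_SIZE)
        for i in range(num_mappings)
    ]


def dump_relmap(path):
    """Read a pg_filenode.map file and return its mappings."""
    with open(path, "rb") as f:
        return parse_relmap(f.read())
```

Calling dump_relmap("global/pg_filenode.map") from the data directory
would list the shared-catalog mappings. Looking for the pair whose
filenode is 11787 would then give the catalog OID to look up.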

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
