Re: corruption issue after server crash - ERROR: unexpected chunk number 0

From: Mike Broers <mbroers(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: corruption issue after server crash - ERROR: unexpected chunk number 0
Date: 2013-11-21 22:30:50
Message-ID: CAB9893iYU=yPJV-R=Mrc4mKPkfX9PvwDJRhhRNz+e5tO=o8umw@mail.gmail.com
Lists: pgsql-general

Thanks for the response. fsync and full_page_writes are both on.

Our database runs on a managed hosting provider's VM host server/SAN. I can
possibly ask them to provide some hardware test results; do you have any
specific diagnostics in mind? The crash was apparently due to our VM host
suddenly losing power. The only row it has complained about with the chunk
error also migrated into both standby servers and, as previously stated,
was fixed with a reindex of the parent table on one of the standby servers
after taking it out of recovery. The vacuumdb -avz on this test copy didn't
report any errors or warnings. I'm also going to run a pg_dumpall on this
host to see if any other rows are problematic.
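As a rough way to check whether the server log names more than the one damaged TOAST value, here is a minimal Python sketch. The helper name `damaged_toast_values` is hypothetical, and the pattern assumes the exact error wording quoted in this thread. It collects the distinct (TOAST table, TOAST value) pairs from log lines:

```python
import re
from typing import Iterable, Set, Tuple

# Assumed to match the error wording quoted in this thread, e.g.
# "ERROR: unexpected chunk number 0 (expected 1) for toast value 117927127 in pg_toast_19122"
CHUNK_ERROR = re.compile(
    r"unexpected chunk number (\d+) \(expected (\d+)\) "
    r"for toast value (\d+) in (pg_toast_\d+)"
)


def damaged_toast_values(log_lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Return the distinct (toast table, toast value) pairs named in chunk errors."""
    found: Set[Tuple[str, int]] = set()
    for line in log_lines:
        match = CHUNK_ERROR.search(line)
        if match:
            # group 4 is the TOAST table name, group 3 the TOAST value OID
            found.add((match.group(4), int(match.group(3))))
    return found


if __name__ == "__main__":
    sample = [
        "ERROR:  unexpected chunk number 0 (expected 1) for toast value 117927127 in pg_toast_19122",
        "ERROR:  unexpected chunk number 0 (expected 1) for toast value 117927127 in pg_toast_19122",
        "LOG:  checkpoint starting: time",
    ]
    print(damaged_toast_values(sample))
```

A flooded log that repeats one error collapses to a single pair. More than one pair would suggest the damage is not limited to a single row.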

Is there something else I can run to confirm we are more or less OK at the
database level after the pg_dumpall, or is there no way to be sure, and a
fresh initdb is required?

I am planning to run the reindex on the actual production server tonight
during our maintenance window, and was hoping that if it works we will be
out of the woods.
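To find the parent table whose indexes need rebuilding, the TOAST table named in the error can be looked up through `pg_class.reltoastrelid`. Below is a minimal sketch that builds that catalog query. The helper `parent_table_query` is hypothetical, and it assumes the TOAST table lives in the default `pg_toast` schema. Run the returned SQL in psql:

```python
import re


def parent_table_query(toast_table: str) -> str:
    """Build a catalog query that finds the table owning the given TOAST table."""
    # Only accept names of the form pg_toast_<digits>, so no arbitrary text
    # ends up inside the generated SQL.
    if not re.fullmatch(r"pg_toast_\d+", toast_table):
        raise ValueError(f"not a TOAST table name: {toast_table!r}")
    # Assumes the TOAST table is in the pg_toast schema.
    return (
        "SELECT c.oid::regclass AS parent_table "
        "FROM pg_class c "
        f"WHERE c.reltoastrelid = 'pg_toast.{toast_table}'::regclass;"
    )


if __name__ == "__main__":
    print(parent_table_query("pg_toast_19122"))
```

According to the PostgreSQL REINDEX documentation, `REINDEX TABLE` on the table this query returns also reindexes that table's TOAST table.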

On Thu, Nov 21, 2013 at 3:56 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:

> Mike Broers <mbroers(at)gmail(dot)com> wrote:
>
> > Hello we are running postgres 9.2.5 on RHEL6, our production
> > server crashed hard and when it came back up our logs were
> > flooded with:
>
> > ERROR: unexpected chunk number 0 (expected 1) for toast value 117927127 in pg_toast_19122
>
> Your database is corrupted. Unless you were running with fsync =
> off or full_page_writes = off, that should not happen. It is
> likely to be caused by a hardware problem (bad RAM, a bad disk
> drive, or network problems if your storage is across a network).
>
> If it were me, I would stop the database service and copy the full
> data directory tree.
>
> http://wiki.postgresql.org/wiki/Corruption
>
> If fsync or full_page_writes were off, your best bet is probably to
> go to your backup. If you don't go to a backup, you should try to
> get to a point where you can run pg_dump, and dump and load to a
> freshly initdb'd cluster.
>
> If fsync and full_page_writes were both on, you should run hardware
> diagnostics at your earliest opportunity. When hardware starts to
> fail, the first episode is rarely the last or the most severe.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
