Re: Trying to recover a corrupted database

From: John Scalia <jayknowsunix(at)gmail(dot)com>
To: Scott Whitney <swhitney(at)journyx(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Trying to recover a corrupted database
Date: 2014-07-17 19:24:47
Message-ID: 53C822FF.4070301@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

We're only working with a single database on this system, and yes, I attempted a pg_dump earlier today on just one single table - it failed (I know pg_dump and pg_basebackup use
different mechanisms.) Mind you it's a large table with 154,000 rows in it. The overall database is somewhere around 43GB.

On 7/17/2014 3:16 PM, Scott Whitney wrote:
> a) How many databases do you have?
> b) Have you tried to pg_dump the database(s) to see if that succeeds?
>
> If you _can_ pg_dump (which you might or might not be able to do), you could re-init the cluster and restore.
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Hi all,
>
> You may have seen my post from yesterday about our production database getting corrupted. Well, this morning we brought the system down to single user and ran an fsck which did
> report some drive errors. We repeated until no additional errors were reported. Then, we brought the system back to multi-user status and ran a successful pg_basebackup on the
> broken database. Since then we restarted the database and a ps -ef result looks like:
>
> /usr/pgsql-9.2/bin/postmaster -D /opt/datacenter -o -c zero_damaged_pages=true -i -N 384 -p 5431
>
> After the Db started up, we ran a VACUUM FULL ANALYZE which ran for about 3 hours, but the database is still showing the same type of errors in its log: invalid page header in
> block 29718... etc. What disturbed me a little, is that I don't think the zero_damaged_pages got applied. Checking the pg_settings table, we got:
>
> select name, setting, boot_val, reset_val from pg_settings where name = 'zero_damaged_pages';
> name | setting | boot_val | reset_val
> ---------------------------------------------------------------------------------
> zero_damaged_pages | on | off | on
>
> Now, my colleague ran this after he tried running some operations again after I told him how to set zero_damaged_pages again. He swears that that it was on when the first VACUUM
> FULL ANALYZE was run, but I'm not as sure. Plus, I don't understand why the boot_val shows as off. In any event, as we're still getting log errors like before, I don't really
> know
> what to try next other than rerunning the VACUUM FULL again. Help?
> --
> Jay
>
>
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message John Scalia 2014-07-17 19:40:31 Re: Trying to recover a corrupted database
Previous Message Scott Whitney 2014-07-17 19:16:35 Re: Trying to recover a corrupted database