Re: Trying to recover a corrupted database

From: John Scalia <jayknowsunix(at)gmail(dot)com>
To: Scott Whitney <swhitney(at)journyx(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Trying to recover a corrupted database
Date: 2014-07-17 19:40:31
Message-ID: 53C826AF.9050001@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Well, we already have the schema in a canned file. So, we wouldn't need to do a pg_dump for the schema, and I can get into the database using psql obviously. That's how I produced
the setting from pg_settings. But, I really don't want to have to go through 150K+ records to find the multiple broken ones, if I don't have to do so. I'm just hoping for a faster way.

On 7/17/2014 3:30 PM, Scott Whitney wrote:
> Well, at this point, let me let the big brains answer your main question on the thread and tell you what _I_ did when I got into a similar situation.
>
> a) pg_dump --schema-only <database name> > db.structure
> b) From here it was a lot of select * from table until I found the broken one(s) and I worked around the actual bad data by offset/limit on the selects on the busted tables.
>
> It took me several hours, but it did work (for me).
>
> That's assuming you can even psql to the db in the first place when the postmaster is running.
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> We're only working with a single database on this system, and yes, I attempted a pg_dump earlier today on just one single table - it failed (I know pg_dump and pg_basebackup
> use different mechanisms.) Mind you it's a large table with 154,000 rows in it. The overall database is somewhere around 43GB.
>
> On 7/17/2014 3:16 PM, Scott Whitney wrote:
>
> a) How many databases do you have?
> b) Have you tried to pg_dump the database(s) to see if that succeeds?
>
> If you _can_ pg_dump (which you might or might not be able to do), you could re-init the cluster and restore.
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Hi all,
>
> You may have seen my post from yesterday about our production database getting corrupted. Well, this morning we brought the system down to single user and ran an fsck
> which did
> report some drive errors. We repeated until no additional errors were reported. Then, we brought the system back to multi-user status and ran a successful
> pg_basebackup on the
> broken database. Since then we restarted the database and a ps -ef result looks like:
>
> /usr/pgsql-9.2/bin/postmaster -D /opt/datacenter -o -c zero_damaged_pages=true -i -N 384 -p 5431
>
> After the Db started up, we ran a VACUUM FULL ANALYZE which ran for about 3 hours, but the database is still showing the same type of errors in its log: invalid page
> header in
> block 29718... etc. What disturbed me a little, is that I don't think the zero_damaged_pages got applied. Checking the pg_settings table, we got:
>
> select name, setting, boot_val, reset_val from pg_settings where name = 'zero_damaged_pages';
> name | setting | boot_val | reset_val
> ---------------------------------------------------------------------------------
> zero_damaged_pages | on | off | on
>
> Now, my colleague ran this after he tried running some operations again after I told him how to set zero_damaged_pages again. He swears that that it was on when the
> first VACUUM
> FULL ANALYZE was run, but I'm not as sure. Plus, I don't understand why the boot_val shows as off. In any event, as we're still getting log errors like before, I
> don't really know
> what to try next other than rerunning the VACUUM FULL again. Help?
> --
> Jay
>
>
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>
>
>
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message John Scalia 2014-07-17 20:05:52 Re: Trying to recover a corrupted database
Previous Message John Scalia 2014-07-17 19:24:47 Re: Trying to recover a corrupted database