Re: hardware failure - data recovery

From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: hardware failure - data recovery
Date: 2006-10-19 05:46:36
Message-ID: 4537113C.3020808@alpinenetworking.com
Lists: pgsql-general

Ron Johnson wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 10/18/06 23:52, Rick Gigger wrote:
>> Rick Gigger wrote:
>>> Ron Johnson wrote:
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>>
>>>> On 10/18/06 19:57, Rick Gigger wrote:
>>>>> To make a long story short lets just say that I had a bit of a hardware
>>>>> failure recently.
>>>>>
>>>>> If I got an error like this when trying to dump a db from the mangled
>>>>> data directory is it safe to say it's totally hosed or is there some
>>>>> chance of recovery?
>>>>>
>>>>> pg_dump: ERROR: could not open relation 1663/18392/18400: No such file
>>>>> or directory
>>>>> pg_dump: SQL command to dump the contents of table "file" failed:
>>>>> PQendcopy() failed.
>>>>> pg_dump: Error message from server: ERROR: could not open relation
>>>>> 1663/18392/18400: No such file or directory
>>>>> pg_dump: The command was: COPY public.file (vfs_id, vfs_type, vfs_path,
>>>>> vfs_name, vfs_modified, vfs_owner, vfs_data) TO stdout;
>>>> What happens when you fsck the relevant partitions?
>>> Errors about a bunch of duplicate inodes, missing inodes, etc. Should
>>> I do it again and get some of the exact text for you?
>> Also this is an example of the type of errors that were being logged
>> before it died:
>>
>> LOG: checkpoint record is at 26/41570488
>> LOG: redo record is at 26/41570488; undo record is at 0/0; shutdown TRUE
>
> What does Google say about these error messages and your fs?

Not much that is useful. I think this is a little beyond what a search
can answer. A hardware failure basically left the fs and the db in an
inconsistent state. There is one table in one database that has a bunch
of data in it that I need to get out. I'm guessing I'm going to need to
find someone who understands the internal structure of the files to go
in and pull out whatever data is still intact.

I have been poking around and, as far as I can tell, although one of the
toast indexes is gone, the actual table files appear to be intact. That
is, they are still in the file system. I don't know if they are OK
internally.
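
For reference, the "still in the file system" check can be made concrete.
The three numbers in the pg_dump error quoted above, 1663/18392/18400, are
the tablespace OID, the database OID, and the relfilenode, and 1663 is the
default tablespace, so the file is expected at base/18392/18400 under the
data directory. The following is only a minimal Python sketch, assuming the
default 8 kB block size and the usual numbered segment files (.1, .2, ...);
the function names and the data directory path are made up for
illustration:

```python
import os

PG_DEFAULT_TABLESPACE = 1663  # OID of the built-in pg_default tablespace
BLOCK_SIZE = 8192             # assumption: server built with the default block size


def relation_path(data_dir, tablespace_oid, database_oid, relfilenode):
    """Map a tablespace/database/relfilenode triple to the first segment file."""
    if tablespace_oid != PG_DEFAULT_TABLESPACE:
        # Other tablespaces live elsewhere; this sketch does not handle them.
        raise ValueError("only the default tablespace is handled here")
    return os.path.join(data_dir, "base", str(database_oid), str(relfilenode))


def segment_report(data_dir, tablespace_oid, database_oid, relfilenode):
    """List (path, size, whole_pages) for each segment file that exists."""
    base = relation_path(data_dir, tablespace_oid, database_oid, relfilenode)
    found = []
    n = 0
    while True:
        path = base if n == 0 else f"{base}.{n}"
        if not os.path.isfile(path):
            break
        size = os.path.getsize(path)
        # A length that is not a multiple of the block size hints at truncation.
        found.append((path, size, size % BLOCK_SIZE == 0))
        n += 1
    return found


if __name__ == "__main__":
    # Hypothetical data directory; substitute the real one.
    for entry in segment_report("/var/lib/pgsql/data", 1663, 18392, 18400):
        print(entry)
```

An empty result matches the "No such file or directory" error. A non-empty
result only shows that the files exist and are a whole number of pages
long; it says nothing about whether the pages inside are readable.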

I also get this error when trying to access the non-toasted data:

ERROR: could not access status of transaction 307904873
DETAIL: could not open file "pg_clog/0125": No such file or directory

I'm guessing that this means that I may have to get someone to pull out
all versions of a given tuple, because I have lost some of the visibility
info. This shouldn't matter, as most likely very few tuples would have
had more than one version when the system went down.
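
For reference, the file name in that error follows from the transaction ID
by plain arithmetic: each transaction's commit status takes 2 bits, and
each pg_clog segment file holds 32 pages. A minimal Python sketch, assuming
a stock build (8 kB pages, 2 status bits per transaction, 32 pages per
segment); the function names are illustrative only:

```python
BLOCK_SIZE = 8192        # assumption: default page size in bytes
XACTS_PER_BYTE = 4       # 2 status bits per transaction
PAGES_PER_SEGMENT = 32   # pages per pg_clog segment file

# Transactions covered by one segment file (1048576 with these constants).
XACTS_PER_SEGMENT = BLOCK_SIZE * XACTS_PER_BYTE * PAGES_PER_SEGMENT


def clog_segment_name(xid):
    """Name of the pg_clog segment file that holds the status of xid."""
    return "%04X" % (xid // XACTS_PER_SEGMENT)


def clog_position(xid):
    """Return (byte offset within the segment file, bit shift within that byte)."""
    within = xid % XACTS_PER_SEGMENT
    return within // XACTS_PER_BYTE, (within % XACTS_PER_BYTE) * 2


if __name__ == "__main__":
    print(clog_segment_name(307904873))  # prints 0125
    print(clog_position(307904873))      # prints (168026, 2)
```

With these constants one missing segment such as 0125 covers the status of
about a million consecutive transaction IDs, which is why the error can
show up for many rows at once.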

I just hope that the relations I need are intact and that there is
someone out there who can help me get the data out.
