[Fwd: Re: Question about pgfsck]

From: Dan Hrabarchuk <dan(at)kwasar(dot)biz>
To: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: [Fwd: Re: Question about pgfsck]
Date: 2004-10-07 16:25:24
Message-ID: 1097166324.6719.221.camel@dan.geotivity.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Is there a FAQ or a HOWTO on recovering from database corruption? If
not, then I am willing to write it. I will need help getting the
document together because as much as I google, I come up empty handed.

What I need is something that tells you:

A> You get errors in your server log like:
PANIC: open of /psql/pg_clog/0C95 failed: No such file or directory
- or -
PANIC: ZeroFill failed to write /psql/pg_xlog/xlogtemp.27517: No space
left on device :)
- or -
LOG: server process (pid 48146) was terminated by signal 6
- or -
LOG: ReadRecord: record with zero length at 39D/A53F7800
- or -
LOG: ReadRecord: unexpected pageaddr 39A/78D4000 in log file 922,
segment 15, offset 9256960
- or -
ERROR: Invalid page header in block 2879878 of channeldata

etc.

Which are serious? What needs to be done?

B> How to find all corrupt tuples, whether they are in indexes or the
tables themselves.

C> How to eliminate the corrupt tuples. Delete them of course. But how
when the server errors on deleting them??

pgfsck looks like an awesome tool. It allows you to search areas of your
database that are rarely used. So you don't get a 1AM phone call from
the sales guy needing some report for tomorrow at 7AM and he can't get
it.

I want to have something that can be easily digested and doesn't require
advanced knowledge of postgres. What does the beginning PostgreSQL admin
do? A database disaster guide.

Thanks

Dan Hrabarchuk

-----Forwarded Message-----
> From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
> To: Dan Hrabarchuk <dan(at)kwasar(dot)biz>
> Subject: Re: Question about pgfsck
> Date: Thu, 07 Oct 2004 10:38:33 +0200
>
> The problem is obviously with the structure of the tuples. So you need
> to identify it by something then does not require parsing the tuples.
> Hence the suggestion of ctid or oid or something else fixed.
>
> Good luck,
>
> On Wed, Oct 06, 2004 at 02:57:31PM -0700, Dan Hrabarchuk wrote:
> >
> > > > 3. Finding the file with the error.
> > >
> > > And the tablename, ergo, filename
> > >
> > > > 4. Finally fixing the error.
> > >
> > > Delete the record is the usual way.
> >
> > Okay the following query:
> > SELECT * FROM channeldata where cd_id=6268 and tstamp<???2004-09-20??? and
> > tstamp>???2004-09-15???;
> >
> > Will cause a signal 6 and have the database backend restart.
> >
> > If I change it to: DELETE FROM channeldata where cd_id=6268 and
> > tstamp<???2004-09-20??? and tstamp>???2004-09-15???;
> >
> > I get an error and the transaction does not commit. How do you delete
> > the records? Or is this not the usual way.
> >
> > I can recover the data after it's deleted. I just need to delete it.
> >
> > Thanks again.
> >
> > Dan
> >

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2004-10-07 16:55:55 Re: [Fwd: Re: Question about pgfsck]
Previous Message Lee Wu 2004-10-07 16:08:55 migrating PG cidr type and operators