Re: Database Recovery Procedures

From: Network Administrator <netadmin(at)vcsn(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Database Recovery Procedures
Date: 2003-09-17 19:17:20
Message-ID: 1063826240.3f68b340423a6@webmail.vcsn.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Quoting Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Network Administrator <netadmin(at)vcsn(dot)com> writes:
> >>> PANIC: read of clog file 5, offset 16384 failed: Success
>
> >> Hm, not good :-(. What files actually exist in $PGDATA/pg_clog/ (names
> >> and sizes)?
>
> > -rw------- 1 postgres users 262144 Jul 20 15:53 0000
> > -rw------- 1 postgres users 262144 Jul 31 12:57 0001
> > -rw------- 1 postgres users 262144 Aug 12 17:32 0002
> > -rw------- 1 postgres users 262144 Aug 26 00:15 0003
> > -rw------- 1 postgres users 262144 Sep 9 23:44 0004
> > -rw------- 1 postgres users 16384 Sep 10 21:21 0005
>
> Okay, it's trying to read off the end of the clog, no doubt looking for
> a transaction number just slightly larger than what's known to clog.
> This probably indicates more serious problems (because WAL replay really
> should have prevented such an inconsistency), but you can get past the
> immediate panic relatively easily: just append an 8k page of zeroes to
> clog. Assuming your system has /dev/zero, something like this should
> do it:
>
> dd bs=8k count=1 < /dev/zero >> $PGDATA/pg_clog/0005
>
> (do this with care of course, and you should probably shut down the
> postmaster first). You might possibly have to add more than one page,
> if you then get similar PANICs with larger offsets, but try one page
> for starters.

Well whatdoyaknow! That did it- EVERYTHING is there! I only needed the one
page. I'm going to have to read up on pg_clog (WAL) so that I understand what
it does a little better.

It makes total sense too because obvious if there is no more space to write too
anything on disk get "frozen" where as application is just going to keep moving
along. At least for a time.

> If this does suppress the failure messages, you are still not really out
> of the woods; you should do what you can to check for data consistency.
> A paranoid person would probably take a complete pg_dump and try to diff
> it against the last known good dump. At the very least, I'd treat the
> table involved in the problem with great suspicion.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings

Well, I fortunately was on the side of the road and not in the woods in this
case. I'm assuming the next thing to do is run a vacuum analyse and they a dump
all. I'll see how it things perform over the next 48 hours or so.

Now that we've done that, how should should this surgery be documented? I would
think a "tip" like this should be somewhere in the docs (not just the archive).
I'd be more than will to write up this case but I'm trying to establish a long
goal here- if I may be so bold as to suggest one.

Thanks again.

--
Keith C. Perry
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being host by:
VCSN, Inc : http://vcsn.com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Dmitry Tkach 2003-09-17 19:32:59 Re: Weird query plan
Previous Message CoL 2003-09-17 18:59:36 Re: psql and blob