Quick Links

Re: Hosed PostGreSQL Installation

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Pete St(dot) Onge" <pete(at)seul(dot)org>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Hosed PostGreSQL Installation
Date:	2002-09-21 15:13:44
Message-ID:	8533.1032621224@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

"Pete St. Onge" <pete(at)seul(dot)org> writes:
> As a result of some disk errors on another drive, an admin in our group
> brought down the server hosting our pgsql databases with a kill -KILL
> after having gone to runlevel 1 and finding the postmaster process still
> running. No surprise, our installation was hosed in the process.

That should not have been a catastrophic mistake in any version >= 7.1.
I suspect you had disk problems or other problems.

> Klamath suggested that I run pg_controldata:

> ...
> Latest checkpoint's StartUpID: 21
> Latest checkpoint's NextXID: 615
> Latest checkpoint's NextOID: 18720

These numbers are suspiciously small for an installation that's been
in production awhile. I suspect you have not told us the whole story;
in particular I suspect you already tried "pg_resetxlog -f", which was
probably not a good idea.

> If I look into the pg_xlog directory, I see this:

> -rw------- 1 postgres postgres 16777216 Sep 20 23:13 0000000000000002
> -rw------- 1 postgres postgres 16777216 Sep 19 22:09 000000020000007E

Yeah, your xlog positions should be a great deal higher than they are,
if segment 2/7E was previously in use.

It is likely that you can recover (with some uncertainty about integrity
of recent transactions) if you proceed as follows:

1. Get contrib/pg_resetxlog/pg_resetxlog.c from the 7.2.2 release (you
can't use 7.1's pg_resetxlog because it doesn't offer the switches
you'll need). Compile it *against your 7.1 headers*. It should compile
except you'll have to remove this change:

***************
*** 853,858 ****
--- 394,403 ----
page->xlp_magic = XLOG_PAGE_MAGIC;
page->xlp_info = 0;
page->xlp_sui = ControlFile.checkPointCopy.ThisStartUpID;
+ page->xlp_pageaddr.xlogid =
+ ControlFile.checkPointCopy.redo.xlogid;
+ page->xlp_pageaddr.xrecoff =
+ ControlFile.checkPointCopy.redo.xrecoff - SizeOfXLogPHD;
record = (XLogRecord *) ((char *) page + SizeOfXLogPHD);
record->xl_prev.xlogid = 0;
record->xl_prev.xrecoff = 0;

Test it using its -n switch to make sure it reports sane values.

2. Run the hacked-up pg_resetxlog like this:

pg_resetxlog -l 2 127 -x 1000000000 $PGDATA

(the -l position is next beyond what we see in pg_xlog, the 1-billion
XID is just a guess at something past where you were. Actually, can
you give us the size of pg_log, ie, $PGDATA/global/1269? That would
allow computing a correct next-XID to use. Figure 4 XIDs per byte,
thus if pg_log is 1 million bytes you need -x at least 4 million.)

3. The postmaster should start now.

4. *Immediately* attempt to do a pg_dumpall. Do not pass GO, do not
collect $200, do not let in any interactive clients until you've done
it. (I'd suggest tweaking pg_hba.conf to disable all logins but your
own.)

5. If pg_dumpall succeeds and produces sane-looking output, then you've
survived. initdb, reload the dump file, re-open for business, go have
a beer. (Recommended: install 7.2.2 and reload into that, not 7.1.*.)
You will probably still need to check for partially-applied recent
transactions, but for the most part you should be OK.

6. If pg_dumpall fails then let us know what the symptoms are, and we'll
see if we can figure out a workaround for whatever the corruption is.

regards, tom lane

In response to

Hosed PostGreSQL Installation at 2002-09-21 05:54:55 from Pete St. Onge

Responses

Re: Hosed PostGreSQL Installation at 2002-09-24 04:29:36 from Pete St. Onge

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2002-09-21 16:01:28	NUMERIC's transcendental functions
Previous Message	Shridhar Daithankar	2002-09-21 08:44:26	Re: Improving speed of copy