Re: Version 7.2.3 unrecoverable crash on missing pg_clog

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andy Osborne <andy(at)sift(dot)co(dot)uk>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Version 7.2.3 unrecoverable crash on missing pg_clog
Date: 2003-01-09 15:27:32
Message-ID: 29844.1042126052@sss.pgh.pa.us
Lists: pgsql-bugs

Andy Osborne <andy(at)sift(dot)co(dot)uk> writes:
> Tom Lane wrote:
>>> FATAL 2: open of /u0/pgdata/pg_clog/0726 failed: No such file or directory
>> What range of file names do you actually see in pg_clog?

> Currently 0000 to 00D6. I don't know what it was last night.

Not any greater, for sure. (FYI, each segment covers one million
transactions.)
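As a rough illustration of that figure, the sketch below maps a pg_clog segment file name to the range of transaction IDs it would cover. It assumes the file name is the segment number in hexadecimal and that a segment holds exactly 0x100000 (1,048,576) transactions; both are assumptions consistent with the "one million" above, not values taken from this thread. The function name is hypothetical.

```python
# Assumption: one pg_clog segment covers 0x100000 transactions
# (the "one million" mentioned above); adjust if your build differs.
XACTS_PER_SEGMENT = 0x100000


def segment_xid_range(name):
    """Return (first_xid, last_xid) covered by a segment file such as '0726'."""
    segment_number = int(name, 16)  # file names are assumed to be hexadecimal
    first = segment_number * XACTS_PER_SEGMENT
    last = first + XACTS_PER_SEGMENT - 1
    return first, last


if __name__ == "__main__":
    for name in ("00D6", "0726"):
        first, last = segment_xid_range(name)
        print(f"{name}: xids {first} .. {last}")
```

Under those assumptions, segment 00D6 ends at xid 225,443,839 while segment 0726 would begin at xid 1,918,894,080, so the file the backend asked for lies far beyond any transaction this installation has run.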

> the next backup was running when the database crashed. Any
> attempt to access the table crashed it again. I don't know if
> it helps, but a select * from news where <conditional on a field
> with an index> was ok, but if the where was not indexed and resulted
> in a table scan, it crashed it.

This is consistent with one page of the table being corrupted.

> While I wouldn't rule out data corruption, the kernel message
> ring has no errors for the md driver, scsi host adapter or the
> disks, which I would expect if we had bad blocks appearing on a
> disk or somesuch.

Some of the cases that I've seen look like completely unrelated data
(not even Postgres stuff, just bits of text files) was written into
a page of a Postgres table. This could possibly be a kernel bug,
along the lines of getting confused about which buffer belongs to
which file. But with no way to reproduce it, it's hard to pin blame.

>> You didn't happen to make a physical copy of the news table before
>> dropping it, did you? It'd be interesting to examine the remains.

> Sadly, no I didn't. This is one of our live database servers
> and I was under a lot of pressure to get it back quickly. If
> it does it again, what can I do to provide the most useful
> feedback?

If the database isn't unreasonably large, perhaps you could take a
tarball dump of the whole $PGDATA directory tree while the postmaster
is stopped? That would document the situation for examination at leisure.
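A minimal sketch of such a snapshot follows, assuming the data directory path is passed in explicitly and that the presence of a postmaster.pid file in it means the postmaster may still be running. The function name and the refusal-on-pid-file policy are illustrative choices, not part of any PostgreSQL tool.

```python
import os
import tarfile


def snapshot_pgdata(pgdata, dest):
    """Write a gzip-compressed tarball of the whole data directory to dest."""
    # Assumption: postmaster.pid in the data directory indicates a postmaster
    # that is (or may be) running. A stale file left by a crash also trips
    # this check and must be dealt with by hand.
    if os.path.exists(os.path.join(pgdata, "postmaster.pid")):
        raise RuntimeError("postmaster.pid present; stop the postmaster first")

    # Store paths relative to the directory name so the archive unpacks
    # into a single top-level directory.
    top = os.path.basename(os.path.normpath(pgdata))
    with tarfile.open(dest, "w:gz") as tar:
        tar.add(pgdata, arcname=top)
    return dest
```

The destination should sit outside the data directory, for example `snapshot_pgdata("/u0/pgdata", "/backup/pgdata-crash.tar.gz")` (the backup path is hypothetical), so the archive does not try to include itself.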

regards, tom lane
