Re: what could cause this PANIC on enterprise 7.3.4 db?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andriy Tkachuk <ant(at)imt(dot)com(dot)ua>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Alexey Yahno <yahno(at)imt(dot)com(dot)ua>
Subject: Re: what could cause this PANIC on enterprise 7.3.4 db?
Date: 2003-11-10 14:19:24
Message-ID: 18379.1068473964@sss.pgh.pa.us
Lists: pgsql-hackers

Andriy Tkachuk <ant(at)imt(dot)com(dot)ua> writes:
> On Fri, 7 Nov 2003, Tom Lane wrote:
>> Andriy Tkachuk <ant(at)imt(dot)com(dot)ua> writes:
>>> Nov 5 20:22:42 monstr postgres[16071]: [3] PANIC: open of /usr/local/pgsql/data/pg_clog/0040 failed: No such file or directory
>>
>> Could we see ls -l /usr/local/pgsql/data/pg_clog/

> [10:49]/2:ant(at)monstr:~>sudo ls -al /usr/local/pgsql/data/pg_clog
> total 40
> drwx------ 2 pgsql postgres 4096 Nov 7 03:28 .
> drwx------ 6 pgsql root 4096 Oct 23 10:45 ..
> -rw------- 1 pgsql postgres 32768 Nov 10 10:47 000D

Okay, given that the file the code was trying to access is nowhere near
the current or past set of valid transaction numbers, it's pretty clear
that what you have is a corrupted transaction number in some tuple's
header. The odds are that not only the transaction number is affected;
usually when we see something like this, anywhere from dozens to
hundreds of bytes have been replaced by garbage data.
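
To make the "nowhere near" reasoning concrete, here is a minimal sketch of how a transaction number maps to a pg_clog segment file name. It is not taken from the thread: the constants assume a stock build (8 kB blocks, 2 status bits per transaction, 32 pages per segment, file names in four-digit hex), and the helper names segment_name and segment_xid_range are hypothetical.

```python
# Assumed layout constants for a default build; not stated in the thread.
BLOCK_SIZE = 8192            # bytes per clog page
XACTS_PER_BYTE = 4           # 2 status bits per transaction
PAGES_PER_SEGMENT = 32       # pages per segment file
XACTS_PER_PAGE = BLOCK_SIZE * XACTS_PER_BYTE            # 32768
XACTS_PER_SEGMENT = XACTS_PER_PAGE * PAGES_PER_SEGMENT  # 1048576 (0x100000)


def segment_name(xid):
    """Return the clog segment file name that would hold this xid."""
    return "%04X" % (xid // XACTS_PER_SEGMENT)


def segment_xid_range(name, file_size=None):
    """Return (first_xid, last_xid) covered by a segment file.

    If file_size (in bytes) is given, only the part of the segment that
    has actually been written is counted.
    """
    first = int(name, 16) * XACTS_PER_SEGMENT
    if file_size is None:
        count = XACTS_PER_SEGMENT
    else:
        count = file_size * XACTS_PER_BYTE
    return first, first + count - 1


if __name__ == "__main__":
    # The only file present in the listing: 000D, 32768 bytes long.
    print("000D covers xids", segment_xid_range("000D", 32768))
    # The file the backend tried to open.
    print("0040 covers xids", segment_xid_range("0040"))
```

Under these assumptions the existing file 000D covers transaction numbers up to roughly 13.8 million, while 0040 would only be needed for transaction numbers from about 67 million upward, which is why a lookup there points to a garbage value in a tuple header rather than to a missing clog file.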

In the cases I've been able to study in the past, the cause seemed to
be faulty hardware or possibly kernel bugs --- for instance someone
recently reported a case where a whole kilobyte of a Postgres file had
been replaced with what seemed to be part of a mail message. I'd
ascribe that to either a disk drive writing a sector at the wrong place,
or the kernel getting confused about which buffer held which file.
So I'd recommend running some hardware diagnostics and checking to see
if there are errata available for your kernel.

As far as cleaning up the immediate damage is concerned, you'll probably
want to use pg_filedump or some such tool to get a better feeling for
the extent of the damage. There are descriptions of this process in the
archives --- try looking for recent references to pg_filedump.

regards, tom lane
