From: | Joe Conway <mail(at)joeconway(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, Michael Fuhr <mike(at)fuhr(dot)org>, "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: production server down |
Date: | 2004-12-27 18:08:36 |
Message-ID: | 41D04FA4.7010402@joeconway.com |
Lists: | pgsql-hackers |
Tom Lane wrote:
> Are you using one of the scripts that
> does an auto initdb if it doesn't see a valid PGDATA? 11 seconds might
> be about right for that.
>
> One problem with this theory is how come you didn't get screwed during
> *that* boot cycle. It seems to require assuming that the NFS mount came
> online just after the initdb finished (else initdb would have
> overwritten the on-NFS pg_control) but before the regular postmaster
> started (else this same scenario would have played out then). That's
> not a very wide window.
[followup]
We've now had a chance to bring Postgres down and check under the mount
point. There *is* indeed a newly initdb'd cluster under there. FWIW the
control file is corrupt:
# pg_controldata /home/jconway/pgsql/fds/replica/pgdata
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting. The results below are untrustworthy.
pg_control version number: 72
Catalog version number: 200310211
Database cluster state: in production
pg_control last modified: Sat Feb 6 22:28:16 2106
Current log file ID: 0
Next log file segment: 10161036
Latest checkpoint location: 0/9AA1B4
Prior checkpoint location: 0/9B0B8C
Latest checkpoint's REDO location: 0/0
Latest checkpoint's UNDO location: C/218
Latest checkpoint's StartUpID: 17142
Latest checkpoint's NextXID: 1099443932
Latest checkpoint's NextOID: 8192
Time of latest checkpoint: Wed Apr 8 07:05:36 6325
Database block size: 1
Blocks per segment of large relation: 128
Maximum length of identifiers: 67
Maximum number of function arguments: 0
Date/time type storage: floating-point numbers
Maximum length of locale name: 0
LC_COLLATE:
LC_CTYPE:
I have a tarred copy of the under-the-mount PGDATA if anyone is
interested in examining it.
BTW, there was another Postgres cluster on this same server which we had
not used since the November 2 reboot -- it was corrupt in pretty much
the same way and also had an initdb'd cluster under its mount.
So it looks like using an auto initdb startup script is a very bad idea
when using an NFS mounted PGDATA. We left the under-mount structure in
place and did "chown root:root" and "chmod 000" on it. And, as mentioned
in an earlier post, we now rely on the DBA to start Postgres manually
after a server restart.
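For anyone who still wants an automated start without the auto initdb, below is a minimal sketch in Python of a pre-start check that refuses to proceed when the mount is missing. It is an illustration only: the function name `check_pgdata` and its parameters are hypothetical, and it assumes that PGDATA lives under a single mount point and that an existing cluster can be recognized by its PG_VERSION file. It never runs initdb.

```python
import os


def check_pgdata(pgdata, mount_point, ismount=os.path.ismount):
    """Decide whether it looks safe to start the postmaster on pgdata.

    Returns a (ok, reason) tuple. The ismount argument is injectable so the
    check can be exercised without a real NFS mount (hypothetical helper,
    not part of any Postgres tooling).
    """
    # If the file system is not mounted yet, whatever we see at this path
    # is the local directory underneath the mount point, not the real PGDATA.
    if not ismount(mount_point):
        return False, "mount point %s is not mounted" % mount_point

    # A missing PG_VERSION means there is no cluster here. Report it and
    # stop; creating a new cluster is left to a human.
    if not os.path.isfile(os.path.join(pgdata, "PG_VERSION")):
        return False, "no PG_VERSION in %s; refusing to initdb" % pgdata

    return True, "ok"
```

A start script could call this and invoke `pg_ctl start` only when the first element of the result is true; on failure it would log the reason and exit, leaving any initdb to be run by hand.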
Joe