Re: postmaster fails to start

From: "Dweck Nir" <Nir(dot)Dweck(at)tadirantele(dot)com>
To: "Richard Huxton" <dev(at)archonet(dot)com>
Cc: "postgreSQL mailing list (E-mail)" <pgsql-general(at)postgresql(dot)org>
Subject: Re: postmaster fails to start
Date: 2005-05-25 09:19:48
Message-ID: 68382F2B929CEB4FAE828088C1BF0672139510@tbs-ex1.tadirantele.com
Lists: pgsql-general

Hi,
1) When the postmaster was started the first time, it was just a matter of the .pid file not having been removed, because the machine had been restarted without the postmaster being stopped. There was no other postmaster running.
2) All the WAL settings are at their defaults:
#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

#fsync = true # turns forced synchronization on or off
#wal_sync_method = fsync # the default varies across platforms:
# fsync, fdatasync, open_sync, or open_datasync
#wal_buffers = 8 # min 4, 8KB each
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000

# - Checkpoints -

#checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300 # range 30-3600, in seconds
#checkpoint_warning = 30 # 0 is off, in seconds

# - Archiving -

#archive_command = '' # command to use to archive a logfile segment

3) I have the data backed up in other databases (not as a file backup), so I am not too concerned about losing the data in this specific case. The problem is that the postmaster won't start, so I can't even restore the data. Most importantly, I would like to learn from this case what to do the next time this happens to me in the field.

Regards,
Nir.

-----Original Message-----
From: Richard Huxton [mailto:dev(at)archonet(dot)com]
Sent: Wednesday, May 25, 2005 11:51 AM
To: Dweck Nir
Cc: postgreSQL mailing list (E-mail)
Subject: Re: [GENERAL] postmaster fails to start

I've taken the liberty of rearranging your email slightly.

Dweck Nir wrote:
> The sequence of events was as follows: 1) computer was shut down
> without stopping postmaster.

OK - not good. Some crucial questions:
1. Do you have fsync enabled or disabled in the postgresql.conf file?
2. Do you know whether your drives are flushing write-cache properly?
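Question 1 can be answered mechanically: in postgresql.conf a line starting with # is inert, so a commented-out fsync line means the built-in default applies, and the commented default in this release reads "fsync = true". Below is a minimal Python sketch of that check. The function names are hypothetical, it only reads the text it is given, it expects the "name = value" form, and it does not handle a '#' inside a quoted value. It cannot answer question 2: whether the drives actually honor flush requests is not visible in the config file.

```python
import re

# An active "name = value" line; anything after '#' is treated as a comment.
# Simplification: a '#' inside a quoted value would be cut off.
SETTING_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_.]*)\s*=\s*(.*?)\s*(?:#.*)?$")


def explicit_settings(conf_text: str) -> dict[str, str]:
    """Return the settings that are actually set (not commented out)."""
    settings = {}
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line, or a commented-out (i.e. default) setting
        match = SETTING_RE.match(stripped)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def fsync_setting(conf_text: str) -> str:
    # No active fsync line means the built-in default (on) is in effect.
    return explicit_settings(conf_text).get("fsync", "default (on)")


sample = """
#fsync = true # turns forced synchronization on or off
#wal_buffers = 8 # min 4, 8KB each
"""
print(fsync_setting(sample))  # default (on)
```

Run against the WAL section quoted above, every line is commented out, so the sketch reports the default rather than an explicit value.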

> 2) postmaster was started, but because of an error that there might
> be another postmaster running, the postmaster was started again.

Was this just a matter of deleting the .pid file and did you check there
wasn't another postmaster running?
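The check asked about here can be scripted before a leftover postmaster.pid is deleted. A minimal sketch, assuming a POSIX system and that the first line of postmaster.pid holds the postmaster's PID; the function name is hypothetical. Signal 0 asks the kernel whether the process exists without delivering anything; os.kill behaves differently on Windows, so this is not meant for that platform.

```python
import os


def postmaster_pid_is_live(pid_file: str) -> bool:
    """Return True if the PID on the first line of pid_file is a running process.

    POSIX only: os.kill behaves differently on Windows.
    """
    with open(pid_file) as f:
        pid = int(f.readline().strip())
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False  # no such process, so the lock file is stale
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True
```

After a reboot, a False result suggests the file is stale and can be removed. A True result is not conclusive: the PID may have been reused by an unrelated process, so it is worth confirming with ps before deciding.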

> 3) since then each time I try to start the postmaster I get the same
> error.

> LOG: redo starts at 1/A500075C
> PANIC: btree_delete_page_redo: lost target page
> LOG: startup process (PID 4409) was terminated by signal 6
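The "redo starts at" line gives the WAL position where replay began, written as a hex log id and a hex byte offset. Assuming timeline 1, the default 16MB segments noted for checkpoint_segments above, and the 8.0-style file naming (timeline, log id and segment number, each as 8 hex digits), the pg_xlog file that holds that position can be worked out as below. This is an illustrative sketch, not a PostgreSQL tool, and the function name is hypothetical.

```python
def wal_segment_name(location: str, timeline: int = 1,
                     seg_size: int = 16 * 1024 * 1024) -> str:
    """Map a WAL location such as '1/A500075C' to its segment file name.

    Assumes 8.0-style names: timeline, log id and segment number, each as
    8 uppercase hex digits.
    """
    log_hex, offset_hex = location.split("/")
    log_id = int(log_hex, 16)      # high part of the location
    offset = int(offset_hex, 16)   # byte offset within that log id
    segment = offset // seg_size   # which 16MB segment contains the offset
    return f"{timeline:08X}{log_id:08X}{segment:08X}"


print(wal_segment_name("1/A500075C"))  # 0000000100000001000000A5
```

The offset 0xA500075C lies between 0xA5000000 and 0xA6000000, so under these assumptions replay begins 0x75C bytes into segment 0xA5 of log id 1.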

OK - well, this error message is in backend/access/nbtree/nbtxlog.c
where it is replaying the write-ahead-log files for btrees (I'm no
hacker, I just searched the source for the error message and read the
comments).

So - it looks like you might have a corrupted WAL. That shouldn't be
possible if you were running with fsync enabled and drives that flushed
cache like they should, so I'm guessing that wasn't the case.

It might be possible to recover to a state before this point, but that's
not something I'm going to be able to advise on. There are two steps you
should take immediately though.

1. Take a file-backup of your entire data directory and keep it safe.
You might well be making repeated attempts to recover this.
2. Check your most recent database backup and restore it to another
machine - it may be quicker to restore that than fix your file corruption.
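Step 1 can be sketched as follows: with the postmaster stopped, copy the entire data directory to a fresh, timestamped location, ideally on another disk, and make every recovery attempt against a copy rather than the original. A minimal Python sketch; the function name and any paths passed to it are hypothetical.

```python
import shutil
import time
from pathlib import Path


def snapshot_data_dir(data_dir: str, backup_root: str) -> Path:
    """Copy the whole data directory to a new timestamped folder under backup_root.

    Assumes the postmaster is NOT running, so files are not changing mid-copy.
    """
    src = Path(data_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # symlinks=False follows links, so a WAL directory or tablespace that was
    # relocated via a symlink is copied by content rather than as a dangling link.
    # copytree raises FileExistsError if dest exists, so no snapshot is overwritten.
    shutil.copytree(src, dest, symlinks=False)
    return dest
```

Because an existing snapshot is never overwritten, each new recovery attempt can start from a fresh copy of the snapshot rather than from the damaged original.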

--
Richard Huxton
Archonet Ltd
