Re: The database system is in recovery mode

From: Andrew Sullivan <andrew(at)libertyrms(dot)info>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: The database system is in recovery mode
Date: 2003-05-02 14:14:44
Message-ID: 20030502141444.GC13419@libertyrms.info
Lists: pgsql-admin

On Thu, May 01, 2003 at 06:24:03PM -0400, Trevor Astrope wrote:
> Could this be the linux kernel randomly killing processes under heavy
> load issue?

Not from the look of things. See below.

> System is postgresql 7.2.1 on redhat 7.2. Here's the logs:

You should really upgrade at least to 7.2.4 (no dump required).
7.2.1 has some nasty bugs.

> 2003-05-01 16:54:08 DEBUG: server process (pid 2599) was
> terminated by signal 11
^^

That's not signal 9, so it's not the kernel. Signal 11 is SIGSEGV on
Linux, which probably means some sort of memory problem. Are you
using ECC RAM for your database? You should. In any case, the first
thing I'd do is run memtest86 on it.
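To make the signal 9 vs. signal 11 distinction concrete, here is a minimal Python sketch that pulls the signal number out of a postmaster crash line and names it. It assumes the crash report sits on one log line (the quote above is wrapped by the mail client), and it assumes a Linux host, since signal numbers come from Python's `signal` module. The `classify_crash` helper and its hint strings are hypothetical illustrations, not part of PostgreSQL.

```python
import re
import signal

# Matches the postmaster's crash report, e.g.
# "server process (pid 2599) was terminated by signal 11"
CRASH_RE = re.compile(r"server process \(pid (\d+)\) was terminated by signal (\d+)")


def classify_crash(log_line):
    """Return (pid, signal name, rough hint), or None if the line is not a crash report."""
    m = CRASH_RE.search(log_line)
    if m is None:
        return None
    pid, signum = int(m.group(1)), int(m.group(2))
    try:
        name = signal.Signals(signum).name  # e.g. 11 -> "SIGSEGV" on Linux
    except ValueError:
        name = f"signal {signum}"
    if signum == signal.SIGKILL:
        hint = "SIGKILL: consistent with the kernel OOM killer (or a manual kill -9)"
    elif signum == signal.SIGSEGV:
        hint = "SIGSEGV: invalid memory access; suspect bad RAM or a backend bug"
    else:
        hint = "other signal; check the server logs"
    return pid, name, hint
```

Fed the line from the log above, this reports SIGSEGV rather than SIGKILL, which is why the kernel's OOM killer is not the likely culprit here.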

> 2003-05-01 16:54:08 DEBUG: terminating any other active server processes
> 2003-05-01 16:54:08 NOTICE: Message from PostgreSQL backend:
> The Postmaster has informed me that some other backend
> died abnormally and possibly corrupted shared memory.
> I have rolled back the current transaction and am
> going to terminate your database system connection and exit.
> Please reconnect to the database system and repeat your query.
>
> After a bunch of these, the database goes in recovery mode:

That's what it's supposed to do. It's what WAL buys you.

> I presume this is rerunning the WAL? Is the message serious...could there
> be database corruption or just lost transactions?

Neither, assuming you have good hardware and you're using fsync. WAL
is there precisely to make the system crash safe. (Of course, if
it's sitting on an ext2 partition and the system goes down hard, you
have a different batch of problems. But WAL+fsync protects you from
postmaster crashes, and machine crashes if your filesystem is
crash-safe.)
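For anyone wondering why an fsync'd log makes a crash survivable, here is a toy Python key-value store that follows the write-ahead rule: each change is appended to a log file and forced to disk with `os.fsync` before it is applied, and startup replays the log to redo every committed change. `ToyWALStore` and its JSON-lines log format are hypothetical and only illustrate the principle; this is not PostgreSQL's WAL format or recovery code, and it omits checkpoints entirely.

```python
import json
import os


class ToyWALStore:
    """Toy key-value store illustrating write-ahead logging (redo only)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()  # "recovery mode": redo every complete logged change
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        good = 0  # byte offset just past the last complete, valid record
        with open(self.log_path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # torn final record from a crash mid-write
                try:
                    rec = json.loads(raw)
                except json.JSONDecodeError:
                    break
                self.data[rec["key"]] = rec["value"]
                good += len(raw)
        # Drop any torn tail so new records start on a clean line.
        with open(self.log_path, "r+b") as f:
            f.truncate(good)

    def set(self, key, value):
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()             # push Python's buffer to the OS
        os.fsync(self._log.fileno())  # force the OS to put it on disk
        self.data[key] = value        # only now apply the change

    def close(self):
        self._log.close()
```

If the process dies after `os.fsync` returns, the change is replayed on the next start; if it dies mid-write, the torn record is discarded. Without the fsync (or on a filesystem that lies about it), that guarantee is gone, which is the point about fsync and crash-safe filesystems above.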

A

--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<andrew(at)libertyrms(dot)info> M2P 2A8
+1 416 646 3304 x110
