Re: URGENT: Database keeps crashing - suspect damaged RAM

From: "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de>
To: "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: URGENT: Database keeps crashing - suspect damaged RAM
Date: 2002-08-06 16:45:17
Message-ID: 2266D0630E43BB4290742247C891057501B1321C@dozer.computec.de
Lists: pgsql-general

Oh - and I forgot to mention: the crashes only occur when there is load
on the machine. No load - no crashes. But that is hardly surprising, as
the machine wouldn't use much RAM without any load...

Regards,

Markus

> -----Original Message-----
> From: Markus Wollny
> Sent: Tuesday, 6 August 2002 18:38
> To: pgsql-general(at)postgresql(dot)org
> Subject: [GENERAL] URGENT: Database keeps crashing - suspect damaged RAM
>
>
> Hello!
>
> I just installed PostgreSQL 7.2.1 on SuSE 7.3, 4xPIIIXEON 550MHz, 2GB
> RAM, 5x18GB SCSI RAID. The OS was freshly installed; after that I
> compiled and installed PostgreSQL from source (./configure
> --prefix=/opt/pgsql/ --with-perl --enable-odbc --enable-locale
> --enable-syslog). I copied the settings in postgresql.conf etc. from an
> identical machine running the identical platform. Then I imported a
> database to the new installation. The import seems to have been
> successful; I didn't get any errors during import. A subsequent vacuum
> analyze finished without anything out of the ordinary.
>
> Just a few minutes after this vacuum analyze, the database crashed for
> the first time. It keeps crashing every one or two minutes.
>
> What puzzles me is that this very same machine was running Oracle 8i on
> Win2k more or less flawlessly until just a few hours earlier; "more or
> less" meaning that we never really noticed anything much out of the
> ordinary. There might have been some minor issues after a RAM upgrade
> from 1 GB to 2 GB just a week ago, but looking back it's hard to say
> whether they were due to bad RAM or just some bad code which we've
> sorted out (or disposed of) by now. As the machine is already running
> Linux and PostgreSQL, I can't test my suspicion by going back to Oracle
> and having a closer look.
>
> What I'd like to know is whether I need to look any further than RAM -
> shall I just chuck the new modules out of the machine? Or is there some
> other issue that could cause this behaviour? I am quite sure that I
> didn't do anything wrong during installation, configuration and import,
> and the same application code is running without errors on a different
> machine at this very moment. I don't like the "record with zero length"
> and "Cannot allocate memory" bits in the logfile at all, let alone the
> "was terminated by signal 9" thingy.
>
> So: Is it bad RAM? How can I make sure? What else could it be?
>
> Here's a small excerpt from the logfile:
>
> 2002-08-06 17:31:38 [17063] DEBUG: Pages 0: Changed 0,
> Empty 0; Tup 0:
> Vac 0, Keep 0, UnUsed 0.
> Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
> 2002-08-06 17:36:23 [17296] DEBUG: _mdfd_blind_getseg: couldn't open
> /var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory
> 2002-08-06 17:36:24 [17296] FATAL 2: cannot write block 13387 of
> 16596/16671 blind: Cannot allocate memory
> 2002-08-06 17:36:24 [16530] DEBUG: server process (pid 17296) exited
> with exit code 2
> 2002-08-06 17:36:24 [16530] DEBUG: terminating any other
> active server
> processes
> 2002-08-06 17:36:24 [17081] NOTICE: Message from PostgreSQL backend:
> The Postmaster has informed me that some other backend
> died abnormally and possibly corrupted shared memory.
> I have rolled back the current transaction and am
> going to terminate your database system connection and exit.
> [...]
> 2002-08-06 17:36:24 [16530] DEBUG: all server processes terminated;
> reinitializing shared memory and semaphores
> 2002-08-06 17:36:24 [17298] DEBUG: database system was
> interrupted at
> 2002-08-06 17:31:21 CEST
> 2002-08-06 17:36:24 [17298] DEBUG: checkpoint record is at
> 0/325D7C78
> 2002-08-06 17:36:24 [17298] DEBUG: redo record is at
> 0/325D7C78; undo
> record is at 0/0; shutdown FALSE
> 2002-08-06 17:36:24 [17298] DEBUG: next transaction id: 2270; next
> oid: 901292
> 2002-08-06 17:36:24 [17298] DEBUG: database system was not properly
> shut down; automatic recovery in progress
> 2002-08-06 17:36:24 [17298] DEBUG: redo starts at 0/325D7CB8
> 2002-08-06 17:36:25 [17298] DEBUG: ReadRecord: record with
> zero length
> at 0/326E16C4
> 2002-08-06 17:36:25 [17298] DEBUG: redo done at 0/326E16A0
> 2002-08-06 17:36:30 [17298] DEBUG: database system is ready
> 2002-08-06 17:40:53 [16530] DEBUG: connection startup failed (fork
> failure): Cannot allocate memory
> 2002-08-06 17:52:50 [16530] DEBUG: connection startup failed (fork
> failure): Cannot allocate memory
> 2002-08-06 17:52:54 [16530] DEBUG: server process (pid 18237) was
> terminated by signal 9
> 2002-08-06 17:52:54 [16530] DEBUG: terminating any other
> active server
> processes
> 2002-08-06 17:52:54 [18234] NOTICE: Message from PostgreSQL backend:
> The Postmaster has informed me that some other backend
> died abnormally and possibly corrupted shared memory.
> I have rolled back the current transaction and am
> going to terminate your database system connection and exit.
> [...]
> 2002-08-06 17:52:57 [18253] FATAL 1: The database system is in
> recovery mode
> 2002-08-06 17:52:57 [18255] FATAL 1: The database system is in
> recovery mode
> 2002-08-06 17:52:57 [18254] FATAL 1: The database system is in
> recovery mode
> 2002-08-06 17:52:57 [18235] NOTICE: Message from PostgreSQL backend:
> The Postmaster has informed me that some other backend
> died abnormally and possibly corrupted shared memory.
> I have rolled back the current transaction and am
> going to terminate your database system connection and exit.
> Please reconnect to the database system and repeat your query.
> 2002-08-06 17:52:57 [18256] FATAL 1: The database system is in
> recovery mode
> 2002-08-06 17:52:57 [18257] FATAL 1: The database system is in
> recovery mode
> 2002-08-06 17:52:57 [18258] FATAL 1: The database system is in
> recovery mode
> 2002-08-06 17:52:57 [16530] DEBUG: all server processes terminated;
> reinitializing shared memory and semaphores
> 2002-08-06 17:52:57 [18260] FATAL 1: The database system is starting
> up
> 2002-08-06 17:52:57 [18259] DEBUG: database system was
> interrupted at
> 2002-08-06 17:51:38 CEST
> 2002-08-06 17:52:57 [18259] DEBUG: checkpoint record is at
> 0/32991848
> 2002-08-06 17:52:57 [18259] DEBUG: redo record is at
> 0/3297F4D8; undo
> record is at 0/0; shutdown FALSE
> 2002-08-06 17:52:57 [18259] DEBUG: next transaction id: 3704; next
> oid: 909484
> 2002-08-06 17:52:57 [18259] DEBUG: database system was not properly
> shut down; automatic recovery in progress
> 2002-08-06 17:52:57 [18259] DEBUG: redo starts at 0/3297F4D8
> 2002-08-06 17:52:57 [18261] FATAL 1: The database system is starting
> up
> 2002-08-06 17:52:58 [18259] DEBUG: ReadRecord: record with
> zero length
> at 0/32BF0278
> 2002-08-06 17:52:58 [18259] DEBUG: redo done at 0/32BF0254
> 2002-08-06 17:52:59 [18262] FATAL 1: The database system is starting
> up
> 2002-08-06 17:53:00 [18259] DEBUG: database system is ready
> 2002-08-06 17:54:24 [16530] DEBUG: connection startup failed (fork
> failure): Cannot allocate memory
> 2002-08-06 17:54:31 [16530] DEBUG: server process (pid 18283) was
> terminated by signal 9
> 2002-08-06 17:54:31 [16530] DEBUG: terminating any other
> active server
> processes
> 2002-08-06 17:54:31 [18275] NOTICE: Message from PostgreSQL backend:
> The Postmaster has informed me that some other backend
> died abnormally and possibly corrupted shared memory.
> I have rolled back the current transaction and am
> going to terminate your database system connection and exit.
> Please reconnect to the database system and repeat your query.
> [...]
> 2002-08-06 17:54:32 [16530] DEBUG: all server processes terminated;
> reinitializing shared memory and semaphores
> 2002-08-06 17:54:32 [18296] DEBUG: database system was
> interrupted at
> 2002-08-06 17:53:00 CEST
> 2002-08-06 17:54:32 [18296] DEBUG: checkpoint record is at
> 0/32BF0278
> 2002-08-06 17:54:32 [18296] DEBUG: redo record is at
> 0/32BF0278; undo
> record is at 0/0; shutdown TRUE
> 2002-08-06 17:54:32 [18296] DEBUG: next transaction id: 4456; next
> oid: 909484
> 2002-08-06 17:54:32 [18296] DEBUG: database system was not properly
> shut down; automatic recovery in progress
> 2002-08-06 17:54:32 [18296] DEBUG: redo starts at 0/32BF02B8
> 2002-08-06 17:54:32 [18296] DEBUG: ReadRecord: record with
> zero length
> at 0/32F0B3C0
> 2002-08-06 17:54:32 [18296] DEBUG: redo done at 0/32F0B39C
> 2002-08-06 17:54:34 [18297] FATAL 1: The database system is starting
> up
> 2002-08-06 17:54:34 [18298] FATAL 1: The database system is starting
> up
> 2002-08-06 17:54:34 [18299] FATAL 1: The database system is starting
> up
> 2002-08-06 17:54:34 [18300] FATAL 1: The database system is starting
> up
> 2002-08-06 17:54:34 [18296] DEBUG: database system is ready
> 2002-08-06 17:57:35 [16530] DEBUG: connection startup failed (fork
> failure): Cannot allocate memory
> 2002-08-06 17:57:54 [16530] DEBUG: server process (pid 18366) was
> terminated by signal 9
> 2002-08-06 17:57:54 [16530] DEBUG: terminating any other
> active server
> processes
> 2002-08-06 17:57:54 [18368] NOTICE: Message from PostgreSQL backend:
> The Postmaster has informed me that some other backend
> died abnormally and possibly corrupted shared memory.
> I have rolled back the current transaction and am
> going to terminate your database system connection and exit.
> Please reconnect to the database system and repeat your query.
> 2002-08-06 17:57:56 [18409] DEBUG: ReadRecord: record with
> zero length
> at 0/3338749C
> 2002-08-06 17:57:58 [18425] FATAL 1: The database system is starting
> up
> 2002-08-06 17:57:58 [18409] DEBUG: database system is ready
> 2002-08-06 17:58:53 [18432] NOTICE: RelationBuildDesc: can't open
> idx_bm_user_id: Cannot allocate memory
> 2002-08-06 17:59:00 [18443] FATAL 1: cannot open
> pg_attribute: Cannot
> allocate memory
> 2002-08-06 17:59:01 [16530] DEBUG: connection startup failed (fork
> failure): Cannot allocate memory
> 2002-08-06 17:59:01 [16530] DEBUG: server process (pid 18436) was
> terminated by signal 9
> 2002-08-06 17:59:01 [16530] DEBUG: terminating any other
> active server
> processes
> 2002-08-06 17:59:03 [18510] DEBUG: ReadRecord: record with
> zero length
> at 0/336E9970
> 2002-08-06 18:00:15 [16530] DEBUG: connection startup failed (fork
> failure): Cannot allocate memory
> 2002-08-06 18:00:17 [18589] DEBUG: ReadRecord: record with
> zero length
> at 0/33A7C194
>
> Thank you for your kind assistance!
>
> Regards,
>
> Markus Wollny

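For anyone reading the quoted log excerpt, it may help to tally how often each symptom appears once the mail-wrapped lines are joined back together. The following is only a rough sketch, not anything from the original message: the function names, the regular expression and the three counter keys are assumptions made up for illustration, and SAMPLE holds just a few entries copied from the excerpt above.

```python
import re
from collections import Counter

# Prefix as seen in the excerpt: "2002-08-06 17:36:23 [17296] DEBUG: ..."
# Levels such as "FATAL 1" / "FATAL 2" carry a digit, hence the optional group.
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\] ([A-Z]+(?: \d)?):\s*(.*)$"
)

def parse_entries(lines):
    """Join mail-wrapped continuation lines; yield (timestamp, pid, level, text)."""
    current = None
    for raw in lines:
        # Drop the "> " quote prefix and trailing whitespace.
        line = raw.lstrip("> ").rstrip()
        match = ENTRY.match(line)
        if match:
            if current:
                yield tuple(current)
            current = list(match.groups())
        elif current and line:
            # A line without the timestamp/pid prefix continues the previous entry.
            current[3] += " " + line
    if current:
        yield tuple(current)

def tally(lines):
    """Count the three symptoms mentioned in the message (hypothetical keys)."""
    counts = Counter()
    for _ts, _pid, _level, text in parse_entries(lines):
        if "Cannot allocate memory" in text:
            counts["cannot_allocate_memory"] += 1
        if "terminated by signal 9" in text:
            counts["signal_9"] += 1
        if "record with zero length" in text:
            counts["zero_length_record"] += 1
    return counts

# A few entries copied from the excerpt, still wrapped and quoted.
SAMPLE = """\
> 2002-08-06 17:36:23 [17296] DEBUG: _mdfd_blind_getseg: couldn't open
> /var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory
> 2002-08-06 17:36:24 [17296] FATAL 2: cannot write block 13387 of
> 16596/16671 blind: Cannot allocate memory
> 2002-08-06 17:36:25 [17298] DEBUG: ReadRecord: record with
> zero length
> at 0/326E16C4
> 2002-08-06 17:40:53 [16530] DEBUG: connection startup failed (fork
> failure): Cannot allocate memory
> 2002-08-06 17:52:54 [16530] DEBUG: server process (pid 18237) was
> terminated by signal 9
""".splitlines()

if __name__ == "__main__":
    print(tally(SAMPLE))
```

Run against the full logfile instead of SAMPLE, the same tally would show whether the "Cannot allocate memory" entries consistently precede the signal 9 terminations; it says nothing by itself about whether the RAM is at fault.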