Re: URGENT: Database keeps crashing - suspect damaged RAM

From: "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: URGENT: Database keeps crashing - suspect damaged RAM
Date: 2002-08-06 18:02:11
Message-ID: 2266D0630E43BB4290742247C8910575014CE342@dozer.computec.de
Lists: pgsql-general

Hi!

-----Original Message-----
From: Tom Lane
Sent: Tue 06.08.2002 18:59
To: Markus Wollny
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] URGENT: Database keeps crashing - suspect damaged RAM

"Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de> writes:
> So: Is it bad RAM? How can I make sure? What else could it be?

Have you tried running memtest86? I've never used that myself but
some folks on the list say it works well.

No, I haven't tried that yet, but I'm surely going to do so tomorrow.

> Here's a small excerpt from the logfile:

> 2002-08-06 17:36:23 [17296] DEBUG: _mdfd_blind_getseg: couldn't open
> /var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory

Is it possible that you are running with inadequate swap space, a small
data segment limit (ulimit -d), or something else that would make the
kernel refuse to give memory to a backend process?
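
One way to inspect the limits mentioned here is sketched below. It is only an illustration, not something from the original exchange: it uses Python's standard `resource` module (Unix only), and the helper names `describe_limit` and `memory_limits` are made up for this example. The limits that matter are those of the process that starts the postmaster, so the check would have to run in that same environment.

```python
import resource

def describe_limit(value):
    """Render RLIM_INFINITY as 'unlimited'; other values are in bytes."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)

def memory_limits():
    """Return (soft, hard) limits for the data segment and address space.

    Hypothetical helper for illustration. Note that getrlimit() reports
    bytes, whereas the shell's `ulimit -d` reports kilobytes.
    """
    names = {
        "data segment (ulimit -d)": resource.RLIMIT_DATA,
        "address space (ulimit -v)": resource.RLIMIT_AS,
    }
    report = {}
    for label, res in names.items():
        soft, hard = resource.getrlimit(res)
        report[label] = (describe_limit(soft), describe_limit(hard))
    return report

if __name__ == "__main__":
    for label, (soft, hard) in memory_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
```

If both limits come back as "unlimited", a per-process limit is unlikely to explain the allocation failures, which leaves system-wide memory and swap as the things to check.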

I shouldn't think so; the machine has 2 GB RAM (that was more than
sufficient for the same DB, applications and load on a different
machine) and 4 GB swap:
Disk geometry for /dev/sda: 0.000-51834.000 megabytes
Disk label type: msdos
Minor      Start        End  Type     Filesystem  Flags
1          0.031     15.688  primary  ext3        boot
2         15.688   4118.225  primary  linux-swap
3       4118.225  24599.531  primary  ext3
4      24599.531  51826.882  primary  ext3

Taking a closer look, I am a bit confused: I allocated 4 GB to the swap
partition, as you can see above, but free reports only 2 GB. That's
strange, but I don't think it can be the cause, as the working machine
has just 2 GB of swap, too. ulimit is set to "unlimited" and there was RAM
available during load. As a matter of fact, right now free reports:

             total       used       free     shared    buffers     cached
Mem:       2061536    2053816       7720          0       4496    1825620
-/+ buffers/cache:     223700    1837836
Swap:      2097136     124800    1972336

on our fallback machine, which is running the very same database and the
very same application. The total disk usage of the database is 1.8 GB.
When I switched to the new machine, there were about 30-50 open
connections; max. connections is set to 512 on both machines. The crashes
occurred immediately after making the DB accessible to our application, so
most of the DB was definitely not yet in memory. And again: our fallback
machine, which has no RAID and slower processors, can handle the very same
DB under the very same load with no such problems. I had never encountered
this "cannot allocate memory" error before.
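
For readers unfamiliar with the old free layout, here is a minimal sketch of how the figures quoted above fit together. It is an illustration only: the function names `parse_free` and `summarize` are invented for this example, and the input is simply the free output from this message (values in kB).

```python
# The free output quoted in the message, values in kB.
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:       2061536    2053816       7720          0       4496    1825620
-/+ buffers/cache:     223700    1837836
Swap:      2097136     124800    1972336
"""

def parse_free(text):
    """Map each row label of old-style `free` output to its integer columns."""
    rows = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the header row has no colon, skip it
        rows[label.strip()] = [int(field) for field in rest.split()]
    return rows

def summarize(rows):
    """Recompute the '-/+ buffers/cache' figures from the 'Mem' row."""
    total, used, free, shared, buffers, cached = rows["Mem"]
    return {
        # Memory actually held by programs, excluding kernel caches.
        "used_by_programs_kb": used - buffers - cached,
        # Memory the kernel could hand out after dropping caches.
        "available_kb": free + buffers + cached,
        "swap_used_kb": rows["Swap"][1],
    }

if __name__ == "__main__":
    for name, value in summarize(parse_free(FREE_OUTPUT)).items():
        print(f"{name}: {value}")
```

The recomputed values match the "-/+ buffers/cache" row: 223700 kB used by programs and 1837836 kB available. Nearly all of the "used" memory on this machine is therefore disk cache, not memory pinned by processes.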

> 2002-08-06 17:40:53 [16530] DEBUG: connection startup failed (fork
> failure): Cannot allocate memory
> 2002-08-06 17:52:50 [16530] DEBUG: connection startup failed (fork
> failure): Cannot allocate memory

Still looks like inadequate memory --- but now I'm thinking that it's a
system-wide condition, ie, you just plain haven't got enough RAM for the
number of processes you're trying to start.

> 2002-08-06 17:52:54 [16530] DEBUG: server process (pid 18237) was
> terminated by signal 9

Postgres never issues any kill -9 on itself, but I've heard that the
Linux kernel may start killing processes when it's desperately low on
memory.

Other than the signal 9, everything I see in this trace is either a
cannot-allocate-memory failure or followup effects from one. How many
backends are you trying to start up, anyway? Might you have a runaway
client that keeps opening new backend connections?

Must be something else - the number of connections was not at all high
(<100), the server load wasn't more than 3.5 (on a 4-processor machine),
there was RAM available at the time, both physical and swap, and I haven't
got any surplus daemons running... I think I'll be able to confirm or rule
out the bad-RAM issue tomorrow using memtest86.

Thank you!

Regards,

Markus
