Re: URGENT: Database keeps crashing - suspect damaged RAM

From: Jeff Davis <list-pgsql-general(at)empires(dot)org>
To: "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: URGENT: Database keeps crashing - suspect damaged RAM
Date: 2002-08-06 18:29:18
Message-ID: 200208061129.18868.list-pgsql-general@empires.org
Lists: pgsql-general

Virtual memory problems on Linux have certainly happened before; perhaps you're
running a kernel that had some major ones. Maybe upgrading to 2.4.19 would help?

Regards,
Jeff Davis

On Tuesday 06 August 2002 11:02 am, Markus Wollny wrote:
> Hi!
>
> -----Original Message-----
> From: Tom Lane
> Sent: Tue 06.08.2002 18:59
> To: Markus Wollny
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] URGENT: Database keeps crashing - suspect
> damaged RAM
>
>
>
> "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de> writes:
>
> > So: Is it bad RAM? How can I make sure? What else could it be?
>
>
> Have you tried running memtest86? I've never used that myself but
> some folks on the list say it works well.
>
>
>
> No, I haven't tried that yet, but I'm surely going to do so tomorrow.
>
>
> > Here's a small excerpt from the logfile:
>
>
>
> > 2002-08-06 17:36:23 [17296] DEBUG: _mdfd_blind_getseg: couldn't open
> > /var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory
>
> Is it possible that you are running with inadequate swap space, a small
> data segment limit (ulimit -d), or something else that would make the
> kernel refuse to give memory to a backend process?
>
> I shouldn't think so; the machine has 2 GB RAM (that was more than
> sufficient for the same DB, applications and load on a different
> machine) and 4 GB swap:
> Disk geometry for /dev/sda: 0.000-51834.000 megabytes
> Disk label type: msdos
> Minor      Start        End  Type     Filesystem  Flags
> 1          0.031     15.688  primary  ext3        boot
> 2         15.688   4118.225  primary  linux-swap
> 3       4118.225  24599.531  primary  ext3
> 4      24599.531  51826.882  primary  ext3
>
> Taking a closer look I am a bit confused: I allocated 4 GB to the swap
> partition, as you can see above, but free only reports 2 GB? That's
> strange, but cannot be the cause, I think, as the working machine has
> got just 2 GB swap, too. ulimit is set to "unlimited" and there was RAM
> available during load. As a matter of fact, right now free reports:
>
>              total       used       free     shared    buffers     cached
> Mem:       2061536    2053816       7720          0       4496    1825620
> -/+ buffers/cache:     223700    1837836
> Swap:      2097136     124800    1972336
>
> on our fallback machine, which is running the very same database and the
> very same application. When taking a look at total disk usage of
> the database, I get a total of 1.8 GB. When I switched to the new
> machine, there were about 30-50 open connections, max. connections is
> set to 512 on both machines. The crashes occurred immediately after
> making the DB accessible to our application, so most of the DB was
> definitely not yet in memory. And again - our fallback-machine which has
> got no RAID and slower processors can handle the very same DB under the
> very same load with no such problems - I never ever encountered this
> "cannot allocate memory" error before.
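The "-/+ buffers/cache" line in the free output quoted above is easy to misread, so here is the arithmetic behind it as a minimal Python sketch. The numbers are copied from that output (in kB); the function name `without_cache` is made up for illustration. It only shows that most of the "used" memory on the fallback machine is buffers and page cache, which the kernel can normally reclaim; it says nothing about the machine that crashed.

```python
# Values in kB, copied from the "Mem:" line of the free output quoted above.
mem = {
    "total": 2061536,
    "used": 2053816,
    "free": 7720,
    "buffers": 4496,
    "cached": 1825620,
}


def without_cache(m):
    """Return (used, free) with buffers and page cache counted as free.

    This is the same adjustment free makes for its "-/+ buffers/cache" line.
    """
    reclaimable = m["buffers"] + m["cached"]
    return m["used"] - reclaimable, m["free"] + reclaimable


used_by_processes, effectively_free = without_cache(mem)
print(used_by_processes, effectively_free)  # 223700 1837836
```

Both results match the "-/+ buffers/cache" line that free printed, so roughly 220 MB is held by processes and about 1.8 GB is cache.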
>
>
> > 2002-08-06 17:40:53 [16530] DEBUG: connection startup failed (fork
> > failure): Cannot allocate memory
> > 2002-08-06 17:52:50 [16530] DEBUG: connection startup failed (fork
> > failure): Cannot allocate memory
>
>
> Still looks like inadequate memory --- but now I'm thinking that it's a
> system-wide condition, ie, you just plain haven't got enough RAM for the
> number of processes you're trying to start.
>
>
> > 2002-08-06 17:52:54 [16530] DEBUG: server process (pid 18237) was
> > terminated by signal 9
>
>
> Postgres never issues any kill -9 on itself, but I've heard that the
> Linux kernel may start killing processes when it's desperately low on
> memory.
>
> Other than the signal 9, everything I see in this trace is either a
> cannot-allocate-memory failure or followup effects from one. How many
> backends are you trying to start up, anyway? Might you have a runaway
> client that keeps opening new backend connections?
>
>
> Must be something else - the number of connections was not at all high
> (<100), the server-load wasn't more than 3.5 (on a 4-processor machine),
> there was RAM available at the time, both physical and swap, I haven't
> got any surplus daemons running... I think I'll be able to confirm or
> rule out the bad-RAM theory tomorrow using memtest86.
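Before blaming the hardware, it may also be worth checking the limits as the server process sees them, since "ulimit" typed in an interactive shell does not necessarily show what a daemon started from an init script inherits. The sketch below uses Python's standard resource module to print the limits of whatever process runs it; the helper names `describe` and `report_limits` are made up for illustration, and it assumes a Linux system where these three limits exist. Note that getrlimit reports the memory limits in bytes, while the shell's ulimit -d and -v print kilobytes.

```python
import resource


def describe(limit):
    """Render a limit the way the shell's ulimit does for 'no limit'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def report_limits():
    """Return {label: (soft, hard)} for limits relevant to memory and fork."""
    limits = {
        "data segment (ulimit -d)": resource.RLIMIT_DATA,
        "address space (ulimit -v)": resource.RLIMIT_AS,
        "processes (ulimit -u)": resource.RLIMIT_NPROC,
    }
    report = {}
    for label, res in limits.items():
        soft, hard = resource.getrlimit(res)
        report[label] = (describe(soft), describe(hard))
    return report


for label, (soft, hard) in report_limits().items():
    print(f"{label}: soft={soft} hard={hard}")
```

To be meaningful this would have to run as the same user and from the same startup environment as the postmaster; run from a login shell it will only repeat what ulimit already showed.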
>
> Thank you!
>
> Regards,
>
> Markus
>
>
