Re: URGENT: Database keeps crashing - suspect damaged RAM

From: "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: URGENT: Database keeps crashing - suspect damaged RAM
Date: 2002-08-07 15:34:51
Message-ID: 2266D0630E43BB4290742247C891057501B13229@dozer.computec.de
Lists: pgsql-general

Hi!

I think I'll have to bow down to you gurus - again :) I upgraded to
kernel 2.4.16 (there are no RPMs for 2.4.19 and I didn't want to compile
from source - yet), and the symptoms have disappeared altogether. Which is
strange because, as I already mentioned, the very same config isn't giving
me any trouble on a different machine... Anyway: I'll shun 2.4.10 from now
on.

Regards,

Markus

> -----Original Message-----
> From: Jeff Davis [mailto:list-pgsql-general(at)empires(dot)org]
> Sent: Tuesday, 6 August 2002 20:29
> To: Markus Wollny; Tom Lane
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] URGENT: Database keeps crashing - suspect damaged RAM
>
>
> Virtual memory problems on Linux have certainly happened before; perhaps
> you're running a kernel that had some major ones. Maybe if you upgraded
> to 2.4.19?
>
> Regards,
> Jeff Davis
>
> On Tuesday 06 August 2002 11:02 am, Markus Wollny wrote:
> > Hi!
> >
> > -----Original Message-----
> > From: Tom Lane
> > Sent: Tue 06.08.2002 18:59
> > To: Markus Wollny
> > Cc: pgsql-general(at)postgresql(dot)org
> > Subject: Re: [GENERAL] URGENT: Database keeps crashing - suspect damaged RAM
> >
> >
> >
> > "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de> writes:
> >
> > > So: Is it bad RAM? How can I make sure? What else could it be?
> >
> >
> > Have you tried running memtest86? I've never used that myself but
> > some folks on the list say it works well.
> >
> >
> >
> > No, I haven't tried that yet, but I will certainly do so tomorrow.
> >
> >
> > > Here's a small excerpt from the logfile:
> >
> > > 2002-08-06 17:36:23 [17296] DEBUG: _mdfd_blind_getseg: couldn't open
> > > /var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory
> >
> > Is it possible that you are running with inadequate swap space, a small
> > data segment limit (ulimit -d), or something else that would make the
> > kernel refuse to give memory to a backend process?
> >
> > I shouldn't think so; the machine has 2 GB RAM (that was more than
> > sufficient for the same DB, applications and load on a different
> > machine) and 4 GB swap:
> > Disk geometry for /dev/sda: 0.000-51834.000 megabytes
> > Disk label type: msdos
> > Minor      Start        End  Type     Filesystem  Flags
> > 1          0.031     15.688  primary  ext3        boot
> > 2         15.688   4118.225  primary  linux-swap
> > 3       4118.225  24599.531  primary  ext3
> > 4      24599.531  51826.882  primary  ext3
> >
> > Taking a closer look I am a bit confused: I allocated 4 GB to the swap
> > partition, as you can see above, but free only reports 2 GB? That's
> > strange, but cannot be the cause, I think, as the working machine has
> > got just 2 GB swap, too. ulimit is set to "unlimited" and there was RAM
> > available during load. As a matter of fact, right now free reports:
> >
> >              total       used       free     shared    buffers     cached
> > Mem:       2061536    2053816       7720          0       4496    1825620
> > -/+ buffers/cache:     223700    1837836
> > Swap:      2097136     124800    1972336
> >
> > on our fallback machine, which is running the very same database and
> > the very same application. The total disk usage of the database is
> > 1.8 GB. When I switched to the new machine, there were about 30-50 open
> > connections; max. connections is set to 512 on both machines. The
> > crashes occurred immediately after making the DB accessible to our
> > application, so most of the DB was definitely not yet in memory. And
> > again - our fallback machine, which has got no RAID and slower
> > processors, can handle the very same DB under the very same load with
> > no such problems - I never ever encountered this "cannot allocate
> > memory" error before.
> >
> >
> > > 2002-08-06 17:40:53 [16530] DEBUG: connection startup failed (fork
> > > failure): Cannot allocate memory
> > > 2002-08-06 17:52:50 [16530] DEBUG: connection startup failed (fork
> > > failure): Cannot allocate memory
> >
> >
> > Still looks like inadequate memory --- but now I'm thinking that it's a
> > system-wide condition, ie, you just plain haven't got enough RAM for the
> > number of processes you're trying to start.
> >
> >
> > > 2002-08-06 17:52:54 [16530] DEBUG: server process (pid 18237) was
> > > terminated by signal 9
> >
> >
> > Postgres never issues any kill -9 on itself, but I've heard that the
> > Linux kernel may start killing processes when it's desperately low on
> > memory.
> >
> > Other than the signal 9, everything I see in this trace is either a
> > cannot-allocate-memory failure or followup effects from one. How many
> > backends are you trying to start up, anyway? Might you have a runaway
> > client that keeps opening new backend connections?
> >
> >
> > Must be something else - the number of connections was not at all high
> > (<100), the server load wasn't more than 3.5 (on a 4-processor machine),
> > there was RAM available at the time, both physical and swap, and I
> > haven't got any surplus daemons running... I think I'll be able to
> > substantiate the bad-RAM suspicion tomorrow using memtest86.
> >
> > Thank you!
> >
> > Regards,
> >
> > Markus
> >
> >
>
>
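
Editor's note on the swap-size question raised in the quoted message: the arithmetic can be checked with a short Python sketch. The figures are copied from the quoted partition table and free output; the function names are hypothetical, and treating "megabytes" as 1024 kB is an assumption. Under either unit convention the gap is roughly 2 GB. The sketch only quantifies the mismatch and says nothing about its cause.

```python
# Figures copied from the partition table and free output quoted in the thread.
SWAP_PART_START_MB = 15.688      # start of partition 2 (linux-swap)
SWAP_PART_END_MB = 4118.225      # end of partition 2 (linux-swap)
FREE_SWAP_TOTAL_KB = 2097136     # "Swap: total" column reported by free


def partition_size_mb(start_mb: float, end_mb: float) -> float:
    """Size of a partition given its start and end offsets in megabytes."""
    return end_mb - start_mb


def kb_to_mb(kb: int) -> float:
    """Convert kilobytes to megabytes (assumption: 1 MB = 1024 kB)."""
    return kb / 1024.0


def swap_gap_mb(start_mb: float, end_mb: float, free_total_kb: int) -> float:
    """Partition size minus the swap total the kernel reports, in MB."""
    return partition_size_mb(start_mb, end_mb) - kb_to_mb(free_total_kb)


if __name__ == "__main__":
    size = partition_size_mb(SWAP_PART_START_MB, SWAP_PART_END_MB)
    seen = kb_to_mb(FREE_SWAP_TOTAL_KB)
    gap = swap_gap_mb(SWAP_PART_START_MB, SWAP_PART_END_MB, FREE_SWAP_TOTAL_KB)
    print(f"partition: {size:.1f} MB, reported swap: {seen:.1f} MB, gap: {gap:.1f} MB")
```

So about half of the swap partition is not visible to the kernel on that machine, which matches the observation in the message that free reports only 2 GB.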

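Editor's note on the log excerpt: the reply quoted above observes that, apart from the signal 9, every entry is a cannot-allocate-memory failure or a follow-up of one. A minimal Python sketch of that kind of tally follows. The sample lines are the four entries quoted in the thread, rejoined onto single lines; the category names and the classify function are hypothetical, not part of PostgreSQL.

```python
import re
from collections import Counter

# The four log entries quoted in the thread, rejoined onto single lines.
SAMPLE_LOG = """\
2002-08-06 17:36:23 [17296] DEBUG: _mdfd_blind_getseg: couldn't open /var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory
2002-08-06 17:40:53 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
2002-08-06 17:52:50 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
2002-08-06 17:52:54 [16530] DEBUG: server process (pid 18237) was terminated by signal 9
"""

# Hypothetical category labels mapped to the message fragments they match.
PATTERNS = {
    "out_of_memory": re.compile(r"Cannot allocate memory"),
    "fork_failure": re.compile(r"fork failure"),
    "killed_by_signal_9": re.compile(r"terminated by signal 9\b"),
}


def classify(log_text: str) -> Counter:
    """Count how many log lines match each category (a line may match several)."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


if __name__ == "__main__":
    for label, count in sorted(classify(SAMPLE_LOG).items()):
        print(f"{label}: {count}")
```

Run against a full server log instead of the sample, a tally like this shows whether allocation failures cluster around connection startup, as they do in the excerpt.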