Re: Backends dying due to memory exhaustion--I'm stonkered

From: Doug McNaught <doug(at)wireboard(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Backends dying due to memory exhaustion--I'm stonkered
Date: 2001-01-27 00:35:09
Message-ID: m3g0i5sugy.fsf@belphigor.mcnaught.org
Lists: pgsql-general

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Doug McNaught <doug(at)wireboard(dot)com> writes:
> > The problem I'm having is that the backends will crash randomly, after
> > the database has been up for a few days, with:
> > FATAL 1: Memory exhausted in AllocSetAlloc()
>
> > The system has plenty of memory and swap, and under normal
> > circumstances the backends take up 10-15 megabytes. If it's a
> > runaway situation of some kind, it happens very fast, as I've even
> > taken snapshots of the process table at 1 minute intervals, and they
> > show no abnormality right up to the time of the crash.
>
> Hmm. That puts a damper on the idea that it's a memory leak --- doesn't
> eliminate the theory entirely, however. The other likely theory is that
> you've got a variable-size column value someplace whose size word has
> been corrupted, so that it claims to be umpteen megabytes long. Any
> attempt to copy such a value out of the tuple it's in will result in
> an instant "out of memory" complaint.

Hmm, very interesting. Does VARCHAR count as a variable-size column?
One funny thing is that the nightly VACUUM doesn't always fail--the
system will run smoothly for one to three days on average before a
crash.

> Is there any consistency about which table is being touched when the
> failure occurs? It's not hard to isolate and delete a damaged tuple
> once you know which table it's in, but if you've got a lot of tables
> the initial search can be tedious.

I'll check into this. Having just looked over my error logs, I see
some suspects but nothing jumps out at me. Unfortunately, OpenACS has
a boatload of tables, and there are 8 different instances, each with
its own database.

> One way to get more info is to tweak the code to abort() just before
> it would normally report the out-of-memory error. Then you will get
> a coredump and can learn something from the backtrace (don't forget
> to compile with -g).

That's a thought, and I will try it. I'm currently (as of yesterday's
crash) running with -d 2 and output sent to a logfile. Is this
debuglevel high enough to tell me which table contains the bad tuple,
if that's indeed the problem?
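
For illustration only, here is roughly what the abort() idea looks like in a standalone program. This is a minimal sketch, not the PostgreSQL allocator: alloc_or_abort is a hypothetical name, and plain malloc() stands in for whatever AllocSetAlloc() does internally. The point is only that calling abort() at the failure site yields a core file whose backtrace shows the caller that asked for the memory.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate memory, and dump core instead of
 * reporting an error when the allocation fails.
 */
void *alloc_or_abort(size_t size)
{
    void *p = malloc(size);

    /* malloc(0) may legitimately return NULL, so only treat a
     * failed non-zero request as fatal. */
    if (p == NULL && size != 0) {
        fprintf(stderr, "allocation of %lu bytes failed, aborting\n",
                (unsigned long) size);
        abort();    /* raises SIGABRT; leaves a core file if core dumps are enabled */
    }
    return p;
}
```

Built with -g, the resulting core can be opened in a debugger and the stack walked up from abort() to see which operation requested the oversized allocation.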

If I can't nail it down that way, how hard would it be to write a C
program to scan all the tuples in a database looking for bogus size
fields?
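
The core check such a scanner would need might look like the sketch below. It is not a working page reader: it assumes each variable-size value starts with a 4-byte length word in native byte order that counts itself, and that the caller already knows how many bytes remain in the tuple from that position. The function name and LEN_WORD_SIZE are made up for this example; locating the tuples and columns on disk is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LEN_WORD_SIZE 4   /* assumed width of the length word */

/*
 * Hypothetical check: return 1 if the length word at the start of
 * `value` is believable given that only `bytes_left` bytes remain
 * in the enclosing tuple, and 0 if it looks corrupted.
 */
int size_word_is_plausible(const unsigned char *value, size_t bytes_left)
{
    int32_t len;

    if (bytes_left < LEN_WORD_SIZE)
        return 0;                       /* no room for a length word at all */

    memcpy(&len, value, sizeof len);    /* avoids unaligned access */

    if (len < LEN_WORD_SIZE)
        return 0;                       /* negative or shorter than its own header */
    if ((size_t) len > bytes_left)
        return 0;                       /* claims more data than the tuple holds */
    return 1;
}
```

A value claiming to be "umpteen megabytes long" inside a tuple of a few hundred bytes fails the last test, which is the kind of damage described above.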

-Doug
